What are the responsibilities and job description for the Cloud & Infrastructure Monitoring Engineer position at CornerStone Technology Talent Services?
Job Title: Cloud & Infrastructure Monitoring Engineer
Location: North Fort Worth – 4 days onsite required
Work Authorization: U.S. citizens (USC) or Green Card (GC) holders only
Work Schedule: 24/7 Operations Environment – Some weekend availability and after-hours support required
Note: No C2C (corp-to-corp) or subcontractor arrangements for this role.
Position Overview:
We are seeking a Cloud & Infrastructure Monitoring Engineer with 10 years of experience in monitoring and alerting. The ideal candidate will have expertise in Dynatrace, CloudWatch, or similar monitoring tools and a strong background in cloud architecture and DevOps principles.
Key Responsibilities:
- Incident and System Management: Collaborate with internal teams and vendors to analyze and resolve critical IT and Telecom service interruptions, ensuring system availability through incident, problem, and change management.
- System Monitoring and Optimization: Proactively monitor systems, identify optimization opportunities, and implement tools/process improvements to enhance monitoring and alerting.
- Incident Response & Root Cause Analysis: Engage with major incident response teams for escalations and real-time monitoring during critical events.
Required Qualifications:
Experience: 10 years in Event Monitoring, Alerting, DevOps, Infrastructure Support, or IT Major Incident Management.
Technical Skills:
- Monitoring Tools: Dynatrace, CloudWatch, Zabbix, SCOM, or similar.
- Operating Systems: Proficiency in Windows, Unix, Linux, and VMware.
- Scripting: Python, Node.js, Ruby, Perl, Bash/sh.
- ServiceNow: Strong familiarity with IT service management.
- ITIL Framework: Understanding of ITIL best practices (certification preferred).
- Cloud & DevOps: Experience in distributed systems, performance tuning, and infrastructure automation.
- SLAs & KPIs: Proven experience in service-level and performance-driven environments.
- Soft Skills: Excellent communication, documentation, and problem-solving abilities.
- Availability: Must be able to work in a 24/7 environment and provide on-call support when needed.
Preferred Qualifications:
- Cloud Certifications: AWS, Azure, or similar.
- Infrastructure as Code: Terraform, Ansible, etc.
- ITIL v3 or v4 certification.
- Advanced technical expertise in various operating systems and enterprise environments.
- Proven ability to enhance monitoring and alerting processes.