What are the responsibilities and job description for the Monitoring and Alerting Engineer position at Digital Minds Technologies Inc.?
Job Details
Job Title: Monitoring and Alerting Engineer
Location: Fort Worth, TX (4 days/wk onsite required)
Project Duration: Long-Term Contract
Position Overview: The Operations Awareness Engineer will monitor, alert on, and support our systems to ensure seamless operations. Ideal candidates will have 3-5 years of experience with Dynatrace, CloudWatch, or similar tools, and a solid understanding of cloud architecture and DevOps principles.
Key Responsibilities:
- Incident and System Management: Collaborate with internal teams and suppliers to analyze and resolve critical IT and Telecom service interruptions, and protect system availability through incident, problem, and change management.
- System Monitoring and Optimization: Monitor systems for faults, identify optimization opportunities, and implement tools and process changes to improve monitoring and alerting.
- Incident Response and Root Cause Analysis: Work with major incident response teams on escalations and monitoring during major incidents.
Qualifications:
- Self-Motivated: Ability to define, develop, and execute plans; manage system outages; and handle high-stress situations.
- Availability: Able to work in a 24/7 environment and provide on-call support.
- Experience: Proven experience interacting with people at all organizational levels.
- Technical Skills:
- Bachelor's degree in Computer Science, Information Systems, or Engineering preferred.
- Technical certifications or 5 years in event monitoring and alerting, DevOps, infrastructure support, or IT major incident management.
- Experience with monitoring tools (Dynatrace, CloudWatch, Zabbix, SCOM).
- DevOps application performance tuning.
- Strong writing skills for documentation.
- Proficient in distributed systems administration (Windows, Unix, Linux, VMware, etc.).
- Knowledge of ITIL best practices (certification is a plus).
- Familiarity with the software development life cycle (SDLC).
- Experience in SLA/KPI-driven environments.
- ServiceNow proficiency.
- General scripting/programming skills (Python, Node.js, Ruby, Perl, Bash/sh).
Preferred Qualifications:
- Cloud certifications (AWS, Azure, etc.).
- Experience with infrastructure as code tools (Terraform, Ansible, etc.).
- ITIL V3 or V4 certification.
- Advanced technical skills in various operating systems and environments.
- Proven ability to improve monitoring and alerting processes.