What are the responsibilities and job description for the Application Monitoring Engineer with Dynatrace position at Prudent Technologies and Consulting, Inc.?
Operations Awareness Engineer / Application Monitoring Engineer with Dynatrace
This is a backfill position in Fort Worth, Texas 76131
4 days a week onsite
The schedule is mostly M-F, roughly 7:45/8am to 4:45/5pm; after-hours or weekend work (Sat or Sun) may be required periodically
This role is more of a monitoring/engineering type of role
Application Monitoring
Incident management skills
Session replays
Dynatrace and CloudWatch (with AWS specifically) are the most critical skills needed
Need to be more of a monitoring/alerting engineer
The candidate can't be just a Dynatrace user; we need someone who can work with the application team (more of an engineering role)
Identifying threshold issues, memory issues, user issues, etc.
Change management and problem management experience are necessary
This is a forward-facing role
Application performance management engineer would be ideal
A minimum of 4-5 years of experience would be perfect (but open to less experience for the right candidate)
DDU, DEM license familiarity required
Needs to know what session replay is
Position Overview: The Operations Awareness Engineer will monitor, alert, and support our systems to ensure seamless operations. Ideal candidates will have 3-5 years of experience with Dynatrace, CloudWatch or similar tools, and a solid understanding of cloud architecture and DevOps principles.
Key Responsibilities:
• Incident and System Management: Collaborate with internal teams and suppliers to analyze and resolve critical IT and Telecom service interruptions, and protect system availability through incident, problem, and change management.
• System Monitoring and Optimization: Monitor systems for faults, identify optimization opportunities, and implement tools and process changes to improve monitoring and alerting.
• Incident Response and Root Cause Analysis: Work with major incident response teams on escalations and monitoring during major incidents.
Qualifications:
• Self-Motivated: Ability to define, develop, and execute plans; manage system outages; and handle high-stress situations.
• Availability: Able to work in a 24/7 environment and provide on-call support.
• Experience: Proven experience interacting at all levels.
Technical Skills:
o Bachelor's degree in Computer Science, Information Systems, or Engineering preferred.
o Technical certifications or 5 years of experience in event monitoring and alerting, DevOps, infrastructure support, or IT major incident management.
o Experience with monitoring tools (Dynatrace, CloudWatch, Zabbix, SCOM).
o DevOps application performance tuning.
o Strong writing skills for documentation.
o Proficient in distributed systems/administration (Windows, Unix, Linux, VMware, etc.).
o Knowledge of ITIL best practices (certification is a plus).
o Familiarity with the software development lifecycle (SDLC).
o Experience in SLA/KPI-driven environments.
o ServiceNow proficiency.
o General scripting/programming skills (Python, Node.js, Ruby, Perl, Bash/sh).