What are the responsibilities and job description for the Observability Engineer / Monitoring Engineer position at ACL Digital?
Datadog Engineer / Observability Engineer
Location: Roswell, GA
Mode: Work from Office (4 days)
Duration: 11-month contract
The NOC Lead will manage cloud infrastructure operations and ensure optimal performance of the project’s cloud environment. This role demands strong expertise in cloud platforms, incident management, and operational best practices, along with hands-on experience with monitoring and observability tools. The NOC Lead will collaborate closely with the customer’s stakeholders to drive operational excellence, streamline processes, and enhance the reliability, scalability, and security of cloud systems.
Key Responsibilities:
Cloud Infrastructure Management:
Manage, monitor, and optimize cloud infrastructure across platforms (e.g., AWS, Azure).
Ensure high availability, scalability, and cost-efficiency of cloud systems.
Oversee deployment and maintenance of applications and services in the cloud environment.
Monitoring Expertise:
Design and implement advanced monitoring, alerting, and observability solutions using monitoring tools such as Datadog, Grafana, and Prometheus.
Configure dashboards, custom metrics, and anomaly detection to provide deep insights into system performance.
Conduct training sessions for the customer’s team on effective Datadog usage.
Incident and Problem Management:
Take ownership of incident management, overseeing real-time detection, escalation, and rapid resolution of issues.
Perform root cause analysis and implement long-term solutions to prevent recurrence.
Develop and enforce operational playbooks for handling critical incidents.
Security and Compliance:
Ensure adherence to security best practices and compliance with customer and industry standards.
Collaborate with security teams to implement identity and access controls, encryption, and vulnerability management.
Reporting and Optimization:
Generate and present regular operational reports to customer stakeholders, including SLA adherence and performance metrics.
Analyze trends to identify areas for optimization and proactively recommend improvements.
Leadership and Collaboration:
Lead and mentor the CloudOps team to deliver top-notch operational performance.
Collaborate with development, QA, and security teams to align operations with business goals.
Present operational reports and insights, including SLA adherence and mobile application performance metrics.
Process Improvement and Automation:
Continuously analyze current processes and identify areas for improvement.
Implement automation tools and techniques to enhance efficiency.
Establish and document standard operating procedures (SOPs) for NOC operations.
Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or Ansible.
Automate repetitive operational tasks to enhance efficiency.
Required Qualifications:
Technical Skills:
Bachelor’s degree in Computer Science, IT, or a related field; a Master’s degree is a plus.
8 years of experience in cloud operations, with at least 3 years in a leadership role.
Strong expertise in monitoring tools such as Datadog, Grafana, and Prometheus, including advanced configuration and monitoring setup.
Proficiency in cloud platforms such as AWS, Azure, or Google Cloud Platform.
Hands-on experience with automation tools like Terraform, Ansible, or CloudFormation.
Solid understanding of DevOps practices, CI/CD pipelines, and containerization and orchestration (e.g., Docker, Kubernetes).
Certifications:
Datadog certifications or proven expertise in the platform are a significant advantage.
Soft Skills:
Strong leadership and team management skills with the ability to work onsite in a customer-facing role.
Excellent communication and interpersonal skills for effective collaboration with stakeholders.
Proactive and solution-oriented mindset to drive improvements and resolve challenges.
Ability to work under pressure, prioritize tasks, and manage multiple responsibilities effectively.