What are the responsibilities and job description for the NOC/Ops Lead position at Calsoft Labs?
Job Details
NOC/Ops Lead
11-month contract with possible extension.
Roswell, GA - Work from Office (4 days)
The NOC Lead will manage cloud infrastructure operations and ensure optimal performance of the project's cloud environment. This role demands strong expertise in cloud platforms, incident management, operations best practices, and hands-on experience with monitoring tools and observability. The NOC Lead will collaborate closely with the customer's stakeholders to drive operational excellence, streamline processes, and enhance cloud systems' reliability, scalability, and security.
Key Responsibilities:
Cloud Infrastructure Management:
- Manage, monitor, and optimize cloud infrastructure across platforms (e.g., AWS, Azure).
- Ensure high availability, scalability, and cost-efficiency of cloud systems.
- Oversee deployment and maintenance of applications and services in the cloud environment.
Monitoring Expertise:
- Design and implement advanced monitoring, alerting, and observability solutions using monitoring tools such as Datadog, Grafana, and Prometheus.
- Configure dashboards, custom metrics, and anomaly detection to provide deep insights into system performance.
- Conduct training sessions for the customer's team on effective Datadog usage.
Incident and Problem Management:
- Take ownership of incident management, overseeing real-time detection, escalation, and resolution of issues.
- Perform root cause analysis and implement long-term solutions to prevent recurrence.
- Develop and enforce operational playbooks for handling critical incidents.
Security and Compliance:
- Ensure adherence to security best practices and compliance with customer and industry standards.
- Collaborate with security teams to implement identity and access controls, encryption, and vulnerability management.
Reporting and Optimization:
- Generate and present regular operational reports to customer stakeholders, including SLA adherence and performance metrics.
- Analyze trends to identify areas for optimization and proactively recommend improvements.
Leadership and Collaboration:
- Lead and mentor the CloudOps team to deliver top-notch operational performance.
- Collaborate with development, QA, and security teams to align operations with business goals.
- Present operational reports and insights, including SLA adherence and mobile application performance metrics.
Process Improvement and Automation:
- Continuously analyze current processes and identify areas for improvement.
- Implement automation tools and techniques to enhance efficiency.
- Establish and document standard operating procedures (SOPs) for NOC operations.
- Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or Ansible.
- Automate repetitive operational tasks to enhance efficiency.
Required Qualifications:
Technical Skills:
- Bachelor's degree in Computer Science, IT, or a related field; a Master's degree is a plus.
- 8 years of experience in cloud operations, with at least 3 years in a leadership role.
- Strong expertise in monitoring tools such as Datadog, Grafana, and Prometheus, including advanced configuration and monitoring setup.
- Proficiency in cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Hands-on experience with automation tools like Terraform, Ansible, or CloudFormation.
- Solid understanding of DevOps practices, CI/CD pipelines, and container orchestration (e.g., Docker, Kubernetes).
Certifications:
- Datadog certifications or proven expertise in the platform is a significant advantage.
Soft Skills:
- Strong leadership and team management skills with the ability to work onsite in a customer-facing role.
- Excellent communication and interpersonal skills for effective collaboration with stakeholders.
- Proactive and solution-oriented mindset to drive improvements and resolve challenges.
- Ability to work under pressure, prioritize tasks, and manage multiple priorities effectively.