What are the responsibilities and job description for the DevOps Engineer position at Ascentt?
Description:
- Cloud Infrastructure Management: Design, implement, and manage cloud-based infrastructure on AWS and Azure, ensuring optimal scalability, performance, and security.
- CI/CD Pipeline Development: Develop and maintain CI/CD pipelines using GitHub Actions for automated code deployments and testing.
- System Monitoring and Incident Management:
  - Implement and configure Datadog for comprehensive system monitoring.
  - Develop and maintain Datadog dashboards to visualize system performance and metrics.
  - Set up proactive alerts in Datadog to detect and respond to incidents swiftly, ensuring high system reliability and uptime.
  - Conduct root cause analysis of incidents and implement corrective actions using Datadog insights.
- Collaboration with AI Teams: Work closely with AI teams to support the operational aspects of LLMs, including deployment strategies and performance tuning.
- Infrastructure as Code (IaC): Implement IaC using tools like Terraform or AWS CloudFormation to automate infrastructure provisioning and management.
- Container Orchestration: Manage container orchestration systems such as Kubernetes or AWS ECS.
- Operational Support for LLMs: Provide operational support for LLMs, focusing on performance optimization and reliability.
- Scripting and Automation: Utilize scripting languages such as Python and Bash for automation and task management.
- Security and Compliance: Ensure compliance with security standards and best practices, implementing robust security measures.
- Documentation: Document system configurations, procedures, and best practices for internal and external stakeholders.
- DevOps Collaboration: Work with development teams to optimize deployment workflows, introduce best practices for DevOps, and improve overall efficiency.
- Technology and Industry Awareness: Stay up to date with emerging technologies and industry trends to suggest improvements and upgrades.
Key Qualifications:
- Extensive experience with AWS and Azure cloud platforms.
- Proficiency in developing CI/CD pipelines using GitHub Actions.
- Strong experience with Datadog for system monitoring, including implementation, configuration, and maintenance.
- Demonstrated ability to create and maintain Datadog dashboards for performance visualization.
- Proven expertise in setting up alerts and conducting incident response with Datadog.
- Hands-on experience with container orchestration systems such as Kubernetes or AWS ECS.
- Proficiency in Infrastructure as Code (IaC) tools like Terraform or AWS CloudFormation.
- Familiarity with operational aspects of Large Language Models (LLMs) is highly desirable.
- Strong scripting skills in Python, Bash, or similar languages.
- In-depth knowledge of security standards and best practices.
- Excellent documentation skills.
- Proven ability to work collaboratively with development and AI teams.
- Commitment to staying current with industry trends and emerging technologies.