What are the responsibilities and job description for the Cloud/DevOps SRE - Infrastructure & Automation position at ClifyX?
Job Details
Job Description:
Cloud/DevOps Site Reliability Engineer (SRE) with hands-on experience in Kubernetes, Docker, containerized environments, and automation.
The ideal candidate will be responsible for managing production infrastructure services, ensuring high availability, reliability, and performance of our internal cloud systems.
The role requires expertise in automation, infrastructure management, and operational support, with a good working knowledge of various database technologies (Oracle, SingleStore, ClickHouse, MongoDB). Experience with Kafka, Python, and shell scripting is a plus.
Cloud Infrastructure Management: Implement and maintain scalable cloud infrastructure across various environments. This will mostly be internal cloud and may not be 3PC.
Kubernetes & Docker Management: Deploy, manage, and scale applications using Kubernetes, Docker, and container orchestration tools.
Automation & CI/CD: Develop and implement automation scripts for infrastructure provisioning, deployment, and continuous integration/delivery (CI/CD) pipelines.
Production Support & Monitoring: Ensure high availability of production environments through monitoring, incident resolution, and performance tuning.
Collaboration: Work closely with development teams to optimize cloud-native applications and improve system efficiency.
Incident Response: Lead and coordinate incident management, troubleshooting, and post-mortem analysis for production systems.
Continuous Improvement: Advocate for and implement best practices for cloud-native infrastructure, automation, and security.