What are the responsibilities and job description for the Cloud/DevOps SRE - Infrastructure & Automation Specialist position at ClifyX, INC?
Job Description
We are seeking a Cloud/DevOps Site Reliability Engineer (SRE) with hands-on experience in Kubernetes, Docker, and containerized environments.
The ideal candidate will be responsible for managing production infrastructure services, ensuring high availability, reliability, and performance of our internal cloud systems.
The role requires expertise in automation, infrastructure management, and operational support, with a good working knowledge of various database technologies (Oracle, SingleStore, ClickHouse, MongoDB). Kafka, Python, and shell scripting are a plus.
Cloud Infrastructure Management: Implement and maintain scalable cloud infrastructure across various environments. This will mostly be internal cloud and may not be 3PC.
Kubernetes & Docker Management: Deploy, manage, and scale applications using Kubernetes, Docker, and container orchestration tools.
Automation & CI/CD: Develop and implement automation scripts for infrastructure provisioning, deployment, and continuous integration/delivery (CI/CD) pipelines.
Production Support & Monitoring: Ensure high availability, monitoring, incident resolution, and performance tuning for production environments.
Collaboration: Work closely with development teams to optimize cloud-native applications and improve system efficiency.
Incident Response: Lead and coordinate incident management, troubleshooting, and post-mortem analysis for production systems.
Continuous Improvement: Advocate for and implement best practices for cloud-native infrastructure, automation, and security.