What are the responsibilities and job description for the DevOps and Site Reliability Engineer (SRE) position at Cloud Bigdata?
Job Details
JOB DESCRIPTION:
Develop and maintain CI/CD pipelines to automate deployments and improve workflow efficiency.
Manage cloud infrastructure (AWS, Google Cloud Platform, Azure) using Infrastructure-as-Code (IaC) tools like Terraform and Ansible.
Ensure system reliability by monitoring, troubleshooting, and improving application performance.
Implement and track Service-Level Indicators (SLIs), Service-Level Objectives (SLOs), and Service-Level Agreements (SLAs).
Automate infrastructure management using scripting languages (Python, Bash) and configuration management tools.
Collaborate with development teams to improve deployment practices and system architecture.
Manage incident response, conduct root cause analysis, and implement post-mortem processes.
Ensure scalability by monitoring system capacity and identifying performance bottlenecks.
Ensure the security and compliance of cloud environments and deployments.
Continuously improve the automation, reliability, and performance of systems.
REQUIRED SKILL SET:
Hands-on experience in a DevOps, SRE, or similar role.
Strong knowledge of cloud platforms (AWS, Google Cloud Platform, Azure) and containerization (Docker, Kubernetes).
Experience with Infrastructure-as-Code (Terraform, CloudFormation, Ansible).
Expertise in CI/CD tools (Jenkins, GitLab CI, etc.) and version control (Git).
Solid understanding of monitoring and logging tools (Prometheus, Grafana, ELK Stack).
Proficiency in scripting (Python, Bash) and automation.
Strong problem-solving skills with a focus on system reliability and performance.
Knowledge of microservices architecture and distributed systems is a plus.
Cloud certifications and experience with Agile methodologies are preferred.