What are the responsibilities and job description for the Site Reliability Engineer position at Chabez Tech LLC?
Job Details
Role: Site Reliability Engineer II (Fintech)
Mandatory Skills: Kubernetes, AWS Cloud, Reliability Engineering, Java Microservices
Location: Austin, TX; New York
Optional Skills: Agile Scrum
Overview:
Seeking an experienced Platform Site Reliability Engineer (SRE) with 3 years of experience in Kubernetes, AWS, and cloud-native infrastructure. The role focuses on improving the reliability, scalability, and performance of digital payment platforms through automation, monitoring, and performance optimization.
Key Responsibilities:
- Kubernetes Management: Deploy, scale, and optimize Kubernetes clusters.
- AWS Cloud: Utilize AWS services (EC2, S3, EKS, RDS, etc.) for scalable infrastructure.
- Automation & IaC: Develop workflows with Terraform, CloudFormation, or Ansible.
- CI/CD Pipelines: Build and maintain CI/CD pipelines (e.g., Jenkins, GitLab CI).
- Monitoring: Implement observability tools like Prometheus, Grafana, or CloudWatch.
- Incident Management: Troubleshoot issues, conduct root cause analysis, and ensure system resilience.
- Security: Apply best practices for securing infrastructure and maintaining compliance.
Required Qualifications:
- Kubernetes: Expertise in multi-cluster management and optimization.
- AWS: Proficiency in key AWS services and cloud best practices.
- IaC Tools: Hands-on experience with Terraform or similar tools.
- Scripting: Automation skills using Python, Bash, or Go.
- Monitoring: Familiarity with observability stacks (ELK, Grafana).
- Collaboration: Strong communication and teamwork abilities.
Preferred Qualifications:
- AWS and Kubernetes certifications (e.g., CKA, AWS Certified Solutions Architect).
- Experience with service mesh (e.g., Istio) and microservices.
- Knowledge of cost optimization for AWS infrastructure.