What are the responsibilities and job description for the Site Reliability Engineer (SRE) position at Resource Informatics Group?
Job Description
Role: SRE (Observability) Engineer
Start Date: December 16, 2024
Note: Candidates must complete a coding test before interviewing. This is an immediate-closure opportunity.
This position is remote. Candidates must pass a HackerEarth assessment covering automation (Chef, Ansible, Terraform), Python, and general SRE skills. Please stay on top of your submitted candidates, as we will interview qualifying candidates next week.
Description
We are seeking a highly skilled SRE (Observability) Engineer with a deep understanding of modern observability practices and tools. The ideal candidate will have hands-on experience with provisioning, configuring, and developing infrastructure solutions, along with a strong focus on automation, scalability, and reliability. This role involves a mix of development, system architecture, and troubleshooting responsibilities, providing opportunities to influence the evolution of our infrastructure.
Responsibilities
- Design, implement, and manage observability solutions using tools like Dynatrace, Prometheus, Thanos, or Grafana.
- Develop metrics, alerts, and silences for comprehensive system monitoring.
- Automate infrastructure tasks using Chef (recipes, cookbooks), Ansible (tasks, playbooks), or Terraform, with a strong focus on syntax and GitLab CI/CD configuration.
- Script solutions using Python, PowerShell, or Bash to enable automation across the infrastructure.
- Propose and implement innovative ideas to reduce manual workload and improve operational efficiency through automation.
- Provision and configure cloud resources via CLI or APIs on Azure, GCP, or AWS.
- Troubleshoot and resolve system issues with an SRE (Site Reliability Engineering) mindset, focusing on root cause analysis and corrective actions.
- Develop and enhance documentation, including application guides, runbooks, and system configurations, ensuring clarity in the "why" and "how" of operations.
- Plan, design, and execute scalable and redundant system architecture to meet organizational goals.
Required Skills
Preferred Skills