What are the responsibilities and job description for the Site Reliability Engineer position at Highbrow LLC?
Job Title: Site Reliability Engineer (SRE)
Job Summary:
We are seeking a Site Reliability Engineer (SRE) to join our team and ensure the availability, scalability, and performance of our systems. The ideal candidate will have a strong background in DevOps, automation, cloud platforms, and system reliability. You will work closely with development and operations teams to build and maintain robust, high-availability infrastructure.
Key Responsibilities:
✅ Design, implement, and maintain scalable and reliable infrastructure.
✅ Automate deployment, monitoring, and incident response processes.
✅ Develop and maintain CI/CD pipelines for seamless software delivery.
✅ Monitor system performance, troubleshoot issues, and implement fixes proactively.
✅ Work with development teams to optimize application performance and reliability.
✅ Ensure security best practices are followed across infrastructure and applications.
✅ Participate in on-call rotations to handle incidents and outages.
✅ Implement observability tools for logging, metrics, and alerting.
Required Skills & Qualifications:
✔ Cloud Platforms: AWS, GCP, or Azure.
✔ Automation & Infrastructure as Code (IaC): Terraform, Ansible, or CloudFormation.
✔ Monitoring & Logging: Prometheus, Grafana, ELK Stack, or Datadog.
✔ Scripting & Programming: Python, Bash, or Go.
✔ Containerization & Orchestration: Docker and Kubernetes.
✔ CI/CD Pipelines: Jenkins, GitHub Actions, GitLab CI/CD, or CircleCI.
✔ Networking & Security: DNS, Load Balancers, Firewalls, IAM, and VPNs.
✔ Version Control: Git, with GitHub or Bitbucket.
Preferred Qualifications:
🔹 Experience with incident management and post-mortem analysis.
🔹 Background in software development and system architecture.
🔹 Knowledge of service mesh and API gateways.
🔹 Experience in high-availability and disaster recovery planning.