What are the responsibilities and job description for the Site Reliability Engineer position at Cleverland Holdings LLC?
We're looking for a Site Reliability Engineer who is passionate about maintaining high system availability and delivering scalable solutions. If you thrive in a fast-paced environment and are dedicated to enhancing system reliability and performance, we want you on our team.
Job Description
As a Site Reliability Engineer, you will be instrumental in managing our operational systems and ensuring the reliability and stability of our online and internal platforms. You will work closely with development and infrastructure teams to integrate software engineering practices into system operations, aiming for high availability, optimal performance, and scalability.
Responsibilities:
- Monitor and analyze the performance of production systems using tools such as Datadog, Sentry, and Grafana.
- Proactively address system issues and anomalies before they become critical.
- Develop and maintain automated tools for system health monitoring, disaster recovery, and performance benchmarking.
- Work with cross-functional teams to design and implement enhancements and fixes to improve system reliability and performance.
- Document system design and procedures related to system maintenance and operations.
- Conduct post-incident reviews and lead the implementation of fixes that prevent recurrence.
- Ensure all system operations comply with security standards and regulatory requirements.