What are the responsibilities and job description for the Director, Site Reliability Engineering position at Soni?
Our client, a leading player in the crypto/blockchain space, is seeking an experienced Director of Site Reliability Engineering (SRE) to drive the strategy, development, and optimization of their infrastructure. As the SRE Director, you will lead a high-performing team, ensuring system reliability, scalability, and security while collaborating with engineering, security, and product teams.
Key Responsibilities:
- Lead the development and execution of the SRE strategy, focusing on system reliability, performance, and security.
- Manage and mentor a team of SREs, fostering a culture of automation, observability, and continuous improvement.
- Define and manage Service Level Objectives (SLOs), Service Level Agreements (SLAs), and error budgets to maintain a balance between innovation and system stability.
- Architect and manage containerized environments using Kubernetes and cloud-native technologies.
- Oversee Infrastructure as Code (IaC) using Terraform, ensuring compliance and repeatability.
- Build and enhance CI/CD pipelines, improving software delivery and security.
- Lead observability efforts with tools like Datadog, Prometheus, and OpenTelemetry.
- Drive incident response, post-mortem reviews, and improvements to system design and operational procedures.
- Optimize infrastructure for blockchain nodes, validators, and smart contracts, ensuring high availability and security.
Required Qualifications:
- 10 years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
- Deep expertise in Kubernetes, containers, and cloud-native architectures.
- Strong proficiency with Terraform and other Infrastructure as Code (IaC) tools.
- Extensive experience with AWS and on-prem environments.
- Hands-on experience with observability tools such as Datadog, Prometheus, and Grafana.
- Proven track record of securing and optimizing blockchain infrastructure (e.g., Ethereum, Solana, Bitcoin).
- Experience leading high-performing SRE teams and working cross-functionally with engineering and security teams.
- Strong problem-solving, incident management, and communication skills.
Compensation: $200,000 - $250,000
Salary is based on a range of factors, including relevant experience, knowledge, skills, and other job-related qualifications.