Responsibilities:
As a Senior Site Reliability Engineer (SRE), you will play a crucial role in designing, implementing, and maintaining highly scalable and reliable systems and services. Your primary focus will be on ensuring the availability, performance, and efficiency of the company's infrastructure and applications. You will collaborate with cross-functional teams, including product, DevOps, QA, and cloud infrastructure teams, to drive improvements and solve complex technical challenges.

System Design and Architecture: Contribute to the design and architecture of scalable and highly available systems and services, considering factors such as reliability, performance, security, and cost-effectiveness.

Infrastructure Automation: Develop and maintain infrastructure automation tools and frameworks, leveraging technologies such as infrastructure-as-code (IaC) and configuration management tools. Automate deployment, monitoring, and management processes to increase efficiency and reduce manual effort.

Monitoring and Alerting: Implement effective monitoring and alerting systems to proactively identify and resolve issues. Develop and maintain monitoring tools and dashboards to provide real-time visibility into system performance and availability.

Incident Response and Troubleshooting: Respond to critical incidents, perform root cause analysis, and implement preventive measures to minimize the impact of future incidents. Work closely with development teams to address performance bottlenecks and reliability issues.

Capacity Planning and Performance Optimization: Analyze system performance and capacity metrics to identify areas for improvement. Collaborate with teams to optimize resource utilization, enhance system performance, and plan for future growth.

Continuous Improvement and Best Practices: Stay up to date with industry best practices, emerging technologies, and trends in Site Reliability Engineering. Drive continuous improvement initiatives, implement best practices, and mentor junior team members.

Collaboration and Communication: Collaborate with cross-functional teams, including developers, operations, and product managers, to understand requirements, provide technical guidance, and ensure the reliability and scalability of systems. Communicate effectively with both technical and non-technical stakeholders to provide updates and address concerns.

Coaching and Mentoring: Provide guidance and support to junior colleagues, helping them develop their skills and grow in their careers. Share knowledge, review code, and assist with technical challenges to foster a collaborative, learning-oriented environment.