What are the responsibilities and job description for the Sr. Director, Global Site Reliability Engineering position at Cboe Global Markets?
Job Description:
The Sr. Director of Global Site Reliability Engineering (“SRE”) will be responsible for overseeing the global reliability, scalability, and performance of Cboe’s critical infrastructure and services. This role will lead a team of engineers across multiple regions including APAC, North America, and Europe, driving operational excellence and ensuring that our systems are designed, built, and operated with a focus on resilience, availability, and cost efficiency. The Cboe SRE team is a highly skilled technical unit responsible for platform engineering, configuration management, implementation, capacity planning, performance tuning and analysis, troubleshooting, and process automation.
The Sr. Director of Global SRE is a thought leader who influences how Cboe approaches engineering and helps define future directions for process, architecture, automation, and quality. They will drive consistent solutions and processes across global teams, providing mentorship within and outside of the SRE team.
The Sr. Director of Global SRE will maximize efficiencies across the SRE functional unit, lead significant projects, drive consensus, and implement or migrate to new technologies or solutions. They will collaborate closely with operations, engineering, business, and global leaders to define and implement SRE best practices and strategies that align with the company’s growth objectives.
Responsibilities:
Global Leadership & Strategy:
- Develop and execute the global SRE strategy, ensuring alignment with business goals and technology roadmaps across all regions.
- Drive initiatives that enhance the availability, scalability, and performance of global infrastructure and services, minimizing downtime and service disruptions.
- Maximize efficient use of resources to meet Cboe’s business objectives.
- Define and enhance policies and procedures that are supported by the SRE team (e.g., Capacity Planning Policies & Procedures, Change Management Policies & Procedures, Disaster Recovery Plans, …).
- Work closely with software engineering, infrastructure, operations, security, and business teams to design and implement reliable and secure services.
- Build, mentor, and lead a high-performing, globally distributed SRE team of 30 associates, fostering a culture of collaboration, continuous improvement, and technical innovation.
- Develop a strong global leadership team while maintaining a keen eye on succession planning.
- Lead the operational response and troubleshooting efforts for critical incidents, ensuring timely resolution.
- Drive root cause analysis efforts and implement long-term systemic improvements by executing on lessons learned through the Cboe Learning Review process.
- Ensure that incident management records are descriptive and accurate, and that all regulatory and compliance reporting obligations are effectively met.
- Spearhead automation efforts to improve operational workflows, reducing manual intervention and improving system uptime.
- Decompose workloads across the team to maximize efficiency, and make effective use of Jira for project and task management.
- Drive timely implementation of capacity planning decisions, avoiding the need for last-minute heroic efforts to prevent capacity-limiting issues.
- Support the budget planning process to ensure capacity budget is sufficient to cover annual infrastructure growth and performance needs.
- Participate in quarterly Capacity Planning meetings led by the SRE team, ensuring that any necessary follow-up is promptly addressed.
- Oversee the implementation of monitoring and alerting systems, ensuring proactive detection and resolution of issues before they impact customers or result in reportable compliance/regulatory issues.
- Support optimization of infrastructure expenses by developing cost-efficient strategies for resource allocation, cloud usage, and scaling.
- Ensure that all systems and processes supported by the SRE team meet or exceed regulatory, compliance, and security standards.
- Monitor disaster recovery and business continuity plans to ensure they are well-developed and regularly tested.
- Test all changes to systems and platform functionality prior to deployment to production environments.
Qualifications:
- Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, or a related discipline. Master’s preferred.
- Minimum 15 years of experience in a Technical/Operations role with significant focus on SRE, DevOps, Software Engineering, Systems, Network, or Database Administration, or a related discipline.
- Minimum 10 years of leadership experience.
- Ideal candidates will have 3 years of experience leading a global team.
- Experience supporting cloud infrastructures (e.g., AWS, Azure, Google Cloud), containerization (e.g., Docker, Kubernetes), monitoring tools (e.g., Prometheus, Datadog, Grafana), and automation frameworks (e.g., Terraform, Pulumi, Ansible, …).
- Excellent listening, written, and verbal communication skills, including the ability to explain complex technical concepts to non-technical stakeholders.
- Must possess strong analytical, quantitative, and research skills.
- Intellectually curious.
- Excellent organizational and time management skills.
US Geographic Differentials:
- 110%: Austin TX, Chicago IL, Denver CO, San Diego CA
- 115%: Los Angeles CA, Seattle WA
- 120%: Boston MA, Washington DC
- 125%: New York City NY
- 130%: San Francisco CA
Any communication from Cboe regarding this position will come only from a Cboe recruiter with an @cboe.com email address or through LinkedIn Recruiter. Cboe does not use any other third-party communication tools for recruiting purposes.
Salary: $174,250 - $215,250