
Site Reliability Engineering (SRE) Manager

Massachusetts Medical Society
Waltham, MA Full Time
POSTED ON 1/15/2025
AVAILABLE BEFORE 3/15/2025

Category
Information Technology

Job Location
Waltham, Massachusetts

Tracking Code
1119

Position Type
Full-Time/Regular

The Massachusetts Medical Society (MMS) is the statewide professional association for physicians and medical students, supporting 25,000 members. We are dedicated to educating and advocating for the physicians of Massachusetts and patients locally and nationally. A leadership voice in health care, the MMS contributes physician and patient perspectives to influence health-related legislation at the state and federal levels, works in support of public health, provides expert advice on physician practice management, and addresses issues of physician well-being. Under the auspices of NEJM Group, the MMS extends our mission globally by advancing medical knowledge from research to patient care through the New England Journal of Medicine, NEJM Evidence, NEJM AI, NEJM Catalyst, NEJM Journal Watch, and through our accredited and comprehensive continuing medical education programs.

The world has changed, and so has the way we work. The MMS has adopted a flexible work model that allows most employees to choose where they work - at home, onsite in our Waltham office, or a combination of the two - based on their preferences and our business needs. Because what matters is the work we do, not where we do it.

We are seeking a skilled and motivated Site Reliability Engineering (SRE) Manager to lead our growing SRE team and play a critical role in driving operational excellence, technical innovation, and strategic alignment within our hybrid infrastructure. This position balances hands-on technical work with leadership responsibilities, focusing on people management, project execution, and cloud infrastructure architecture. If you are passionate about designing resilient, self-healing systems and empowering development teams through scalable frameworks, we want to hear from you!

Responsibilities:

People and Project Management (50%)

  • Provide leadership, mentorship, and guidance to a team of SRE/DevOps professionals.
  • Collaborate with stakeholders to define and prioritize objectives, ensuring alignment with business goals.
  • Oversee project execution, ensuring timely delivery of operational support deliverables and of initiatives such as CI/CD pipeline enhancements, observability improvements, and security hardening.
  • Foster a culture of collaboration, accountability, and continuous improvement across the team.
  • Support career development through regular feedback, technical mentoring, and training opportunities.

Hands-On Technical Contributions (50%)

  • Develop self-service frameworks that empower development teams while maintaining operational standards.
  • Strengthen security practices by establishing robust guardrails and aligning with industry best practices.
  • Enhance system observability by integrating tools like Datadog for monitoring, alerting, and analytics.
  • Evangelize scalable operational practices and play an active role in automating and enforcing them.
  • Architect and implement cloud infrastructure solutions to meet scalability and resilience requirements for delivering and testing highly available platforms for our complex multi-tier applications.
  • Support and improve on-premises and hybrid infrastructure solutions, balancing legacy design with our move to the cloud.
  • Design and improve CI/CD pipelines to optimize deployment speed, reliability, and security.
  • Write and maintain technical documentation.
  • Develop release plans and service level agreements and foster the migration of legacy applications to modern CI/CD pipelines.
  • Own production incidents/issues and provide application support during and, on occasion, outside of normal business hours, responding to infrastructure incidents and alerts and escalating to other subject matter experts as necessary.
  • Work with third-party vendors to resolve infrastructure issues.
  • Other responsibilities as assigned.

Strategic Responsibilities

  • Drive the adoption of resilient, self-healing design patterns across the infrastructure.
  • Partner with development teams to create scalable solutions that streamline workflows and reduce toil.
  • Advocate for operational excellence by implementing and enhancing frameworks for reliability, incident response, and continuous learning.

Qualifications:

Required Skills and Experience

  • Bachelor's degree in a related field and 6 years of experience in software development, SRE, or DevOps, or an equivalent combination of education and experience, is required.
  • Proven experience in a leadership role managing and scaling SRE or DevOps teams.
  • Hands-on expertise with hybrid cloud architectures, particularly transitioning from on-premises to modern cloud platforms.
  • Excellent knowledge of Linux systems (Amazon Linux) and Windows systems.
  • Understanding of AWS VPC, network management, and datacenter operations.
  • Proficiency in CI/CD pipeline design and tools such as GitHub Actions, Jenkins, or similar.
  • Strong knowledge of observability tools like Datadog, Grafana, or Prometheus.
  • Solid understanding of infrastructure as code (IaC) practices and tools (e.g., Terraform, CloudFormation).
  • Experience with security best practices, including compliance, vulnerability management, and identity/access management.
  • Working knowledge of databases and system performance.
  • Excellent communication and project management skills, with experience using tools like Jira and Confluence.
  • Ability to mentor team members technically and strategically.
  • Must be an excellent and creative problem solver. (You don't need to know everything, but you need to know how to find the solution.)
  • Demonstrated cooperative work style with strong communication, interpersonal and teamwork skills in an Agile environment.
  • Must be self-motivated, with the ability to work with minimal supervision.

Preferred Qualifications

  • Experience with containerization and orchestration tools like Docker and Kubernetes.
  • Previous experience with an API management tool (MuleSoft preferred).
  • Experience with self-healing system design and automated failure recovery strategies.
  • Hands-on experience with scripting languages such as Python, Bash, or PowerShell.
  • Familiarity with Agile methodologies and practices.

Benefits:

Our generous benefits offerings include: 3 weeks of paid vacation, 6 personal days, 12 sick days, 13 paid holidays, medical and dental plans, 401(k) plans with company match, backup childcare assistance, tuition assistance and more!

The MMS has earned praise as one of the Top Places to Work in Massachusetts by The Boston Globe for the past 15 years in a row! The Globe surveys employees regarding their opinions about company leadership, benefits, ethics, values and culture, and recognizes those companies that receive high marks from their employees.

Massachusetts Medical Society is an Equal Opportunity Employer: Min/Fem/Vet/Disabled


