Senior Site Reliability Engineer

TAG - The Aspen Group
Chicago, IL Full Time
POSTED ON 1/14/2025
AVAILABLE BEFORE 4/6/2025

The Aspen Group (TAG) is one of the largest and most trusted retail healthcare business support organizations in the U.S. and has supported over 20,000 healthcare professionals and team members with close to 1,500 health and wellness offices across 48 states in four distinct categories: dental care, urgent care, medical aesthetics, and animal health. Working in partnership with independent practice owners and clinicians, the team is united by a single purpose: to prove that healthcare can be better and smarter for everyone. TAG provides a comprehensive suite of centralized business support services that power the impact of five consumer-facing businesses: Aspen Dental, ClearChoice Dental Implant Centers, WellNow Urgent Care, Chapter Aesthetic Studio, and AZPetVet. Each brand has access to a deep community of experts, tools and resources to grow their practices, and an unwavering commitment to delivering high-quality consumer healthcare experiences at scale.

As a reflection of our current needs and planned growth, we are very pleased to offer a new opportunity to join our dedicated team as a Senior Site Reliability Engineer.

The Senior Site Reliability Engineer (SRE) & Monitoring Specialist will be responsible for ensuring the reliability, performance, and scalability of our systems. This role involves implementing and managing monitoring solutions, responding to incidents, and optimizing system performance to meet business objectives.

Responsibilities

Site Reliability Engineering:

  • Design, build, and maintain scalable and reliable systems to support our applications and services.
  • Develop and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure systems meet reliability targets.
  • Drive improvements in system reliability, availability, and performance through proactive measures and automation.

Monitoring & Observability:

  • Implement and manage comprehensive monitoring and alerting solutions to ensure full visibility into system health and performance.
  • Develop and maintain dashboards and reporting tools that provide actionable insights for troubleshooting and performance optimization.
  • Evaluate and integrate new monitoring tools and technologies as needed to enhance observability.

Incident Management:

  • Lead and participate in incident response efforts, including troubleshooting, root cause analysis, and resolution.
  • Develop and maintain incident management processes to improve response times and minimize service disruptions.
  • Conduct post-incident reviews to identify areas for improvement and implement preventive measures.

Performance Optimization:

  • Analyze performance metrics and logs to identify and address bottlenecks and inefficiencies in the system.
  • Collaborate with development teams to optimize code and infrastructure for better performance and reliability.
  • Perform capacity planning to ensure systems can handle current and future loads.

Automation & Process Improvement:

  • Develop and implement automation solutions to streamline operations and reduce manual intervention.
  • Identify and drive process improvements to enhance operational efficiency and effectiveness.
  • Maintain documentation related to monitoring, incident management, and SRE best practices.

Collaboration & Communication:

  • Work closely with engineering, operations, and product teams to align on reliability and monitoring goals.
  • Communicate effectively with stakeholders, providing regular updates on system health, incidents, and performance improvements.
  • Foster a culture of collaboration and knowledge sharing within the team and across the organization.

Requirements

  • Bachelor's degree in Computer Science or a related field.
  • At least 5 years of experience in Site Reliability Engineering or a similar role.
  • Strong proficiency in at least one programming language such as Python, Java, or Go.
  • Experience with containerization technologies such as Docker and Kubernetes.
  • Strong understanding of networking, distributed systems, and cloud infrastructure.
  • Familiarity with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, and Splunk.
  • Excellent problem-solving skills and the ability to work independently and in a team environment.
  • Experience with incident management and root cause analysis.

If you are a Senior Site Reliability Engineer with a passion for ensuring the reliability and performance of production systems, we encourage you to apply for this exciting opportunity.
