Site Reliability Engineer

Stax Payments
Orlando, FL Full Time
POSTED ON 3/26/2025
AVAILABLE BEFORE 4/24/2025
Description

We are seeking an experienced and strategic Site Reliability Engineer (SRE) to drive the stability, reliability, and observability of our mission-critical systems. This role is crucial to ensuring high availability, performance, and operational excellence for our services. The SRE will be responsible for designing and implementing robust reliability frameworks, overseeing system monitoring and incident response, and leading key initiatives to improve system performance.

This role requires a strong leadership mindset, balancing proactive risk mitigation with rapid incident response. The ideal candidate will work closely with engineering, operations, and leadership teams to define and uphold service-level objectives (SLOs) and optimize system resilience.

Key Responsibilities & Objectives

  • Develop and enforce service-level indicators (SLIs) and service-level objectives (SLOs) to measure and improve system health.
  • Implement and manage comprehensive observability strategies, ensuring real-time visibility into system performance, availability, and health.
  • Oversee incident management and response processes, ensuring quick mitigation of production issues and leading post-mortem investigations to drive systemic improvements.
  • Optimize system reliability through failure analysis, capacity planning, and proactive risk assessment.
  • Define and implement best practices for on-call management, reducing alert fatigue while ensuring critical issues are addressed efficiently.
  • Assist with writing root cause analyses (RCAs) by providing technical details of each incident.
  • Continuously refine operational runbooks, incident response plans, and system reliability guidelines to enhance organizational resilience.
  • Analyze system performance trends, production issues, and historical outages to proactively address weaknesses before they impact customers.
  • Drive cultural change within the organization, promoting a reliability-first mindset across all teams.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 5 years of experience in a Site Reliability Engineering, Production Engineering, or Systems Engineering role.
  • Proven expertise in managing high-availability, distributed systems in a production environment.
  • Deep understanding of observability practices, including monitoring, logging, and tracing with tools such as Prometheus, Grafana, Datadog, New Relic, or OpenTelemetry.
  • Extensive experience in incident response, RCAs, post-mortems, and continuous improvement processes.
  • Strong background in capacity planning, load balancing, and performance tuning for large-scale applications.
  • Experience with operational leadership, on-call management, and defining reliability strategies within complex environments.
  • Familiarity with networking, security best practices, and risk management strategies for distributed architectures.
  • Strong analytical and problem-solving skills to diagnose system failures and implement long-term solutions.

Preferred Skill Set

  • Incident Management & Alerting: Experience with Jira Service Management, PagerDuty, Opsgenie, or equivalent tools.
  • Cloud Infrastructure Management: Hands-on expertise with AWS, GCP, or Azure.
  • Database Performance Optimization: Experience working with relational and NoSQL databases.
  • Capacity Planning & Scalability Strategies: Ability to assess and predict infrastructure needs for growth.
  • Technical Leadership & Communication: Proven ability to work cross-functionally and drive reliability initiatives at scale.
