Principal, Site Reliability Engineer

Sysco LABS
Houston, TX Full Time
POSTED ON 1/19/2025
AVAILABLE BEFORE 2/17/2025
Job Summary

Drives impactful changes across the platform and serves in a sustained leadership role. Responsible for the design and future direction of highly available, performant web and mobile applications, resilient and scalable systems, and metrics and monitoring. Responsible for defining best practices and for collaborating with development, product, architecture, and leadership to mentor teams on reliability across the platform. Thinks ahead and acts to get in front of issues before they occur through automation and careful analysis. Applies critical thinking and debugging skills in highly complex environments, including network packet analysis, Kubernetes, NGINX, streaming (Kafka), edge networks, and caching, as an application-layer generalist. Fully accountable for overall system reliability and performance.

Duties And Responsibilities

  • Develop and refine strategy and process for all reliability tracking across the platform in conjunction with senior members of the team.
  • Lead strategic discussions to continue the evolution of flexibility and sustainability of the entire product suite.
  • Partner with support teams, DevOps, Engineering, and customers to inform decisions and implement improvements.
  • Ensure RCA findings related to reliability are addressed at the point of initial injection to prevent regression.
  • Look broadly across the platform for latent reliability issues and address them before they surface.
  • Provide orchestration for the production environment by monitoring availability and taking a holistic view of system health.
  • Architect the software and systems to manage platform infrastructure and applications.
  • Document and perform annual reviews of tribal knowledge and best practices.
  • Define the objectives for system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
  • Gather and analyze metrics for trending, performance tuning, and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Provide leadership for system design, platform management, and capacity planning.
  • Balance feature development speed and reliability with well-defined service level objectives.
  • Actively maintain a thorough understanding of system architecture, applications, and related integrations. Partner with the Platform team to understand and improve system monitoring and alerting.
  • Drive active-active multisite reliability targets.
  • Drive performance and reliability in a multi-cloud environment.
  • Implement enterprise-level procedures and processes.
  • Hands-on experience with the top cloud providers.

Education Required

Bachelor’s degree in computer science, computer engineering or related field, or relevant training.

Education Preferred

Or equivalent combination of experience and education.

Experience Required

8 years’ experience in a Site Reliability role.

8 years’ experience with enterprise cloud platforms.

Availability to work extended or off-cycle hours and participate in a 24/7 Site Reliability on-call rotation.

Experience Preferred

8 years’ experience in a cloud operations / DevOps role.

Experience with AWS.

Experience with APM tools such as Datadog, New Relic, Nagios, or Splunk.

Experience in an agile development environment.

Physical Demands

Reasonable accommodations will be made to enable individuals with disabilities to perform the essential functions of this job.



