Senior Site Reliability Engineer

XperiencOps Inc
Pleasanton, CA Full Time
POSTED ON 1/14/2025
AVAILABLE BEFORE 3/10/2025

The Senior Site Reliability Engineer (SRE) plays a vital role in ensuring the reliability, scalability, and performance of our enterprise software platform. This is a senior-level position that requires deep technical expertise, strong problem-solving skills, and the ability to collaborate effectively in a fast-paced, demanding environment. Our customers, the largest enterprises in the world, expect 24/7 platform availability and top-tier performance.

The ideal candidate has strong expertise in AWS cloud technologies, a deep understanding of serverless architectures (AWS Lambda), and a passion for building resilient systems to enhance the customer experience.

Platform Reliability:
  • Design, implement, and manage highly available and scalable systems to meet customer expectations for 24/7 uptime.
  • Monitor, troubleshoot, and resolve platform incidents using tools such as Sentry, New Relic, and custom monitoring frameworks.
  • Lead post-incident reviews to ensure root cause analysis and preventative measures are in place.
Automation and Optimization:
  • Develop and maintain automation for infrastructure management, monitoring, and incident response.
  • Optimize platform performance and scalability, proactively identifying and addressing bottlenecks.
  • Contribute to the development of CI/CD pipelines to improve deployment reliability and speed.
Collaboration:
  • Partner with L2 engineers to resolve complex customer issues, providing guidance and technical expertise as needed.
  • Work closely with product engineering to ensure platform improvements align with customer needs.
  • Actively contribute to the documentation and sharing of best practices to improve team performance and customer outcomes.
Leadership:
  • Mentor junior engineers and provide technical leadership in reliability engineering.
  • Drive cross-functional initiatives to improve platform stability and customer satisfaction.
Qualifications:
  • Bachelor's degree in Computer Science or related discipline.
  • 8 years in a Site Reliability Engineering or DevOps role, with experience supporting enterprise-grade software platforms.
  • 3 years of experience in cloud services, in particular AWS.
  • Experience building observability systems on New Relic, CloudWatch, or similar.
  • Experience implementing rate-limiting, API gateways, and load balancing for highly available systems.
  • Exposure to security best practices and compliance frameworks (e.g., SOC 2, ISO 27001).
  • Proficient in infrastructure as code (IaC) using tools such as Terraform or CloudFormation.
  • Hands-on experience with scripting and programming languages like Python, Go, or Bash.
  • Strong troubleshooting and debugging skills.
  • Excellent communication and collaboration skills.
  • Experience with incident management and post-mortem practices.
  • Soft Skills:
    • Exceptional problem-solving and critical thinking abilities.
    • Strong verbal and written communication skills, with the ability to navigate ambiguity and provide clarity.
    • Ability to work collaboratively in cross-functional teams under pressure.

Key Attributes:
  • Reliability-Driven: Strong commitment to platform reliability and performance.
  • Leadership and Mentorship: Willingness to guide and mentor less experienced team members.
  • Customer-Focused: Dedication to meeting and exceeding customer expectations in a high-pressure environment.

Expectations:
  • Availability to participate in a 24/7 on-call rotation.
  • Ability to work in a fast-paced, ambiguous environment with rapidly changing priorities.
  • Proactive approach to identifying and mitigating risks before they impact customers.
  • Strong sense of accountability and ownership for platform stability and customer satisfaction.

What We Offer:
  • Opportunity to work on cutting-edge products and make a real impact.
  • Collaborative and fast-paced work environment.
  • Chance to be part of a rapidly growing startup.
  • Competitive salary and benefits package (health insurance, dental insurance, vision insurance, paid time off, etc.).
