
Senior Software Engineer - Site Reliability ML

Roblox
San Mateo, CA Full Time
POSTED ON 3/3/2025
AVAILABLE BEFORE 4/28/2025

Job Details

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.

At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We're on a mission to connect a billion people with optimism and civility, and we're looking for amazing talent to help us get there.

A career at Roblox means you'll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.

Are you a seasoned engineer with a passion for ML reliability? We're looking for exceptional Software Engineers to join the Reliability team at Roblox. In this pivotal role, you will drive the evolution of our ML systems, ensuring they meet the highest standards of performance, reliability, and efficiency. You'll collaborate with cross-functional teams to build robust ML infrastructure that supports our growth. If you have a track record of solving complex technical challenges, we want to hear from you. Join us in shaping the future of our platform and delivering unparalleled value to our users.

At Roblox, our vision is to achieve 1 billion daily active users. We believe this engineer will be instrumental in driving us towards that ambitious goal.

You Will:
  • Build, automate, and standardize processes to create a "golden path" of ML tooling and platform support that powers the Roblox ML ecosystem.
  • Create tooling that provides production guardrails for developing and delivering ML training and inference services to production.
  • Create performance monitoring and observability services to help understand ML capacity issues and platform degradations.

You Have:
  • Experience: You have a BS degree (or equivalent professional experience) in Computer Science or a related engineering field, with at least 6 years of experience, including at least 2 years in SRE or Software Engineering.
  • Deep experience running large-scale Kubernetes clusters in production environments, both on-premises and hosted.
  • Hands-on experience with Kubernetes observability, maintenance, and upgrades of large-scale Kubernetes clusters.
  • Experience running ML training and inference workloads on Kubernetes, supporting MLOps frameworks like Kubeflow, and working with GPUs.
  • Experience working with popular machine learning frameworks such as TensorFlow or PyTorch.
  • Passion for systems: You have experience and good habits around building software and tools and getting them adopted.

You Are:
  • A Partner: You know that the best tools integrate broadly with the tooling ecosystem. You approach partners and processes with curiosity and seek to understand a problem deeply before you start coding.
  • A Coder: You have experience writing code in common programming languages (Python, Go, C#, ...).
  • Self-organized: You're excited about getting in front of complex problems, organizing your work by any means possible, overcoming emergent issues, and contributing to long-running projects as a part of the team.
  • Problem Solver: You ask the right questions to solve issues within your expertise and you use data to test your theories.
  • Planner: You have experience in large project lifecycles. You have experience working in sprints, breaking down complex tasks into milestones, and reporting status to keep project scheduling accurate.

For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits.

Annual Salary Range

$238,520-$289,460 USD

Roles that are based in our San Mateo, CA Headquarters are in-office Tuesday, Wednesday, and Thursday, with optional in-office on Monday and Friday (unless otherwise noted).

You'll Love:
  • Industry-leading compensation package
  • Excellent medical, dental, and vision coverage
  • A rewarding 401k program
  • Flexible vacation policy (varies by exemption status)
  • Roflex - Flexible and supportive work policy
  • Roblox Admin badge for your avatar
  • At Roblox HQ:
    • Free catered lunches five times a week and several fully stocked kitchens with unlimited snacks
    • Onsite fitness center and fitness program credit
    • Annual CalTrain Go Pass

Roblox provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. Roblox also provides reasonable accommodations for all candidates during the interview process.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.



