Staff Software Engineer, ML Training

Stack AV
Pittsburgh, PA Remote Full Time
POSTED ON 3/6/2025
AVAILABLE BEFORE 5/5/2025

About the Role:

The ML Training Team’s core mandate is training models as fast as possible for the company. The team’s main focus is ensuring our models reach 100% GPU utilization and scale linearly from 8 to 256 GPUs. We also invest in tooling to empower our MLEs by building profiling and debugging tools, setting up efficiency monitoring, and integrating our trainer into our experiment management system.

Responsibilities: 

  • Set up efficiency monitoring for all our training jobs to identify models that need improvement
  • Work with customer teams to benchmark/profile their jobs and make improvements
  • Create standardized APIs for stack-wide abstractions like training datasets, bulk inference jobs, evaluation metrics
  • Optimize dataloaders and training data formats to ensure high GPU utilization
  • Optimize distributed training configurations (network topologies, sharding strategies, pipelines, etc.)

Qualifications: 

  • Experience: 5 years as a SWE, ideally building infrastructure or customer-facing products; experience in AV or robotics is also a plus.
  • Ideal Skills:
    • Experience with both ML platforms and building ML-based applications (bonus points if you have modeling experience).
    • Experience building scalable, reliable infra in a fast-paced environment.
    • Experience building or using ML infra built for a large number of customer teams.
    • A deep understanding of design tradeoffs and ability to articulate those tradeoffs and work with others on getting alignment.
    • Experience with building ML models or ML infra in the domains of autonomous vehicles, perception, and decision making (desirable but not required).
    • Experience with model training, model optimization, or large data processing pipelines.
    • Machine learning expertise is preferred but not required.
    • Knows how to push the GPU to its limit, from Python down to the CUDA kernel level.
    • Built the inference or training loop for a large model (ideally an LLM-style model).
    • Shipped ML products (NLP, computer vision, recommender systems, etc.) at scale to make business impact.
    • Knows how to build low latency / high throughput batch or stream processing pipelines.
    • Knows how to write (readable) high-performance C++.
    • Prior AV experience.
  • Desired Attributes:
    • High customer empathy and the ability to communicate well with customers
    • Comfortable reading papers / keeping up with SOTA ML literature



