What are the responsibilities and job description for the Senior Distributed Systems Engineer position at Luma AI?
We are looking for people with strong ML and distributed systems backgrounds. This role sits within our Research team and collaborates closely with researchers to build the platforms for training our next generation of foundation models.
Responsibilities
- Work with researchers to scale up the systems required for our next generation of models trained on multi-thousand GPU clusters.
- Profile and optimize our model training codebase to achieve best-in-class hardware efficiency.
- Build systems to distribute work across massive GPU clusters efficiently.
- Design and implement methods to robustly train models in the presence of hardware failures.
- Build tooling to help us better understand problems in our largest training jobs.
Experience
Compensation
$180,000 - $250,000 a year
The pay range for this position in California is $180,000 - $250,000/yr; however, the base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.
Your application is reviewed by real people.
Salary: $180,000 - $250,000