What are the responsibilities and job description for the Senior Machine Learning Performance Engineer position at Wayve?
The Role
We are seeking skilled engineers to join our Machine Learning Platform team to optimise large-scale training jobs as we aim to scale our models through the next order of magnitude. The Machine Learning Platform team owns our GPU training infrastructure and the software abstractions around it, and you will have a specific focus on improving training efficiency.
Challenges you will own
- Maximising the MFU (Model FLOPs Utilisation) of our large-scale training jobs.
- Profiling and identifying bottlenecks in training code.
- Implementing GPU kernels to improve training throughput.
- Working closely with Research teams to integrate and test training efficiency improvements.
- Owning and improving our GPU training clusters.
About You
Essential:
- 5 years of experience in performance optimization or ML engineering.
- Experience optimizing large-scale training jobs on GPU compute clusters.
- Experience working in platform teams and collaborating with research teams.
- Experience tracking benchmarked performance over time and reporting it in an open and accessible way.
- Ability to write high-quality, well-structured and tested Python code.
- BS or MS in Machine Learning, Computer Science, Engineering, or a related technical discipline, or equivalent experience.
Desirable:
- Solid experience working with concurrent, parallel and distributed computing.
- Experience using NVIDIA Nsight Systems.
- Experience implementing GPU kernels.
- Knowledge of computing fundamentals - what makes code fast, secure and reliable.
We understand that everyone has a unique set of skills and experiences and that not everyone will meet all of the requirements listed above. If you’re passionate about self-driving cars and think you have what it takes to make a positive impact on the world, we encourage you to apply.
This is a full-time role based in our office in Sunnyvale, California. At Wayve, we want the best of all worlds, so we operate a hybrid working policy that combines time together in our offices and workshops, to fuel innovation, culture, relationships and learning, with time spent working from home. We operate core working hours so you can determine the schedule that works best for you and your team.