What are the responsibilities and job description for the Senior Research Engineer- Training efficiency position at Luma AI?

Responsibilities

Ensure efficient implementation of models & systems with a focus on large-scale training.
Identify and implement optimization techniques for massively parallel and distributed systems, including the underlying communication layer.
Identify and remedy efficiency bottlenecks (memory, speed, utilization, communication) by profiling and implementing high-performance PyTorch code, deferring to Triton, CUDA, and lower levels as necessary.
Work closely with the rest of the research team to ensure systems are planned to be as efficient as possible from start to finish.
Conduct research & experiments on state-of-the-art large-scale generative AI models with the goal of improving latency & throughput for training and inference.

Must-have experience

Experience training large models using Python & PyTorch, including practical experience working with the full development pipeline, from data processing, preparation & dataloading to training and inference.
Experience profiling GPU & CPU code in PyTorch for optimal device utilization (examples: torch profiler, NVIDIA Nsight Systems/Compute, memory profilers, trace viewers, custom tooling).
Experience writing & improving highly parallel & distributed PyTorch code for large generative models, with familiarity with FSDP, Tensor Parallel, Sequence/Context Parallel, Pipeline Parallel, etc.
Experience working with transformer models and attention implementations.

Good-to-have experience

Experience with high-performance Triton/CUDA and writing custom PyTorch kernels and ops. Top candidates will be able to write fused kernels for common hot paths, understand when to make use of lower-level features like tensor cores or warp intrinsics, and will understand where these tools can be most impactful.
Experience writing high-performance parallel C++. Bonus if done within an ML context with PyTorch, such as data loading, data processing, or inference code.
Experience building inference / demo prototype code (incl. Gradio, Docker etc.).

Compensation Range: $220K - $300K



