Senior HPC engineer, Research infrastructure

Luma AI
Palo Alto, CA Full Time
POSTED ON 1/7/2025
AVAILABLE BEFORE 3/6/2025

Help Luma build some of the biggest and fastest AI supercomputing clusters in the world! As a High-Performance Computing (HPC) engineer, you’ll work at the intersection of hardware and software, designing systems that deliver the maximum possible performance for running large-scale AI models. We work at the very cutting edge of speed and scale, combining the traditions of HPC with a modern cloud environment.


For this role, it’s important that you understand how to combine CPUs, GPUs, and network devices into systems that are then deployed at large scale and run at peak efficiency. You understand the lowest levels of the software platforms that sit on top of this hardware, including how best to optimize the Linux kernel and user-space code. You are capable of writing code to automate the monitoring and healing of these systems, commanding a large number of servers with few people.


Responsibilities
  • In this role, you will work closely with and directly accelerate machine learning researchers, but you don't need to be a machine learning expert yourself.
  • We value people who can quickly obtain a deep technical understanding of new domains and enjoy being self-directed and identifying the most important problems to solve. 
  • You’ll manage Luma's HPC training clusters, from provisioning to performance tuning.
  • Areas of work will include observability, distributed job tracing, GPU diagnostics, software environment management, and additional tooling, plus work on the actual code to enable necessary features.
  • We believe that increasing compute is a huge lever to AI progress. You will have a direct impact on our ability to grow to an unprecedented scale and likewise produce unprecedented results.


Experience
  • 8 years of experience as an infrastructure engineer or in DevOps, working on large and complex distributed systems.
  • Deep understanding of networking; bonus points for experience in HPC networking.
  • Experience developing high-quality software in a general-purpose programming language, preferably including Python.
  • Excellent problem-solving skills and attention to detail.
  • Experience with GPUs in large-scale clusters is strongly preferred.
  • Strong knowledge of observability and monitoring in distributed systems.
  • Tenacious at troubleshooting hardware and network topology failures in distributed systems.
  • Independently driven and able to own problems and build solutions end to end.
  • Experience with large-scale data center operations, and proficiency in cloud orchestration and system tools.
  • Please note this role is not meant for recent grads.


Compensation
  • In addition to cash base pay, you'll also receive a sizable grant of Luma's equity.
  • The pay range for this position is $180,000 - $220,000/yr for the Bay Area. Base pay offered will vary depending on job-related knowledge, skills, candidate location, and experience.



Your application is reviewed by real people.

Salary: $180,000 - $220,000
