
Senior HPC engineer, Research infrastructure

Luma AI
Stanford, CA Full Time
POSTED ON 2/13/2025
AVAILABLE BEFORE 5/12/2025

Help Luma build some of the biggest and fastest AI supercomputing clusters in the world! As a High-Performance Computing (HPC) engineer, you'll work at the intersection of hardware and software, designing systems that deliver the maximum possible performance for running large-scale AI models. We work at the very cutting edge of speed and scale, bringing the traditions of HPC into a modern cloud environment.

For this role, it's important that you understand how to combine CPUs, GPUs, and network devices into systems that can be deployed at large scale and run at peak efficiency. You understand the lowest levels of the software platforms that sit on top of this hardware, including how best to optimize the Linux kernel and user-space code. You can write code to automate the monitoring and healing of these systems, so that a small team can command a large number of servers.

Responsibilities

  • In this role, you will work closely with machine learning researchers and directly accelerate their work, but you don't need to be a machine learning expert yourself.
  • We value people who can quickly gain a deep technical understanding of new domains, who enjoy being self-directed, and who can identify the most important problems to solve.
  • You'll manage Luma's HPC training clusters, from provisioning to performance tuning.
  • Areas of work include observability, distributed job tracing, GPU diagnostics, software environment management, and additional tooling, as well as writing the code that enables necessary features.
  • We believe that increasing compute is a huge lever for AI progress. You will have a direct impact on our ability to grow to an unprecedented scale and, in turn, to produce unprecedented results.

Experience

  • 8 years of experience as an infrastructure or DevOps engineer working on large, complex distributed systems.
  • Deep understanding of networking; experience with HPC networking is a bonus.
  • Experience developing high-quality software in a general-purpose programming language, preferably including Python.
  • Excellent problem-solving skills and attention to detail.
  • Experience with GPUs in large-scale clusters is strongly preferred.
  • Strong knowledge of observability and monitoring in distributed systems.
  • Tenacious at troubleshooting hardware and network topology failures in distributed systems.
  • Independently driven and able to own problems and build solutions from end to end.
  • Experience with large-scale data center operations and proficiency with cloud orchestration and system tools.
Compensation

  • In addition to cash base pay, you'll also receive a sizable grant of Luma's equity.
  • The pay range for this position is $180,000 - $220,000 per year for the Bay Area. Base pay offered will vary depending on job-related knowledge, skills, candidate location, and experience.

Salary: $180,000 - $250,000
