Senior Distributed Systems Engineer

PIKA Inc
Stanford, CA Full Time
POSTED ON 3/1/2025
AVAILABLE BEFORE 5/25/2025

We are seeking highly skilled engineers with expertise in machine learning, distributed systems, and high-performance computing to join our Research team. In this role, you will collaborate closely with researchers to build and optimize platforms that train next-generation foundation models on massive GPU clusters. Your work will play a critical role in advancing the efficiency and scalability of cutting-edge generative AI technologies.

Key Responsibilities

  • Scale and optimize systems for training large-scale models across multi-thousand GPU clusters.
  • Profile and enhance the performance of training codebases to achieve best-in-class hardware efficiency.
  • Develop systems to distribute workloads efficiently across massive GPU clusters.
  • Design and implement robust solutions to enable model training in the presence of hardware failures.
  • Build tools to diagnose issues, visualize processes, and evaluate datasets at scale.
  • Optimize and deploy inference workloads for throughput and latency across the entire stack, including data processing, model inference, and parallel processing.
  • Implement and improve high-performance CUDA, Triton, and PyTorch code to address efficiency bottlenecks in memory, speed, and utilization.
  • Collaborate with researchers to ensure systems are designed with optimal efficiency from the ground up.
  • Prototype cutting-edge applications using multimodal generative AI.

Qualifications

  • Experience:
      • 3 years of professional experience in ML pipelines, distributed systems, or high-performance computing.
      • Hands-on experience training large models using Python and PyTorch, with familiarity with the full pipeline: data processing, loading, training, and inference.
      • Proven expertise in optimizing and deploying inference workloads, with experience in profiling GPU/CPU code (e.g., Nvidia Nsight).
      • Deep understanding of distributed systems and frameworks, such as DDP, FSDP, and tensor parallelism.
      • Strong experience writing high-performance parallel C and custom PyTorch kernels, with knowledge of CUDA and Triton optimization techniques.
      • Bonus: Experience with generative models (e.g., Transformers, Diffusion Models, GANs) and prototype development (e.g., Gradio, Docker).
  • Technical Skills:
      • Proficiency in Python, with significant experience using PyTorch.
      • Advanced skills in CUDA/Triton programming, including custom kernel development and tensor core optimization.
      • Strong generalist software engineering skills and familiarity with distributed and parallel computing systems.
  • Note: This position is not intended for recent graduates.

Compensation

The salary range for this role in California is $175,000-$250,000 per year. Actual compensation will depend on job-related knowledge, skills, experience, and candidate location. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.





