Senior ML Infrastructure Engineer

Kuzco
San Francisco, CA Full Time
POSTED ON 1/23/2025
AVAILABLE BEFORE 3/23/2025

Kuzco is seeking a Senior ML Infrastructure Engineer to join our team. This role involves developing large-scale, fault-tolerant systems that handle millions of large language model inference requests per day. If you are passionate about developing next-generation ML systems that operate at scale, we want to hear from you.

About Kuzco

We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute for running large language models like Llama and Mistral. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.

We are a small, well-funded team of staff-level engineers who work in person in downtown San Francisco on difficult, high-impact engineering problems. Everyone on the team has been writing code for over 10 years and has founded and run their own software company. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard and deeply enjoy the work that we do; we are almost always online at least six days per week.

About the Role

You will be responsible for designing and implementing the core systems that power our globally distributed LLM inference network. You'll work on problems at the intersection of distributed systems, machine learning, and resource optimization.

Key Responsibilities

  • Design and implement scalable distributed systems for our inference network
  • Develop models for efficient resource allocation across a network of heterogeneous hardware with a rapidly changing topology
  • Optimize network latency, throughput, and availability
  • Build robust logging and metrics systems to monitor network health and performance
  • Conduct reviews of architecture and system design to ensure use of best practices
  • Collaborate with founders, engineers, and other stakeholders to improve our infrastructure and product offerings

What We're Looking For

  • Very strong problem-solving skills and ability to work in a startup environment
  • 5 years of experience building high-performance systems
  • Strong programming skills in TypeScript, Python, and one of Go, Rust, or C
  • Solid understanding of distributed systems concepts
  • Knowledge of orchestrators and schedulers like Kubernetes and Nomad
  • Use of AI tooling in development workflow (ChatGPT, Claude, Cursor, etc.)
  • Experience with LLM inference engines like vLLM or TensorRT-LLM is a plus
  • Experience with GPU programming and optimization (CUDA experience is a plus)

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 - $250,000, depending on experience.

Equal Opportunity

Kuzco is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're excited about building the future of developer-first AI infrastructure, we'd love to hear from you. Please send your resume, LinkedIn, and GitHub to sam@kuzco.xyz.




