What are the responsibilities and job description for the Senior Distributed Systems Engineer position at PIKA Inc?
We are seeking highly skilled engineers with expertise in machine learning, distributed systems, and high-performance computing to join our Research team. In this role, you will collaborate closely with researchers to build and optimize platforms that train next-generation foundation models on massive GPU clusters. Your work will play a critical role in advancing the efficiency and scalability of cutting-edge generative AI technologies.
Key Responsibilities
- Scale and optimize systems for training large-scale models across multi-thousand-GPU clusters.
- Profile and enhance the performance of training codebases to achieve best-in-class hardware efficiency.
- Develop systems to distribute workloads efficiently across massive GPU clusters.
- Design and implement robust solutions to enable model training in the presence of hardware failures.
- Build tools to diagnose issues, visualize processes, and evaluate datasets at scale.
- Optimize and deploy inference workloads for throughput and latency across the entire stack, including data processing, model inference, and parallel processing.
- Implement and improve high-performance CUDA, Triton, and PyTorch code to address efficiency bottlenecks in memory, speed, and utilization.
- Collaborate with researchers to ensure systems are designed with optimal efficiency from the ground up.
- Prototype cutting-edge applications using multimodal generative AI.
Qualifications
- 3 years of professional experience in ML pipelines, distributed systems, or high-performance computing.
- Proficiency in Python, with significant experience using PyTorch.
Note: This position is not intended for recent graduates.
Compensation
The salary range for this role in California is $175,000-$250,000 per year. Actual compensation will depend on job-related knowledge, skills, experience, and candidate location. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.
Salary: $175,000 - $250,000