What are the responsibilities and job description for the Staff Software Engineer, ML Serving Platform position at DoorDash USA?
About the Team
DoorDash is building the world’s most reliable on-demand logistics engine. Behind the scenes, our Machine Learning Platform (MLP) powers critical real-time decision-making for millions of orders each day, supporting business-critical use cases like Ads, Groceries, Logistics, Fraud, and Search.
We’re looking for a Staff Software Engineer to lead our ML Serving initiatives—enabling seamless, high-performance, and highly scalable model inference at DoorDash. You will guide a small, talented team in developing and operating a next-generation ML serving platform that handles millions of QPS across a global marketplace.
About the Role
As the technical lead for the ML Serving Platform, you will have a direct impact on DoorDash’s most mission-critical services, ensuring our models deliver real-time predictions reliably and at scale. You will partner closely with ML Engineers and Infrastructure teams to design and operate the serving stack, covering everything from containerized model deployments to advanced feature lookups and GPU-based serving.
You will drive the roadmap, architecture, and best practices for a platform that must remain highly available, isolated across workloads, and cost-efficient—all while enabling rapid experimentation and iteration for model owners.
This is a hybrid opportunity in San Francisco, Sunnyvale, Seattle, or New York.
You’re excited about this opportunity because you will…
- Lead the Vision & Architecture - Set the technical direction for an extremely high-QPS (multi-million QPS) serving platform, enabling rapid and reliable deployment of ML models across a variety of use cases.
- Build for Scale & Reliability - Own and evolve our model serving stack to ensure zero-downtime, 24/7 operations. You’ll tackle unique scaling challenges around throughput, isolation, and latency.
- Enable Self-Serve Model Deployments - Develop abstractions that let ML Engineers seamlessly bring their own models (BYOM) and custom GPU-accelerated workloads online.
- Improve Developer Velocity - Drive innovations that reduce time-to-production. Standardize workflows for deploying, validating, and monitoring ML services with strong observability and debugging capabilities.
- Collaborate Across the Company - Work closely with teams in Ads, Fraud, Logistics, Groceries, and more to tailor the serving platform for their specific needs while maintaining a core set of robust, reusable components.
- Mentor & Lead - Guide a small but growing team of senior engineers. Champion best practices, set coding standards, conduct design reviews, and help shape DoorDash’s ML culture.
We’re excited about you because…
- 8 years of industry experience in software engineering, with at least 1 year of technical lead experience.
- Deep expertise in building large-scale, distributed systems—you’re comfortable architecting services that handle millions of requests per second with single-digit millisecond latencies.
- Strong knowledge of CS fundamentals and experience with programming languages like Python, Go, Kotlin, C++, or Java.
- Experience with production ML systems—you’ve built or operated high-QPS inference services, real-time feature stores, or large-scale data pipelines.
- Passion for reliability & performance—you’ve developed strategies for zero-downtime deployments, high availability, and low-latency serving, and you understand cost vs. performance trade-offs.
- Track record of technical leadership—you excel at collaboration, driving projects end-to-end, and mentoring other engineers in best practices.
Nice To Haves
- GPU experience for ML serving and real-time inference.
- Familiarity with deep learning frameworks (PyTorch, TensorFlow) and large language models (LLMs) such as GPT or BERT.
- Experience with microservices and container orchestration (Kubernetes, EKS).
- Cloud computing experience (AWS, GCP, etc.), including cost attribution and optimization.
- Background in model lifecycle management (MLflow, ML orchestration systems, or metadata tracking).
Notice to Applicants for Jobs Located in NYC or Remote Jobs Associated With Office in NYC Only
We use Covey as part of our hiring and/or promotion process for jobs in NYC, and certain features may qualify it as an automated employment decision tool (AEDT) in NYC. As part of the hiring and/or promotion process, we provide Covey with job requirements and candidate-submitted applications. We used Covey Scout for Inbound from August 21, 2023, through December 21, 2023, and resumed using it on June 29, 2024.
The Covey tool has been reviewed by an independent auditor. Results of the audit may be viewed here: Covey