What are the responsibilities and job description for the Staff Software Engineer, Platform position at Uber?
About The Team
Our team is responsible for developing and maintaining an industry-leading goal-seeking repair engine designed to ensure tenant health across instances and maintain optimal host utilization. This sophisticated system not only continuously monitors and heals infrastructure but also supports extensibility by enabling labor services to be pluggable, paving the way for the open sourcing of the Allocation Engine.
We play a critical role in defining the authoritative signal for workload tenant health and performance, ensuring workloads operate within their declared SLAs. This serves as the key contract between product and platform: while the platform has the freedom to implement infrastructural changes, it must always honor the SLA requirements. Our work results in a unified repair engine that spans all layers of the stack, including physical networking.
The repair engine is the cornerstone of maintaining steady-state health and serves as the gatekeeper for granting permissions to other systems making changes. At its core, our team is the guardian of tenant health, ensuring a reliable and resilient platform for all workloads.
About The Role
Software engineers at Uber have a deep impact across a wide variety of business and technology decisions spanning multiple projects and locations. They are pragmatic technologists able to craft scalable systems while delivering efficient code. They are not only collaborative role models, but also empathic thought leaders within a larger group. They are humble teachers, technically mentoring a team of hardworking engineers while also executing on delivering exciting projects!
We are seeking a talented Staff Software Engineer to join our Service Mesh team, dedicated to shaping the future of L4/L7 networking and service mesh infrastructure and to achieving unprecedented levels of reliability and scalability that meet the needs of Uber's rapidly growing global businesses.
What The Candidate Will Need / Bonus Points
- Design, develop, and maintain the service mesh infrastructure (discovery, traffic management, routing) to ensure high reliability and scalability.
- Collaborate with cross-functional teams (Compute, Foundations, Deployment, Cloud, SRE, Storage, and Product teams) to design and implement scalable, reliable, and high-performance L4/L7 networking solutions through sophisticated experiments.
- Participate in on-call rotations to provide timely resolution of critical incidents and ensure system availability.
- Conduct in-depth debugging and troubleshooting of networking issues, both proactively and reactively.
- Continuously improve the monitoring and alerting systems to enhance system reliability.
- Stay abreast of industry trends and emerging technologies in networking, service mesh, and cloud-native architectures.
- You have a proven record of building and productionizing highly reliable infrastructure at scale.
- 8 years of relevant engineering experience, specifically working on the networking stack of backend services or building infrastructure platform-as-a-service offerings.
- Proficient in one of the following programming languages: Java, Go, C++.
- Bachelor's degree in Computer Science or related technical field or equivalent practical experience.
- Demonstrated ability to thrive in a fast-paced, collaborative environment with a passion for continuous learning and improvement.
- Experience leading and guiding excellent engineering teams.
- Experience in designing & developing large scale distributed systems.
- Experience with Kubernetes (k8s), Istio service mesh and Envoy is highly desirable.
- A bonus if you are a domain expert in network infrastructure, public cloud, compute, storage, containers/orchestration, or observability.
- A solid understanding of metrics-driven design to achieve business objectives.
- Passionate about helping teams grow by inspiring and mentoring engineers.
- You have great interpersonal skills, deep technical ability, and a portfolio of successful execution.
Salary: $223,000 - $248,000