
Distributed ML Systems Engineer- Inference

TBWA\Chiat\Day
San Francisco, CA Full Time
POSTED ON 2/18/2025
AVAILABLE BEFORE 5/7/2025

Role

Together AI is seeking a Distributed ML Systems Engineer to design and build scalable machine learning systems that power our accelerated AI initiatives. This role involves developing large-scale, fault-tolerant distributed systems that handle high-load and high-performance requirements. If you are passionate about designing ML systems that operate at scale and eager to create impactful solutions, we want to hear from you. This position offers the chance to work closely with our AI researchers and infrastructure teams to ensure our systems are robust and efficient. Join us in shaping the future at Together AI!

Responsibilities

- Design and build large-scale, distributed machine learning systems that are fault-tolerant and high-performance.
- Develop and optimize distributed processing frameworks and storage systems.
- Collaborate with researchers, engineers, and product managers to integrate ML systems into our infrastructure.
- Conduct architecture and design reviews to ensure best practices in system design.
- Implement robust monitoring and logging systems to ensure the health and performance of our ML systems.

Requirements

- 3 years of experience in building large-scale, fault-tolerant, high-performance distributed systems.
- Strong programming skills in one or more of Python, Go, Rust, or C/C++.
- Excellent understanding of low-level operating systems concepts, including multi-threading, memory management, networking, storage, performance, and scale.
- Experience with cloud computing platforms (AWS, GCP, Azure, etc.) and large-scale infrastructure.
- Strong problem-solving skills and the ability to work in a fast-paced environment.
- Preferred: Experience with Kubernetes
- Preferred: Experience with PyTorch

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Together, we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers in our journey to build the next-generation AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunities to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Salary: $160,000 - $230,000



Job openings at TBWA\Chiat\Day

TBWA\Chiat\Day
Washington, DC Full Time
Location : Remote / Virtual from the Washington DC Metropolitan Area Reports to : Senior Director, Events Classification...
TBWA\Chiat\Day
Lehi, UT Full Time
At Podium, our mission is to arm every local business with a complete platform and outcome-driven AI employees that conv...
TBWA\Chiat\Day
Epes, AL Full Time
Key Responsibilities Operate heavy equipment efficiently and safely, adhering to strict adherence to safety rules and re...
TBWA\Chiat\Day
Seattle, WA Full Time
Headway’s mission is a big one – to build a new mental health care system everyone can access. We’ve built technology th...

Not the job you're looking for? Here are some other Distributed ML Systems Engineer- Inference jobs in the San Francisco, CA area that may be a better fit.

Senior Distributed ML Systems Engineer

Kuzco, San Francisco, CA

Distributed LLM Inference Engineer

Anyscale, San Francisco, CA
