LLM/ML Engineer (Inference)

Reducto
San Francisco, CA Full Time
POSTED ON 3/24/2025
AVAILABLE BEFORE 4/22/2025
About The Role

About Us

The vast majority of enterprise data, from financial statements to health records, is locked in unstructured file formats like PDFs and spreadsheets. Reducto is the most accurate way to parse and extract data from complex documents.

Today we power ingestion pipelines for hundreds of leading AI teams, ranging from popular startups to Fortune 10 enterprises. We’ve grown incredibly quickly (from zero to seven figures in ARR in 6 months), are loved by customers (>300M pages parsed), and are well funded by tier-1 investors.

The Core Work Will Include

  • Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models
  • Optimizing model serving infrastructure for high throughput and low latency at scale
  • Developing and integrating advanced inference optimization techniques
  • Working closely with our research team to bring cutting-edge capabilities into production
  • Building developer tools and infrastructure to support rapid experimentation and deployment

We Would Love To Meet You If You

  • Philosophy: You are your own worst critic. You have a high bar for quality and don’t rest until the job is done right—no settling for 90%. We want someone who ships fast, with high agency, and who doesn't just voice problems but actively jumps in to fix them.
  • Experience: You have deep expertise in Python and PyTorch, with a strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale. You're experienced with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum, and comfortable creating custom tooling for testing and optimization.
  • Approach: You combine technical expertise with practical problem-solving. You're methodical in debugging complex systems and can rapidly prototype and validate solutions.

Bonus Points If You

  • Have experience with low-level systems programming (CUDA, Triton) and compiler optimization
  • Are passionate about open-source contributions and staying current with ML infrastructure developments
  • Bring practical experience with high-performance computing and distributed systems
  • Have worked in early-stage environments where you helped shape technical direction
  • Are energized by solving complex technical challenges in a collaborative environment

This is an in-person role at our office in SF. We’re an early-stage company, which means that the role requires working hard and moving quickly. Please only apply if that excites you.

About Reducto

Nearly 80% of enterprise data is in unstructured formats like PDFs

PDFs are the status quo for enterprise knowledge in nearly every industry. Insurance claims, financial statements, invoices, and health records are all stored in a structure that’s simply impractical for use in digital workflows. This isn’t an inconvenience—it’s a critical bottleneck that leads to dozens of wasted hours every week.

Traditional approaches fail at reliably extracting information in complex PDFs

OCR and even more sophisticated ML approaches work for simple text documents but are unreliable for anything more complex. Text from different columns is jumbled together, figures are ignored, and tables are a nightmare to get right. Overcoming this usually requires a large engineering effort dedicated to building specialized pipelines for every document type you work with.

Reducto breaks document layouts into subsections and then contextually parses each depending on the type of content. This is made possible by a combination of vision models, LLMs, and a suite of heuristics we built over time. Put simply, we can help you:

  • Accurately extract text and tables even with nonstandard layouts
  • Automatically convert graphs to tabular data and summarize images in documents
  • Extract important fields from complex forms with simple, natural language instructions
  • Build powerful retrieval pipelines using Reducto’s document metadata
  • Intelligently chunk information using the document’s layout data

Benefits at Reducto

At Reducto, we’re invested in the well-being and growth of our team. Here’s what we currently offer:

  • Unlimited PTO: We believe great work requires recharging.
  • Lunch: Receive a free lunch to eat with your teammates daily at the office.
  • Reimbursed Transportation: Provide us with your receipts and we’ll take care of the costs.
  • Insurance: Generous health insurance covering medical, dental, and vision.
  • Health and Wellness Budget: We provide up to $150/mo reimbursement for health and wellness spending, such as gym memberships, fitness classes, or similar.
  • Parental Leave: Work with us to build a leave schedule that works for you and your family.

Reducto is an Equal Opportunity Employer committed to diversity and inclusion in the workplace. All qualified applicants will receive consideration for employment without regard to sex, race, color, age, national origin, religion, physical and mental disability, genetic information, marital status, sexual orientation, gender identity/assignment, citizenship, pregnancy or maternity, protected veteran status, or any other status prohibited by applicable national, federal, state or local law.
