What are the responsibilities and job description for the Inference Performance Engineer position at Acceler8 Talent?

Join Our Inference Performance Team: Optimizing Foundation Models On-Device

We’re building the future of on-device AI by making foundation models smarter, faster, and more efficient. As part of the Inference Performance Team, you’ll work on challenging, high-impact projects to push the limits of what’s possible with foundation model inference.

What You’ll Do

Pinpoint performance bottlenecks and navigate quality-performance trade-offs in reference implementations (e.g., openai/whisper) and our optimized frameworks.
Design, prototype, and test performance improvements tailored to meet enterprise customer needs.
Drive innovation in our open-source inference frameworks by pitching and delivering new ideas.
Help expand support to new platforms—currently focused on Apple but actively growing into Android, Linux, and soon Windows.
Collaborate with ML Research Engineers to turn theoretical advances into practical, real-world optimizations.

Core Qualifications:

3 years of industry experience working on technically challenging problems.

Proficiency in Python or C/C++.

Experience with CUDA, OpenCL, or Metal.

A strong understanding of hardware acceleration (GPUs, NPUs, TPUs, CPUs).

Familiarity with modern ML frameworks like TensorFlow, PyTorch, Core ML, or ONNX.

Expertise in GPU kernel programming.

Contributions to major ML frameworks or open-source projects.

Why This Role?

You’ll play a critical role in advancing the performance of foundation models across platforms like Apple, Android, and Linux—shaping the future of efficient, scalable on-device AI.
