What are the responsibilities and job description for the GPU Research Engineer position at Alldus?
Role Overview
We are looking for a GPU Research Engineer to work on optimizing inference performance for large language models (LLMs) by developing and optimizing GPU kernels. This role involves low-level performance tuning, CUDA/Triton programming, and debugging deep learning workloads to maximize throughput and efficiency.
You will collaborate with ML engineers, systems researchers, and hardware teams to push the limits of GPU acceleration for AI workloads.
Responsibilities
- Develop, optimize, and debug custom GPU kernels using CUDA, Triton, and other low-level performance libraries.
- Profile and analyze deep learning inference workloads to identify bottlenecks and implement optimizations.
- Improve memory bandwidth utilization, kernel fusion, tiling strategies, and tensor parallelism for efficient LLM execution.
- Work closely with ML and infrastructure teams to enhance model execution across different GPU architectures (e.g., NVIDIA H100/A100, AMD MI300).
- Research and implement state-of-the-art techniques for reducing latency, improving throughput, and minimizing memory overhead.
- Contribute to open-source deep learning frameworks or internal acceleration toolkits as needed.
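To give a flavor of the kernel-fusion work named above, here is a minimal illustrative sketch (not part of the role description; the function names `unfused` and `fused` are hypothetical). Fusing two elementwise operations into a single pass avoids materializing an intermediate buffer; on a GPU the same transformation cuts global-memory traffic, which is often the dominant cost for elementwise workloads.

```python
# Toy model of kernel fusion: compute y = relu(a * x + b) elementwise.

def unfused(x, a, b):
    # Two "kernels": the intermediate t is written out by the first
    # pass and re-read by the second, doubling memory traffic.
    t = [a * xi + b for xi in x]       # kernel 1: scale and shift
    return [max(ti, 0.0) for ti in t]  # kernel 2: ReLU

def fused(x, a, b):
    # One "kernel": each element is loaded once and stored once,
    # with no intermediate buffer in between.
    return [max(a * xi + b, 0.0) for xi in x]
```

Both versions produce identical results; the fused form simply touches memory half as often, which is the effect a fused CUDA or Triton kernel achieves at the hardware level.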
Requirements
Nice to Have
Why Join Us?