What are the responsibilities and job description for the GPU Research Engineer position at Alldus?
Role Overview
We are looking for a GPU Research Engineer to work on optimizing inference performance for large language models (LLMs) by developing and optimizing GPU kernels. This role involves low-level performance tuning, CUDA/Triton programming, and debugging deep learning workloads to maximize throughput and efficiency.
You will collaborate with ML engineers, systems researchers, and hardware teams to push the limits of GPU acceleration for AI workloads.
Responsibilities
- Develop, optimize, and debug custom GPU kernels using CUDA, Triton, and other low-level performance libraries.
- Profile and analyze deep learning inference workloads to identify bottlenecks and implement optimizations.
- Improve memory bandwidth utilization, kernel fusion, tiling strategies, and tensor parallelism for efficient LLM execution.
- Work closely with ML and infrastructure teams to enhance model execution across different GPU architectures (e.g., NVIDIA H100/A100, AMD MI300).
- Research and implement state-of-the-art techniques for reducing latency, improving throughput, and minimizing memory overhead.
- Contribute to open-source deep learning frameworks or internal acceleration toolkits as needed.
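To give a flavor of the kernel-fusion work named above, here is a minimal illustrative sketch (not part of the role description; the function names `unfused` and `fused` are hypothetical). Fusing two elementwise operations into a single pass avoids materializing an intermediate buffer; on a GPU the same transformation cuts global-memory traffic, which is often the dominant cost for elementwise workloads.

```python
# Toy model of kernel fusion: compute y = relu(a * x + b) elementwise.

def unfused(x, a, b):
    # Two "kernels": the intermediate t is written out by the first
    # pass and re-read by the second, doubling memory traffic.
    t = [a * xi + b for xi in x]       # kernel 1: scale and shift
    return [max(ti, 0.0) for ti in t]  # kernel 2: ReLU

def fused(x, a, b):
    # One "kernel": each element is loaded once and stored once,
    # with no intermediate buffer in between.
    return [max(a * xi + b, 0.0) for xi in x]
```

Both versions produce identical results; the fused form simply touches memory half as often, which is the effect a fused CUDA or Triton kernel achieves at the hardware level.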
Requirements
Nice to Have
Why Join Us?