What are the responsibilities and job description for the Senior Software Engineer, ML Performance & Systems position at fal?
Join our team at fal, where we are dedicated to pushing the boundaries of model performance for generative media models. You will play a vital role in designing and implementing innovative model serving architectures using our proprietary inference engine, with a clear focus on maximizing throughput while reducing latency and resource consumption.
As a key contributor, you will develop performance monitoring and profiling tools to pinpoint bottlenecks and discover optimization opportunities. Collaboration will be essential as you work closely with our Applied ML team and our customers in frontier labs within the media space, ensuring their workloads are optimized for our accelerator.
Key Responsibilities:
Drive the advancement of model performance for generative media models at fal.
Architect and implement cutting-edge solutions for model serving on our in-house inference engine, prioritizing throughput, latency, and resource efficiency.
Create tools for performance monitoring and profiling to detect bottlenecks and enhance optimization strategies.
Collaborate closely with our Applied ML team and customers, ensuring they derive maximum benefit from our accelerator solutions.
Requirements:
Robust background in systems programming with a proven track record of identifying and resolving performance bottlenecks.
Extensive knowledge of the latest ML infrastructure, including but not limited to PyTorch, TensorRT, TransformerEngine, and Nsight, with a keen interest in staying updated with developments in these areas.
Strong understanding of the underlying hardware (currently NVIDIA-based systems) and the ability to dive deep into the stack to troubleshoot and optimize, including writing custom GEMM kernels with CUTLASS for common matrix shapes.
Experience with Triton, or a strong willingness to learn it, along with comparable expertise in lower-level accelerator programming.
Familiarity with multi-dimensional model parallelism that combines methods such as tensor parallelism and context/sequence parallelism.
Understanding of the internals of Ring Attention, FlashAttention-3 (FA3), and FusedMLP implementations.