Senior Software Engineer, ML Performance & Systems

fal
Alameda, CA Full Time
POSTED ON 2/22/2025
AVAILABLE BEFORE 5/15/2025

Join our team at fal, where we are dedicated to pushing the boundaries of model performance for generative media models. You will play a vital role in designing and implementing innovative model serving architectures using our proprietary inference engine, with a clear focus on maximizing throughput while reducing latency and resource consumption.

As a key contributor, you will develop performance monitoring and profiling tools to pinpoint bottlenecks and discover optimization opportunities. Collaboration will be essential as you work closely with our Applied ML team and our customers in frontier labs within the media space, ensuring their workloads are optimized for our accelerator.

Key Responsibilities :

  • Drive the advancement of model performance for generative media models at fal.
  • Architect and implement cutting-edge solutions for model serving on our in-house inference engine, prioritizing throughput, latency, and resource efficiency.
  • Create tools for performance monitoring and profiling to detect bottlenecks and enhance optimization strategies.
  • Collaborate closely with our Applied ML team and customers, ensuring they derive maximum benefit from our accelerator solutions.

Requirements :

  • Robust background in systems programming with a proven track record of identifying and resolving performance bottlenecks.
  • Extensive knowledge of the latest ML infrastructure, including but not limited to PyTorch, TensorRT, TransformerEngine, and Nsight, with a keen interest in staying updated with developments in these areas.
  • Strong understanding of underlying hardware (currently Nvidia-based systems) and ability to dive deep into the stack to troubleshoot and optimize, including custom GEMM kernels with CUTLASS for common matrix shapes.
  • Experience with Triton or a strong willingness to learn, along with similar expertise in lower-level accelerator programming.
  • Familiarity with multi-dimensional model parallelism techniques utilizing a combination of parallelism methods such as tensor parallelism and context / sequence parallelism.
  • Understanding of the internals of Ring Attention, FA3, and FusedMLP implementations.

Compensation :

  • $180,000 - $500,000, equity, comprehensive benefits package
  • Location : San Francisco, CA

What we offer at fal :

  • Engaging and challenging projects.
  • Emphasis on work-life balance.
  • Attractive salary and equity options.
  • Employee-friendly equity terms, including early and extended exercise options.
  • Opportunity to work in our downtown San Francisco office, with remote options available for exceptional candidates.
  • Visa sponsorship available to assist with relocation to San Francisco.
  • Comprehensive health, dental, and vision insurance (US).
  • Regular team events and offsites.
  • Generous paid vacation policy of 4 weeks.
  • Salary : $180,000 - $500,000



