1 Benchmarking and System Validation Software Engineer Job in Raleigh, NC

Enfabrica
Raleigh, NC | Full Time
$110k-132k (estimate)
4 Days Ago

Enfabrica is Hiring a Benchmarking and System Validation Software Engineer Near Raleigh, NC

Job Description

Summary

We are seeking a talented Software Engineer to join our Durham, North Carolina team focused on Benchmarking, System Validation and Test Automation for large-scale distributed systems. In this role, you will write applications that benchmark next-generation computing infrastructure at performance and scale with real-world Machine Learning workloads, and you will build system topologies to validate our customer use cases.

Roles and Responsibilities:

  • Model and benchmark large-scale Machine Learning workloads
  • Characterize performance of distributed deep learning applications with data and model parallelism, and model sharding across devices and memories
  • Write applications, libraries and kernel modules that stress I/O technology capabilities, including those that stress NCCL and CUDA GPU technology
  • Develop low-level software applications to test I/O performance of next-gen compute systems
  • Validate customer use cases using our technology, and assist with such deployments
  • Implement broad System and Solution Level testing
  • Create White Papers that showcase Data Center I/O technology

Desired Knowledge and Skill Set:

  • Hands-on experience with ML Collective Communication and CUDA programming
  • Hands-on experience with ML frameworks such as PyTorch and TensorFlow
  • Familiarity with standard Machine Learning workload benchmarks for Training and Inference
  • Strong coding skills in multiple languages such as Python, C and C++
  • Background in low-level I/O performance analysis of networking and server systems 
  • Good knowledge of TCP/IP and performance of other networking protocols 
  • Detailed understanding of server components and applicable drivers for CPUs, memory, GPUs, networking devices and storage
  • Experience validating large-scale Data Center networking and server solutions
  • Working knowledge of high-performance communication technologies like MPI, InfiniBand, RDMA, GPUDirect and NVLink is desirable
  • Linux systems knowledge
  • 5 years of software development experience working closely with hardware

This role requires the employee to be on-site in the Raleigh, North Carolina office. There is no hybrid work option.

About Us 

Enfabrica is on a mission to revolutionize AI compute systems and infrastructure at scale through the development of superior-scaling networking silicon and software, which we call the Accelerated Compute Fabric. Founded and led by an executive team assembled from first-class semiconductor and distributed systems/software companies throughout the industry, Enfabrica sets itself apart from other startups with a very strong engineering pedigree, a proven track record of delivering, deploying and scaling products in data center production environments, and significant investor support for our ambitious journey! With our differentiated approach to solving the I/O bottlenecks in distributed AI and accelerated compute clusters, Enfabrica is unleashing the revolution in next-gen computing fabrics.

Job Summary

JOB TYPE: Full Time

SALARY: $110k-132k (estimate)

POST DATE: 09/06/2024

EXPIRATION DATE: 09/22/2024
