What are the responsibilities and job description for the Senior AI Infrastructure Engineer position at Signify Technology?

Job Title : Senior AI Infrastructure Engineer

Location : Remote but must be located in the Bay Area

Salary Range : $200,000-$250,000 + Equity

About the Company

They are a fast-growing startup in the 3D generation space, focused on creating tools for 3D artists and game developers. With over 1 million users, their platform is at the forefront of revolutionizing the creation of 3D content using advanced AI and machine learning. Their products enable game developers to quickly generate high-quality 3D models. As they continue to expand, they are looking for an experienced Senior AI Infrastructure Engineer to help scale their AI and machine learning infrastructure.

About the Role

In this role, the engineer will be responsible for managing GPU clusters and training models on them, scaling data processing workflows, and optimizing the performance of AI models on cloud infrastructure. They will work hands-on with large-scale datasets and GPUs to build and scale the infrastructure required to support cutting-edge AI applications such as Text-to-3D and Image-to-3D generation. The ideal candidate will have experience managing their own GPU clusters (8 GPUs), scaling workloads, and working with large image datasets in a cloud environment.

Responsibilities

GPU Cluster Management : Lead the training and inferencing processes for image-based AI models on GPU clusters. Manage and scale 8 GPUs, ensuring efficient operation and optimal performance across the cluster. This includes setup, monitoring, and troubleshooting of GPU resources.
Data Processing & Scaling : Work directly with large-scale data processing workflows. Ensure data is processed, cleaned, and ready for training. Scale data pipelines to support high throughput in cloud environments such as AWS or Azure.
Model Tuning & Training : Work with teams to fine-tune AI models on large image datasets. Train models from scratch or fine-tune pre-trained models for specific use cases, ensuring high performance and scalability. Fine-tuning on multi-GPU setups will be a critical part of the role.
Cloud Infrastructure : Utilize cloud platforms like AWS or Azure to manage and scale GPU clusters. Optimize cloud resources for large-scale training jobs and ensure infrastructure supports the growing demands of their AI models.
Collaboration & Innovation : Collaborate closely with AI and ML teams to deploy new algorithms, experiment with distributed training, and enhance infrastructure. Play a key role in scaling their GenAI products and ensuring systems can handle millions of AI operations per month.

Required Skills

Experience with GPU Clusters : Proven hands-on experience managing and training models on GPU clusters of 8 GPUs, ideally managing the infrastructure independently (not via a company). Comfortable with both training and inferencing tasks on large-scale systems.

Large-Scale Data Experience : Experience processing large image datasets for machine learning tasks, including data preprocessing, scaling data workflows, and ensuring smooth pipelines for large training jobs.

Model Training & Tuning : Experience in training and fine-tuning deep learning models (primarily image-based models) using frameworks like PyTorch, TensorFlow, or similar. Proficiency in tuning models on GPUs to maximize performance.

Cloud Platforms & Tools : Experience working with cloud platforms like AWS or Azure to scale GPU clusters for deep learning workloads. Knowledge of cloud-based orchestration tools (e.g., Ray) is a plus.

Programming Skills : Proficiency in Python for developing and optimizing training pipelines. Experience with distributed computing and parallel processing tools is highly valued. Familiarity with JAX, PyTorch, or similar libraries for model training is beneficial.

Salary : $200,000 - $250,000
