What are the responsibilities and job description for the Machine Learning Infrastructure Engineer position at GLASS Imaging?
About the role
At Glass Imaging, we're looking for a highly skilled Machine Learning (ML) Infrastructure Engineer to redesign and develop the backbone of our ML training and evaluation ecosystem. As an experienced professional with a track record of successfully architecting solutions in this area, you'll have the freedom to reshape our platform from the ground up, building everything from GPU allocation and data management to experiment tracking and evaluation pipelines.
You'll work closely with our ML researchers and engineers to understand their needs, streamline their workflows, and ensure that our platform can scale with the team. You will help create automated, repeatable solutions that reduce manual overhead. While we prioritize clean, maintainable code, we operate in a fast-moving research environment where adaptability is key – this role offers plenty of opportunity to explore new ideas, refine solutions, and continuously improve our infrastructure.
Right now, we're looking for someone eager to tackle these challenges hands-on, but as our team grows, you will have the opportunity to take on more leadership responsibility, guiding the continued development of our ML platform and helping shape the team around it.
Responsibilities
- Design & build a scalable, efficient Python infrastructure for training and evaluating ML models.
- Improve automation of ML train/test infrastructure. For example, build inference tools that log, cache, and visually compare model outputs, and provide code-free methods to run models on new datasets.
- Develop and manage systems for GPU resource allocation, dataset management, experiment tracking, and evaluation pipelines. Integrate job scheduling (e.g., SLURM).
- Implement automated dataset versioning and validation.
- Build tools for reporting and visualizing model metrics and performance.
- Improve developer efficiency by creating tools and workflows that streamline ML model iteration and testing. Add and improve performance profiling.
- Ensure scalability and reliability of the ML platform as the company grows.
- Collaborate closely with ML researchers and engineers to understand their workflows and translate their needs into robust infrastructure.
- Introduce best practices for code organization, version control, and reproducibility in ML experiments. Encourage modularity, reusability and portability.
Required Skills
- Strong software engineering skills at a software-architect level
- Experience designing and building infrastructure for ML training workflows
- Familiarity with performance profiling and optimization for ML training
- Excellent command of Python, Linux scripting, and common ML frameworks (e.g., PyTorch, TensorFlow)
- Experience with GPU management, distributed computing, and optimizing training pipelines
- Passion for turning messy, unstructured codebases into clean, scalable platforms
- Ability to see the big picture in terms of code repo structure, job orchestration, task pipelining, and on-prem MLOps for efficient resource usage
Preferred Skills
- Proficiency in C
- Experience customizing or designing ML experiment tracking tools (e.g., Weights & Biases, ClearML) and creating or customizing web GUIs, dashboards, or macOS apps
- Knowledge of database and storage solutions for ML datasets
- Experience managing on-site Linux servers and NAS arrays with large-scale datasets
- Knowledge of cloud computing (e.g., AWS, GCP) and containerization (e.g., Docker, Kubernetes)
- Knowledge of image restoration and image quality assessment
Location & Travel
We are mainly hiring for our SF Bay Area, CA office (primarily in-person) but may consider other arrangements in exceptional circumstances.
Compensation & Benefits
- Competitive pay
- Stock options
- Health/Dental/Vision Insurance
- 401(k)
- Visa Sponsorship
About GLASS Imaging
Our mission is to bring professional-level image quality to everyone by making cutting-edge image processing accessible to all devices, from smartphones and XR devices to infrastructure maintenance and security applications. We believe that AI-driven processing can extract every ounce of image quality from any camera, making it easier for everyone to capture better pictures.
But we aren't just enhancing how images are processed; we're revolutionizing how they're captured, redefining the core principles of camera design and reimagining how lenses, sensors, and AI-driven processing work together. We're fundamentally changing how cameras operate to unlock unprecedented levels of performance and image quality.
Founded by former Apple engineers behind Portrait Mode and other groundbreaking iPhone camera features, we're a team of passionate and experienced engineers pushing the boundaries of photography. Join us in shaping the future of camera technology.
Equal Opportunity & Diversity Statement
Glass Imaging is committed to fostering a diverse and inclusive workplace. We celebrate differences and do not discriminate based on race, ethnicity, gender, sexual orientation, age, disability, veteran status, or any other protected status. We encourage individuals from all backgrounds to apply.