What are the responsibilities and job description for the AI/ML Ops Engineer position at Covera Health?
About the role
We’re looking for an AI/ML Ops Engineer to strengthen and streamline our model training and deployment pipelines. This role is critical to our ability to deliver high-quality, reliable AI solutions by ensuring seamless collaboration between development, data science, and operations teams. You’ll be empowered to build and maintain end-to-end infrastructure, foster best practices, and continuously improve our systems and processes.
In this role, you will be expected to:
- Design, build, and maintain scalable cloud and colocated infrastructure on Azure, including GPU management for model training
- Create and manage robust pipelines (e.g., Ray, Ray Train, Kubernetes) to automate the training, testing, and deployment of AI models
- Use monitoring solutions (e.g., Grafana) to diagnose complex issues, including bugs and bottlenecks in data pipelines, GPU usage, and model training
- Work closely with engineering and data science teams to ensure seamless integration and deployment of applications and models
- Uphold infrastructure security standards, including identity and access management, data encryption, and vulnerability assessments
- Meticulously document processes, plans, and solutions using Jira, Confluence, and GitHub, maintaining clear records for future reference
- Foster a culture of experimentation, innovation, and iterative learning across teams
Your profile:
- 3 years of proven experience in DevOps with exposure to AI/ML pipelines, including Docker, Kubernetes, PyTorch, distributed GPU management, and log aggregation
- Hands-on experience with cloud services; Azure experience preferred
- Proficiency in Linux system administration, networking, and application deployment; strong Python and Bash scripting skills
- Familiarity with SQL and NoSQL databases, and data management practices
- Experience with observability tools (e.g., Grafana) and a knack for diagnosing and resolving production/non-production issues swiftly
- Excellent problem-solving abilities, strong teamwork and communication skills, and an extraordinary dedication to documentation
- Comfortable working in Agile/Scrum environments, iterating quickly, and adapting to changing requirements
- Bachelor’s in Computer Science, Engineering, or related field—or equivalent experience
- Familiarity with Databricks, Postgres, MongoDB, and Ray Train preferred
Benefits
You will be a full-time employee with a competitive salary, stock options, and great benefits. These benefits include medical, dental, and vision insurance, HRA, 401(k), pre-tax commuter benefits, flexible paid time off, and a comfortable office space stocked with quality snacks and beverages. Most importantly, you’ll get to know each of us, and we love working together to find solutions. We are a talented, fun, focused, and unique team of people who are truly passionate about changing healthcare for the better!
The salary for this position ranges from $115,000 to $135,000, in addition to a discretionary bonus and comprehensive benefits package. Final salary will be based on a number of factors including, but not limited to, a candidate’s qualifications, skills, competencies, experience, expertise, and location.
At Covera Health, we strive to build diverse teams that reflect the people we want to empower through our technology. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. Equal Opportunity is the Law, and Covera Health is proud to be an equal-opportunity workplace and affirmative action employer. If you have a specific need that requires accommodation, please let a member of the People Team know.
Salary: $115,000 - $135,000