What are the responsibilities and job description for the DevOps Engineer - Distributed AI Systems position at Covariant?
We're Covariant, a company that's redefining the possibilities of AI-powered robotics. As a Production Engineer, you'll join our team of experts who are passionate about building cutting-edge technology that gives robots the ability to see, reason, and act on the world around them.
About the Role
We're seeking an experienced engineer to join our production team, where you'll design, build, and manage the infrastructure that powers our innovative AI robotics solutions. Your expertise in cloud computing, containerization, and automation will be essential in architecting scalable and resilient systems that meet the demands of our products.
Your Key Responsibilities
- Collaborate with brilliant researchers to keep our training and inference tooling at the state of the art
- Help teammates architect and build scalable tooling for our edge robot fleet
- Own and orchestrate large GPU clusters across multiple cloud providers using infrastructure as code (IaC) and scripts
What You'll Need
To succeed in this role, you'll need a strong background in software engineering and operations. You should also have a solid foundation in Python, Linux, and networking, as well as a commitment to continuous learning and a willingness to pick up new languages or technologies as needed.