What are the responsibilities and job description for the DevOps with MLOps position at TalentTank Recruiting Inc.?
DevOps Engineer with MLOps
The DevOps Engineer is a pivotal individual contributor role focused on building and maintaining scalable infrastructure for deploying, monitoring, and managing machine learning models. This role requires deep expertise in modern DevOps practices and some MLOps experience to ensure the seamless integration and optimization of machine learning workflows. The ideal candidate has hands-on experience with tools such as Kubernetes, Argo, Helm, and Terraform and is skilled in building custom Kubernetes operators to enhance MLOps capabilities.
Key Responsibilities
DevOps and Infrastructure Management
- Design and maintain scalable, reliable cloud and on-premises infrastructure for ML workflows using Kubernetes, Terraform, and Helm.
- Build and manage custom Kubernetes operators to automate and streamline MLOps processes.
- Implement CI/CD pipelines for deploying machine learning models using tools such as Argo and Jenkins.
- Optimize infrastructure for high availability, scalability, and cost-effectiveness in ML environments.
ML Workflow Design and Automation
- Develop and automate end-to-end ML pipelines, including model training, testing, deployment, and monitoring.
- Standardize and document workflows to ensure reproducibility and scalability across teams.
- Collaborate with data scientists to integrate models seamlessly into production systems.
Model Deployment and Optimization
- Deploy ML models in Kubernetes environments, ensuring robust performance and reliability.
- Monitor and troubleshoot deployed models to minimize downtime and improve accuracy.
- Optimize resource usage for small language models, balancing computational efficiency with performance.
Data Integrity and Pipeline Optimization
- Ensure data quality and integrity by implementing validation and monitoring mechanisms throughout the ML lifecycle.
- Develop efficient pipelines for feature extraction, training, and inference tailored to specific use cases.
- Design solutions for handling unstructured and dynamic data sources in production environments.
Collaboration and Communication
- Act as a bridge between DevOps, MLOps, and machine learning teams to align on infrastructure and deployment strategies.
- Translate complex technical requirements into actionable workflows and documentation for diverse teams.
- Provide guidance and mentorship to engineering teams on best practices for DevOps and MLOps.
Monitoring and Governance
- Implement real-time monitoring systems to track model performance, detect drift, and manage SLAs.
- Develop and enforce governance frameworks to ensure ethical, transparent, and compliant use of AI/ML systems.
- Safeguard data privacy and security in accordance with industry standards and regulations.
Innovation and Continuous Improvement
- Stay current with the latest advancements in DevOps and MLOps tools and methodologies, particularly in Kubernetes and cloud-native ecosystems.
- Experiment with new technologies to improve operational efficiency and reduce costs.
- Participate in industry events to share knowledge and learn from peers.
Qualifications
Education and Experience
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Engineering, or a related field.
- Extensive experience in DevOps, with exposure to MLOps and machine learning workflows.
- Expertise in tools like Kubernetes, Argo, Helm, and Terraform.
- Proven experience building and managing custom Kubernetes operators for automation and orchestration.
- Certifications from Azure, GCP, and/or AWS.
Skills and Competencies
- Strong understanding of CI/CD pipelines and best practices for DevOps and MLOps.
- Proficiency in containerization technologies such as Docker and orchestration frameworks like Kubernetes.
- Hands-on experience with multiple cloud platforms such as AWS, GCP, or Azure.
- Familiarity with machine learning frameworks like TensorFlow, PyTorch, or Scikit-learn.
- Strong analytical and problem-solving skills with a focus on system optimization.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.