What are the responsibilities and job description for the AWS DevOps Engineer (Hybrid role in Reston, VA) position at Prime Consulting LLC?
Job Details
AWS DevOps Engineer
Reston, Virginia
6-12 month contract
Job Requirements
- Kubernetes Cluster Management:
Design, deploy, and maintain Kubernetes clusters in production environments.
Ensure high availability, scalability, and security of Kubernetes infrastructure.
Implement monitoring, logging, and alerting solutions for cluster health and performance.
- COTS Product Integration:
Integrate commercial off-the-shelf (COTS) software with Kubernetes clusters.
Manage and troubleshoot issues arising from COTS product deployments within the Kubernetes ecosystem.
- ML Workload Orchestration:
Deploy and manage interactive and batch-based machine learning container workflows.
Integrate and optimize SageMaker container images for training and inference tasks.
Monitor resource usage and optimize ML workloads for cost and performance efficiency.
- Infrastructure as Code (IaC):
Develop and maintain infrastructure automation using Terraform.
Write and maintain Terraform modules for Kubernetes, cloud infrastructure, and CI/CD pipelines.
Enforce best practices for IaC, including version control, modularity, and code reviews.
- CI/CD Pipelines and Version Control:
Create and manage CI/CD pipelines for deploying applications and Kubernetes manifests using GitLab.
Ensure automation and testing in the software delivery process.
Maintain version control practices for all infrastructure and application code.
- Collaboration and Documentation:
Work closely with ML engineers, data scientists, and software developers to meet workload requirements.
Document processes, configurations, and troubleshooting guides.
Conduct knowledge-sharing sessions and training for team members.
Key Skills and Qualifications
- Technical Expertise: Proficiency in Kubernetes (CKA or CKAD certification is a plus). Strong experience with Terraform for IaC. Familiarity with SageMaker, Docker, and containerized workflows. Solid understanding of GitLab CI/CD and version control principles.
- Cloud and Networking: Hands-on experience with major cloud providers (AWS, Azure). Knowledge of Kubernetes networking, ingress controllers, and service meshes. Experience with cloud-native tools like Helm, Prometheus, Grafana, and Fluentd.
- Machine Learning Workflow Support: Understanding of ML model deployment, training, and inference workflows. Knowledge of integrating SageMaker container images with Kubernetes.
- Security: Knowledge of Kubernetes RBAC, secrets management, and pod security policies. Experience with scanning tools for containers and IaC.
- Collaboration and Communication
- Additional Skills: Familiarity with other IaC tools (e.g., Pulumi, Ansible). Experience with advanced Kubernetes features like Operators and CRDs.
This role is a blend of DevOps, MLOps, and infrastructure engineering, making it essential for the candidate to have both technical breadth and domain-specific expertise.