What are the responsibilities and job description for the Cloud Infra Control Plane Service Engineering Architect position at VISION INFOTECH INC.?
Job Details
Hi
Hope you're doing well!
Please find the requirement below. If you are comfortable with the requirement, please reply with your updated resume or call me at
Position: Cloud Infra Control Plane Service Engineering Architect
Location: Remote
Duration: 6 Months
The client cannot sponsor visas.
Note: All candidates should expect to work 3 PM to 9 PM PST at least two days a week, as this is when calls with the team in South Korea take place.
Key Responsibilities:
Infrastructure Management:
- Manage and monitor compute clusters, ensuring high availability and performance.
- Implement and maintain automation scripts for infrastructure provisioning and management.
Design and Implementation:
- Design, implement, and maintain compute services for both GPU and non-GPU environments.
- Develop and optimize algorithms for high-performance computing tasks, especially in the AI/ML training and inference domain.
Performance Optimization:
- Analyze and optimize the performance of compute workloads.
- Implement best practices for resource utilization and efficiency.
Collaboration:
- Work closely with data scientists, researchers, and other engineering teams to understand and meet their compute requirements.
- Collaborate with hardware vendors to evaluate and integrate new technologies.
Security and Compliance:
- Ensure that compute services comply with security policies and industry standards.
- Implement and maintain security measures to protect data and compute resources.
Troubleshooting and Support:
- Provide support for compute-related issues, including debugging and resolving hardware and software problems.
- Develop and maintain documentation for troubleshooting procedures and best practices.
Continuous Improvement:
- Stay updated with the latest advancements in compute technologies and integrate them into the infrastructure.
- Continuously improve the reliability, scalability, and performance of compute services.
Qualifications:
Education:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- NVIDIA and AI certifications
Experience:
- Years of experience managing on-premises GPU or non-GPU systems
- Proven experience in managing and optimizing GPU and non-GPU compute environments.
- Skills in building and operating AI infrastructure.
- Experience with high-performance computing (HPC) and parallel processing, including bare-metal and large-scale virtual environments.
- Experience implementing virtualization architectures using Kubernetes distributions such as OpenShift or Rancher, and cloud technologies on bare-metal environments.
- Proficiency in hardware technologies such as SR-IOV, DPUs, and GPUs, with proven experience implementing these technologies in virtualized and containerized environments.
Technical Skills:
- Proficiency in programming languages such as Python, C++, or similar.
- Experience with infrastructure as code (IaC) tools like Terraform, Ansible, or similar.
- Familiarity with containerization and orchestration tools like Docker and Kubernetes.
- Familiarity with the technologies underlying Kubernetes, including CRI, CSI, CNI, Operators, the GPU device plugin, and RDMA/InfiniBand integration.
- Knowledge of cloud platforms (AWS, Azure, Google Cloud Platform) and their compute services.
Soft Skills:
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
- Ability to work in a fast-paced, dynamic environment.
Thanks & Regards,
Amar Pratap
Senior Technical Recruiter
VISION INFOTECH INC
Phone: ext 531
Direct:
Email:
368 Main Street, st #3, Melrose MA 02176
E-Verified Company