What are the responsibilities and job description for the Machine Learning Specialist position at Suncap Technology?
We are seeking a highly skilled and experienced Senior AI/ML Engineer to lead the development of AI/ML-powered solutions and pilot the adoption of AI/ML solutions on Oracle Cloud. This role will also be responsible for architecting an on-premises AI/ML environment, ensuring a robust and scalable MLOps pipeline, and integrating model outputs into business reporting tools such as Power BI and Oracle Analytics, or exposing them through APIs.
Key Responsibilities:
AI/ML Infrastructure & Deployment
Architect and deploy an on-prem AI/ML environment, including GPU clusters and high-performance computing resources.
Collaborate with infrastructure teams to test and optimize networking, storage, and compute resources for AI workloads.
Implement scalable storage solutions (e.g., distributed file systems, object storage) for efficient data handling.
Ensure system reliability, security, and performance through best practices in Linux system administration and resource scheduling.
Configure AI model training and inference environments, leveraging containerization (Docker, Kubernetes) and MLOps pipelines.
Design and implement MLOps processes to support efficient model training, validation, deployment, and monitoring.
Configure and set up an ML platform on Oracle Cloud from scratch, ensuring a scalable and production-ready infrastructure.
Collaborate with cross-functional teams to understand data requirements and integrate AI/ML solutions into existing enterprise systems.
Work with developers to integrate AI model outputs into business intelligence tools such as Power BI and Oracle Analytics.
Qualifications:
Master's or Ph.D. in Computer Science, Data Science, Machine Learning, or a related field.
Certifications in Oracle Data Science Platform preferred.
Required experience:
3 years of experience in AI/ML engineering with a focus on infrastructure, MLOps, and cloud AI deployment.
Experience configuring and setting up ML platforms on-premises or in Oracle Cloud from scratch.
Strong expertise in Linux-based AI/ML environments, including performance optimization, package management, and shell scripting.
Experience with HPC environments, GPU clusters (H100, A100, or similar), and distributed AI workloads.
Strong programming skills in Python and experience with AI/ML frameworks such as TensorFlow, PyTorch, or similar.
Hands-on experience with MLOps, including model training, validation, deployment, and monitoring.
Experience integrating AI/ML models into business intelligence tools (Power BI, Oracle Analytics) or exposing them through APIs.
Experience with high-speed networking, storage solutions, and AI/ML system performance tuning.