What are the responsibilities and job description for the AI Operations Engineer - ITIL Process position at TMS?
Location: Jersey City, NJ and/or Charlotte, NC (remote work allowed, with 1-2 onsite trips per month)
We're seeking a technical expert with a strong background in AI operations, ITIL processes, and cloud-based technologies to support the launch of new capabilities for a large financial services provider. The successful candidate will have a deep understanding of GenAI solutions, machine learning, and data engineering. This role focuses on deploying, operating, maintaining, and optimizing the inference services that support our Auto Recommend and Enhanced Search solution.
Key Responsibilities:
Provide Level 3 support for incident management, including issue identification, diagnosis, escalation, resolution, and coordination with key stakeholders and providers
Perform vulnerability management, including risk assessment, CVE scanning, patching, and remediation
Integrate and operate monitoring and alerting systems
Tune and troubleshoot model performance
Manage container image deployment and development
Develop and operate end-to-end deployment processes for model and code deployment
Collaborate with cross-functional teams to ensure smooth operation of AI solutions
Requirements:
15 years of enterprise consulting experience with a focus on data, machine learning, and GenAI solutions
Proficiency in designing and delivering solutions that leverage GenAI technologies (e.g., LLMs, Foundation Models)
Deep familiarity with relevant concepts, models, and technologies (e.g., transformer models, prompt engineering, model fine-tuning)
Experience delivering and scaling complex infrastructural solutions across diverse platforms
Strong knowledge of vLLM, OpenShift AI, Prometheus, Grafana, and Aqua, and of automating the deployment and execution of pipelines
Proficient in Python and SQL, with experience in Apache Spark, Apache Hadoop, Informatica, and similar data processing tools
Proven experience with building test procedures and ensuring data pipeline quality, reliability, performance, and scalability
Strong communication and customer-facing skills
Ability to work efficiently in collaborative teams using Agile methodologies
University degree in a field aligned with data engineering and/or data science
Relevant industry certifications (e.g., Databricks Certified Data Engineer, Microsoft Certifications, NVIDIA Certifications)