What are the responsibilities and job description for the Head of AI Operations position at Global Payments Inc.?
Description
This senior AI leadership role is pivotal in delivering strategic Generative AI (GenAI) and Machine Learning (ML) initiatives that will transform Global Payments. You will be responsible for ensuring the reliability, scalability, and performance of our production AI systems and services. This role oversees a dedicated team of SREs and Support Engineers responsible for monitoring, incident response, and the stability of AI services in production. You will play a critical role in ensuring our GenAI workloads, from foundational models to fully integrated inference pipelines, run reliably and at scale across cloud and hybrid environments. The ideal candidate will have a robust background in AI/ML platform management and software engineering, and a proven track record of leading technical teams in a fast-paced environment. You will need to foster a culture of innovation and keep initiatives aligned with evolving business strategies at a global level.
Responsibilities
- Lead the SRE and Support function for AI production systems, including LLM inference services and monitoring, vector databases, orchestration platforms and AI agent frameworks.
- Ensure high availability, low-latency performance, and secure operation of GenAI APIs and applications.
- Build and scale observability frameworks to monitor model drift, hallucination, bias, performance degradation, latency spikes, and bottlenecks.
- Define and enforce SLAs, SLOs, and error tolerance tailored to AI/ML workloads, covering batch, real-time, and on-demand use cases.
- Lead incident management and root cause analysis across AI pipelines, including model serving, feature stores, and data flows.
- Partner with the AI Engineering, MLOps and Platform teams to ensure reliability is baked right into every stage of AI development and deployment.
- Work closely with the Platform teams to implement and support auto-scaling, failover, and self-healing strategies for AI workloads in multi-cloud and hybrid environments.
- Develop and manage on-call strategies, escalation procedures, and global support rotations for critical AI services.
- Champion customer success for what your teams build: measure and monitor that it is used and that it is useful in driving business outcomes.
- Ensure compliance with industry standards and best practices in AI solutions monitoring, including security protocols and data governance policies.
- Stay abreast of emerging technologies and trends in GenAI and Machine Learning to drive continuous improvement and innovation.
- Inspire and motivate your team, and foster a positive and productive work environment consistent with Global Payments' values.
Qualifications
- Bachelor's or Master's degree in Computer Science, Math, AI, or a related area.
- Strong command of AWS and GCP with experience managing AI workloads.
- Hands-on experience with AWS SageMaker, AWS Bedrock, Google Vertex AI, and Snowflake Cortex.
- At least 10 years of experience in software and support engineering for enterprise-grade, cloud-based AI systems.
- Deep knowledge of LLMs, inference pipelines, vector databases, RAG and agentic architectures.
- Expertise in designing and running reliable, scalable, and observable production systems.
- Proven ability to lead high-severity incident response and drive root cause analysis postmortems.
- Hands-on experience with observability platforms (e.g., Fiddler AI, Arize, Weights & Biases).
- Deep understanding of containerization and designing ephemeral solutions.
- Ability to define, monitor, and enforce service-level objectives tailored to GenAI workloads.
- Expertise in industry trends and LLMs, including commercial foundational models from OpenAI, Anthropic, Cohere, and Google, as well as open-source models available on those platforms, such as Mistral and Llama.
- Passionate engineering leader with experience building high performance teams.
- Proficiency in stakeholder management, communicating effectively with stakeholders outside your team and managing their expectations.
- Proficiency in project management and resource allocation to ensure timely, efficient, and successful delivery of outcomes.
- Experience in strategic planning and execution, with strong decision-making skills to align initiatives with business goals and make informed choices that benefit the organization.
- Some experience in handling compliance and regulatory requirements to ensure engineering practices adhere to relevant laws and regulations.
- Familiarity with MLOps workflows, data versioning, and model lifecycle management.
- Familiarity with Machine Learning model development.
- Knowledge of Salesforce AI offerings.
- Ability to work proactively with a high level of initiative and accuracy.
- Ability to manage multiple assignments effectively and meet established deadlines.
- Strong interpersonal skills to interact professionally with staff and stakeholders.
- Excellent organizational skills and attention to detail.
- Critical thinking ability across moderately to highly complex tasks.
- Flexibility in adapting to changing business needs and priorities.
- Ability to work creatively and independently with minimal supervision.
- Ability to utilize experience and judgment in accomplishing goals.
- Experience in navigating organizational structures and collaborating across teams.
Global Payments offers a comprehensive benefits package to all of our team members, including medical, dental and vision care, EAP programs, paid time off, recognition programs, retirement and investment options, charitable gift matching programs, and worldwide days of service. To learn more, review our Benefits page at: https://jobs.globalpayments.com/en/why-global-payments/benefits/
This position is eligible to be considered for remote hiring anywhere in the USA.