What are the responsibilities and job description for the MLOps L2 Support Engineer (Production Support) - Reading, PA - Only Locals position at mProgen?
Job Details
Key Responsibilities:
Incident Management & Support:
- Provide L2 support for MLOps production environments, ensuring uptime and reliability.
- Troubleshoot ML pipelines, data processing jobs, and API issues.
- Monitor logs, alerts, and performance metrics using Dataiku, Prometheus, Grafana, or AWS tools such as CloudWatch.
- Perform root cause analysis (RCA) and resolve incidents within SLAs.
- Escalate unresolved issues to L3 engineering teams when needed.
Required Skills & Experience:
Experience: 5 years in MLOps, Data Engineering, or Production Support.
Dataiku DSS: Strong experience in Dataiku workflows, scenarios, plugins, and APIs.
Cloud Platforms: Hands-on experience with AWS services (SageMaker, Lambda, S3, RDS, ECS, IAM).
CI/CD & Automation: Familiarity with GitHub Actions, Jenkins, or Terraform.
Scripting & Debugging: Proficiency in Python, Bash, and SQL for automation & debugging.
Monitoring & Logging: Experience with Prometheus, Grafana, CloudWatch, or ELK Stack.
Incident Response: Ability to handle on-call support, weekend shifts, and SLA-based issue resolution.