Hello Everyone,
Hope you are doing well!
My name is Pavan and I work with SPAR Information System. I have a great opportunity for you, please find the job details below. If you are interested in applying, please send me your updated resume and the best time for you to discuss this opportunity in detail.
Senior Production Support Lead
Location : Atlanta, GA / Frisco, Texas
Duration : Long term contract
Onsite Requirement : Prefer onsite only
Number of days onsite : 3 days
Must Have Skills :
- 5 years of experience in support, production support, or system administration in a complex environment, with at least 2 years in a leadership or supervisory role.
- 6 years of experience with ITIL processes, particularly incident management, change management, and problem management.
- 6 years of experience with monitoring tools (e.g., Splunk, AppDynamics, New Relic, or similar) and experience in log analysis for troubleshooting.
- 6 years of scripting experience (e.g., Python, Shell scripting) for automating routine tasks and improving operational efficiency.
- 6 years of experience with database systems (SQL, Oracle, etc.) and experience with database troubleshooting in a production environment.
- Familiarity with cloud platforms (AWS, Azure, Google Cloud Platform) and containerization technologies (Docker, Kubernetes) is a plus.
Job Summary :
We are seeking a Senior Production Support Lead to oversee the daily operations of our production environments and ensure the smooth and efficient functionality of critical applications and systems. The ideal candidate will have strong technical troubleshooting skills, a solid understanding of production environments, and a proven ability to lead a team in resolving complex production issues quickly and effectively.
Key Responsibilities :
- Lead Production Support Operations : Manage and lead the production support team to ensure the stability and availability of critical production systems. Oversee monitoring, incident management, and resolution of production issues.
- Incident and Problem Management : Coordinate and lead efforts to resolve production incidents promptly, ensuring minimal business impact. Manage the root cause analysis (RCA) process for recurring issues and work towards preventive solutions.
- System Monitoring and Optimization : Continuously monitor system performance, identify potential bottlenecks or issues, and take proactive measures to improve system performance and reliability.
- Escalation Handling : Serve as the point of escalation for complex production issues, providing guidance and expertise in troubleshooting and resolution.
- Collaboration with Development and Infrastructure Teams : Work closely with the development, QA, and infrastructure teams to ensure smooth production deployments, patch management, and post-deployment monitoring.
- SLA Adherence : Ensure that SLAs are met for all production issues, including response and resolution times. Track and report on SLA performance metrics regularly.
- Team Leadership : Provide mentorship and guidance to junior team members, conduct regular team meetings, and facilitate knowledge-sharing sessions to build a high-performing support team.
- Documentation & Knowledge Management : Maintain up-to-date knowledge base articles, troubleshooting guides, and standard operating procedures (SOPs). Ensure proper documentation of all incidents, changes, and resolutions.
- Change Management : Assist in managing changes in production environments by ensuring thorough testing and validation of changes, and providing post-implementation support.
- Continuous Improvement : Drive continuous improvement initiatives within the production support process. Identify opportunities to automate repetitive tasks, enhance system reliability, and optimize operational workflows.
Qualifications & Skills :
- Bachelor's Degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).
- 5 years of experience in IT support, production support, or system administration in a complex environment, with at least 2 years in a leadership or supervisory role.
- Strong technical troubleshooting skills in areas such as application monitoring, databases, network, and server infrastructure.
- Experience with ITIL processes, particularly incident management, change management, and problem management.
- Proficiency with monitoring tools (e.g., Splunk, AppDynamics, New Relic, or similar) and experience in log analysis for troubleshooting.
- Scripting skills (e.g., Python, Shell scripting) for automating routine tasks and improving operational efficiency.
- Strong understanding of database systems (SQL, Oracle, etc.) and experience with database troubleshooting in a production environment.
- Familiarity with cloud platforms (AWS, Azure, Google Cloud Platform) and containerization technologies (Docker, Kubernetes) is a plus.
- Strong communication skills, with the ability to explain technical issues to non-technical stakeholders and produce clear incident reports.
- Leadership and Team Management : Proven ability to manage and lead teams effectively.