Lead Site Reliability Engineer

CEI
Pittsburgh, PA | Contractor
POSTED ON 2/1/2025
AVAILABLE BEFORE 3/2/2025

One of CEI's largest Financial Services & Banking clients is seeking a Lead SRE to join their growing organization!


Client/Industry: Financial Services & Banking

Job Title: Lead Site Reliability Engineer

Location: Hybrid - 3 Days On-Site / 2 Days Remote | Pittsburgh, PA 15222; Cleveland, OH 44136; Dallas, TX 75234; Birmingham, AL 35233; or Phoenix, AZ 85016

Work Schedule/Shift:

Shift 1 (Priority): Saturday & Sunday: 7pm – 7am (EST) / Tuesday & Thursday: 8pm – 5am (EST)

Shift 2: Monday-Friday: 11pm – 7am (EST)

Duration/Length of Assignment: 5 Month Contract to Hire


*Must be able to convert to a full-time employee without sponsorship, restrictions, or an additional employer*

  • W2 Employment Only – No Corp to Corp / C2C arrangements.
  • Expected potential for contract extension(s) and/or conversion to Full-Time/Permanent Employment.
  • Optional benefits available during contract (Medical, Dental, Vision, and 401k)


Position Overview:

This role is a critical leadership position within the Site Reliability Center (SRC), responsible for overseeing a team of global contractors supporting enterprise technology and security applications. The SRC Lead focuses on maintaining system reliability, availability, and performance through proactive monitoring, troubleshooting, and escalation. The team plays a key role in ensuring system health, reducing downtime, and optimizing performance across multiple business applications.

Reporting to senior leadership, the SRC Lead will work closely with application support teams, internal stakeholders, and global engineers to coordinate efforts and resolve critical system issues. The primary function of this role is production support, requiring a highly technical approach to troubleshooting system issues, driving incident resolution, and implementing process improvements. The SRC Lead is expected to take ownership of escalated problems, guide discussions with key stakeholders, and update documentation for continued operational success.

With 185 combined applications and platforms in scope, the SRC Lead will develop expertise in high-priority critical systems and drive technical conversations to resolution. This position requires strong analytical skills, leadership in a fast-paced production environment, and the ability to coordinate effectively across global teams.


Required Skills/Experience/Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or related field
  • 8 years of experience in Site Reliability Engineering (SRE), DevOps, or technical production support
  • Strong background in monitoring and debugging tools, including LogScale, Splunk, and Dynatrace
  • Hands-on experience with DevOps pipelines using Git, Jenkins, and Artifactory
  • Proficiency in Red Hat Linux, OpenShift, and Windows infrastructure
  • Strong understanding of networking concepts, including DNS, load balancing, network tracing, and firewalls
  • Experience working with relational databases such as Oracle and SQL
  • Ability to troubleshoot and support APIs and web services technologies, including SOAP, JSON, and REST
  • Familiarity with directory services, including LDAP and Active Directory
  • Proficiency in Java for troubleshooting and debugging system issues
  • Experience in operational incident management, root cause analysis, and production system monitoring
  • Ability to drive problem resolution, manage impact assessments, and escalate issues appropriately
  • Strong leadership and mentorship skills to guide global engineering teams


Preferred Skills (Not Required):

  • Experience with Python/Java scripting, Ansible, and PowerShell for automation
  • Knowledge of modern development tools and methodologies, including Agile, CI/CD, Git, and Jenkins
  • Experience with Kafka event streaming and ETL tools like Informatica
  • Familiarity with NoSQL databases such as MongoDB and Cassandra
  • Experience with Evolven for system analysis
  • Prior experience in a 24x7 production support environment


Day to Day/Responsibilities:

  • Monitor system health and analyze metrics using LogScale, Splunk, and Dynatrace to proactively identify issues and potential system failures
  • Lead troubleshooting efforts on production issues, coordinating with DevOps teams and escalating when necessary to system SMEs
  • Maintain and support DevOps pipelines using Git, Jenkins, and Artifactory, ensuring reliability of automated deployments
  • Troubleshoot and resolve infrastructure-related issues across Red Hat Linux, OpenShift, and Windows environments
  • Analyze and resolve network-related issues including DNS, load balancing, network tracing, and firewall configurations
  • Support database operations by managing Oracle and SQL databases, optimizing performance, and identifying system inefficiencies
  • Lead system impact assessments and provide technical guidance on API integrations, SOAP/REST web services, and JSON data handling
  • Ensure compliance with LDAP and Active Directory configurations for authentication and access control
  • Participate in incident and problem management, identifying recurring issues and implementing long-term fixes
  • Update and maintain runbooks and operational documentation, ensuring clear guidelines for handling recurring incidents
  • Act as the escalation point for global L1.5 engineers, providing technical mentorship and driving a culture of continuous learning
  • Collaborate with cross-functional teams, stakeholders, and internal/external business partners to resolve technical challenges
  • Ensure timely escalation of critical issues and provide post-incident analysis for process improvements
  • Work closely with leadership to reduce Level 3 escalations by improving knowledge-sharing and process automation
  • Provide operational support for large-scale distributed applications, ensuring high availability and reliability
  • Communicate with senior leadership, including directors and CIOs, to report on system status, major incidents, and process improvements
  • Identify and implement automation solutions using Python, Java, Ansible, or PowerShell to improve efficiency and reduce manual tasks
  • Drive performance optimization efforts by analyzing system bottlenecks and recommending improvements for stability and uptime

Salary: $75 - $90
