What are the responsibilities and job description for the Support / SRE Lead – Digital Platforms (Web & Mobile) position at Optimize Search Group?
Job Details
Job Title: Support / Site Reliability Engineering (SRE) Lead – Digital Platforms (Web & Mobile)
Location: Irving, TX (Hybrid; on-site Monday through Wednesday)
Duration: Contract with option to hire
Position Summary:
We’re looking for a dynamic and experienced Support / SRE Lead to oversee the stability, performance, and operational excellence of our digital platforms—spanning both web and mobile applications. This role is ideal for a hands-on leader with a deep technical foundation, a passion for reliable systems, and a track record of building high-performing support teams. You’ll collaborate across engineering, product, and operations to ensure exceptional user experiences and robust, scalable platforms.
Key Responsibilities:
- Lead day-to-day operations for support and reliability across web and mobile platforms.
- Serve as the senior escalation point for major incidents, driving rapid and effective resolution.
- Monitor application health, performance, and availability using modern observability practices.
- Partner with development, QA, and product teams to support seamless deployments and feature rollouts.
- Define and enforce best-in-class support practices, including incident response, problem management, and post-incident reviews.
- Establish, track, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Design and implement scalable monitoring, alerting, and logging frameworks.
- Develop and maintain automation scripts, CI/CD pipelines, and recovery tools to support system resilience.
- Maintain a centralized knowledge base for troubleshooting, playbooks, and platform documentation.
- Drive continuous improvement by analyzing recurring issues and implementing long-term solutions.
- Collaborate with stakeholders to understand user needs and improve platform usability.
- Stay up to date on technology trends and introduce innovative solutions to enhance system reliability.
Qualifications:
- 8 years of experience in application support, technical operations, or site reliability engineering.
- Minimum 3 years in a leadership role managing support or SRE teams.
- Deep understanding of web/mobile app architectures, APIs, cloud services (AWS, Azure, or GCP), and databases.
- Strong experience with incident response, root cause analysis, and ITIL processes.
- Proficiency with monitoring and alerting tools (e.g., Datadog, Cloudflare) and with log analysis.
- Hands-on experience with ticketing platforms like Azure DevOps, ServiceNow, or Freshservice.
- Solid grasp of CI/CD pipelines, DevOps practices, and automation tooling.
- Excellent communication and stakeholder management skills.
- Proven ability to lead and develop teams in high-pressure, fast-paced environments.
- Experience working with cross-functional engineering and product teams.
Nice to Have:
- Industry experience in e-commerce, fintech, healthcare, or media.
- Exposure to mobile frameworks and languages (Flutter, React Native, Kotlin, Swift).
- Experience with CMS platforms (WordPress, Drupal, Crownpeak, AEM).
- Familiarity with front-end technologies (JavaScript/TypeScript, React).
- Relevant certifications (e.g., ITIL, AWS, Azure, SRE).