What are the responsibilities and job description for the API Enablement Site Reliability Engineer Senior Staff Engineer position at Software Guidance & Assistance?
Job Details
Software Guidance & Assistance, Inc. (SGA) is searching for an API Enablement Site Reliability Engineer Senior Staff Engineer for a CONTRACT assignment with one of our premier Insurance clients. This position is hybrid out of any of the following locations: Hartford, CT; Charlotte, NC; Chicago, IL; or Columbus, OH.
We are seeking a highly skilled and experienced API Enablement SRE Senior Staff Engineer to join our team. The ideal candidate will have a strong background in managing and optimizing complex systems, ensuring their reliability, scalability, and performance. This role focuses on enhancing our API Management Platforms and integrating SRE best practices.
Responsibilities:
- API Platform and Enablement Team:
o Design, implement, and maintain reliable and scalable SRE practices for API Management Platforms.
o Apply strong knowledge of and experience with API solutions, platforms, API delivery, and API management.
o Strengthen the maturity of SRE practices by building on and executing improvements to observability, resiliency, and stability.
o Assess ecosystem changes to determine risk, impact, and checkout needs for API and Integration Platforms.
o Proactively consider SRE improvements and GenAI opportunities, create solutions, and successfully execute them.
o Create self-service capabilities to enable API provider teams to easily integrate with SRE API best practices.
- Incident Management and On-Call Rotation:
o Lead incident management, structured triage, and analysis, including the creation and management of incident runbooks.
o Participate in on-call rotations for incidents and changes, including evenings and weekends.
o Conduct problem analysis, remediation, and continuous improvement to enhance system reliability.
- Views, Dashboards, and Unified Views:
o Implement and maintain observability and monitoring solutions, including Splunk and Dynatrace.
o Create unified views, dashboards, and visualizations to provide a single pane of glass and information radiators for system health and performance.
o Create unified views that can be shared across stakeholders to quickly align on the root cause of an issue.
- Resiliency and Strengthening SRE Maturity:
o Design, implement, and maintain reliable and scalable systems and infrastructure.
o Lead the team in SRE and proactive risk mitigation, including resiliency and disaster recovery exercises, change management, and upgrades and patches.
o Level up SRE maturity and demonstrate it through the achievement of KPIs and operational metrics.
- Performance and Automation:
o Monitor and optimize the performance, availability, and reliability of systems and applications.
o Develop and maintain automation tools and scripts to streamline operations and improve efficiency.
- Risk Management and Metrics:
o Define, operationalize, and integrate SRE-related KPIs, metrics, and ideas into day-to-day activities.
o Proactively manage risks, including assessing findings, planning remediation, and executing that remediation to close risks promptly.
Qualifications:
- Strong knowledge and experience in API solutions, platforms, API delivery, and API management.
- Knowledge and skills in API Platforms (e.g., API Connect, Apigee, AWS API Gateway) and API Management.
- 5 years of experience in site reliability engineering or a related field.
- Expertise in SRE best practices, including incident management, resiliency, monitoring, detection, diagnosis, remediation, and prevention.
- Demonstrated experience in being on call and resolving incidents, including incident management and root cause analysis.
- Experience with large-scale distributed systems.
- Knowledge of CI/CD pipelines and DevOps practices.
- Experience with cloud platforms (e.g., AWS, Azure, Google Cloud Platform).
- Strong knowledge of system design, development, and management.
- Full stack software engineering skill set, including front-end, back-end, and database development.
- Proficiency in scripting languages (e.g., Python, Bash) and automation tools (e.g., Ansible, Terraform).
- Familiarity with monitoring and observability tools (e.g., Splunk, Dynatrace).
- Demonstrated ability to mature SRE practices and strengthen stability through proven KPIs and metrics.
- Excellent documentation, communication, problem-solving, and collaboration skills.
- Experience with GenAI and innovation, and a commitment to continuous improvement.
- Bachelor's degree in Computer Science, Engineering, or a related field.
SGA is an Equal Opportunity Employer and does not discriminate on the basis of Race, Color, Sex, Sexual Orientation, Gender Identity, Religion, National Origin, Disability, Veteran Status, Age, Marital Status, Pregnancy, Genetic Information, or Other Legally Protected Status. We are committed to providing access, equal opportunity, and reasonable accommodation for individuals with disabilities in employment, and our services, programs, and activities. Please visit our company to request an accommodation or assistance regarding our policy.