What are the responsibilities and job description for the API Enablement Site Reliability Engineer Senior Staff Engineer position at Russell Tobin?
Job Description:
Job Title: Site Reliability Engineer
Duration: 10 Months (With further Extension or Conversion)
Location: Chicago, IL (Onsite)
Salary: $70 - $75/hr on W2 (depends on experience)
We are seeking a highly skilled and experienced API Enablement SRE Senior Staff Engineer to join our team. The ideal candidate will have a strong background in managing and optimizing complex systems to ensure their reliability, scalability, and performance. This role focuses on enhancing our API Management Platforms and integrating SRE best practices.
Key Responsibilities:
API Platform and Enablement Team:
- Design, implement, and maintain reliable and scalable SRE practices for API Management Platforms.
- Strengthen the maturity of SRE practices by building on and executing improvements to observability, resiliency, and stability.
- Assess ecosystem changes to determine risk, impact, and checkout needs for API and Integration Platforms.
- Proactively consider SRE improvements and GenAI opportunities, create solutions, and successfully execute them.
- Create self-service capabilities to enable API provider teams to easily integrate with SRE API best practices.
- Lead incident management, structured triage, and analysis, including the creation and management of incident runbooks.
- Participate in on-call rotations for incidents and changes, including evenings and weekends.
- Conduct problem analysis, remediation, and continuous improvement to enhance system reliability.
- Implement and maintain observability and monitoring solutions, including Splunk and Dynatrace.
- Create unified views, dashboards, and visualizations to provide a single pane of glass and information radiators for system health and performance.
- Create unified views that can be shared across stakeholders to quickly align on the root cause of an issue.
- Design, implement, and maintain reliable and scalable systems and infrastructure.
- Lead the team in SRE and proactive risk mitigation, including resiliency and disaster recovery exercises, change management, and upgrades and patches.
- Level up SRE maturity and demonstrate it through the achievement of KPIs and operational metrics.
- Monitor and optimize the performance, availability, and reliability of systems and applications.
- Develop and maintain automation tools and scripts to streamline operations and improve efficiency.
- Define, operationalize, and integrate SRE-related KPIs, metrics, and ideas into day-to-day activities.
- Proactively manage risks, including assessing findings, planning remediation, and executing it to close risks promptly.
Qualifications:
- Strong knowledge and experience in API solutions, platforms, API delivery, and API management.
- Knowledge and skills in API Platforms (e.g., API Connect, Apigee, AWS API Gateway) and API Management.
- 5 years of experience in site reliability engineering or a related field.
- Expertise in SRE best practices, including incident management, resiliency, monitoring, detection, diagnosis, remediation, and prevention.
- Demonstrated experience being on call and resolving incidents, including incident management and root cause analysis.
- Experience with large-scale distributed systems.
- Knowledge of CI/CD pipelines and DevOps practices.
- Experience with cloud platforms (e.g., AWS, Azure, GCP).
- Strong knowledge of system design, development, and management.
- Full stack software engineering skill set, including front-end, back-end, and database development.
- Proficiency in scripting languages (e.g., Python, Bash) and automation tools (e.g., Ansible, Terraform).
- Familiarity with monitoring and observability tools (e.g., Splunk, Dynatrace).
- Demonstrated ability to mature SRE practices and strengthen stability through proven KPIs and metrics.
- Excellent documentation, communication, problem-solving, and collaboration skills.
- Experience with GenAI and innovation, and a commitment to continuous improvement.
- Bachelor's degree in Computer Science, Engineering, or a related field.
Russell Tobin offers eligible employees comprehensive healthcare coverage (medical, dental, and vision plans), supplemental coverage (accident insurance, critical illness insurance, and hospital indemnity), 401(k) retirement savings, life and disability insurance, an employee assistance program, legal support, auto and home insurance, pet insurance, and employee discounts with preferred vendors.