What are the responsibilities and job description for the Site Reliability Engineer (SRE) position at Vuesol Technologies Inc.?
Job Details
Site Reliability Engineer (SRE)
Kansas City (local candidates only)
Roles and Responsibilities
The SRE role bridges the Development Engineer role and the Production Engineer role with a mixture of development, test, deploy, and support skills that contribute to application reliability and resiliency. The SRE approaches problems as an Engineer and looks to automate processes with code or tools to detect and prevent identified software reliability issues. The SRE role splits time between runtime support issues (toil) and development automation work (dev). The skills are organized by Development, Support, and Common areas.
Software Development and Configuration
The following SRE skills are used to improve reliability of an application/service while it is in development:
Required Skills:
- 8-10 years overall experience
- Hands-On proficiency (3-4 years) in at least one high-level language: Java (must), Node.js, Kotlin, Python, Go
- Hands-On experience with automated testing tools (JMeter, JUnit, Mockito, Postman)
- Hands-On experience with a source code management system like Git or SVN, including pull, push, branch, commit and merge functions
- Hands-On experience creating, configuring and maintaining cloud-based applications and infrastructure for the rapid development and monitoring of applications and services (AWS: EC2, Fargate, CloudFormation, RDS, ElastiCache, S3)
- Experience with cloud migrations with reliability and availability as the core focus
- Experience implementing SRE at the team or enterprise level, including hands-on implementation of SRE practices and improving the associated metrics
- Hands-On experience with monitoring tools (Splunk, Dynatrace, NOI), including developing and customizing dashboards
- Hands-On experience with the build, deploy, and packaging process and best practices. Familiarity with DevOps automation tools (UCD, Jenkins, Maven, SonarQube, Chef, Ansible, Puppet)
- Scripting skills for automation (Linux Bash and Windows)
- Experience with network implementations
- Hands-On experience developing and implementing SRE reliability practices as part of microservices delivery to the cloud
General Required Skills:
- Ability to diagnose and optimize software code for reliability and resiliency
- Knowledge of the incident management process and reporting tools (ServiceNow, Jira Service Desk)
- Good communication and documentation skills. An SRE must document their work, collect and document tribal knowledge (the good stuff in people's heads), and make it accessible to others.
- Good knowledge of building frameworks and guiding teams to increase adoption of SRE practices
- Experience triaging incidents and conducting RCAs (Root Cause Analysis)
Nice to have skills:
- Familiarity with measuring KPIs such as MTTR and MTTD
- Ability to diagnose technical problems, isolate and debug issues, formulate creative solutions, analyze alternative approaches, and implement a timely solution.
- Experience providing alternatives and estimates for implementing a fix or automation to improve reliability.
- Experience working in an Agile squad with epics, stories, sprints, story points, and collaboration using Jira.
- Champion best practices by actively collaborating with other teams in a culture that values whiteboarding and technical design reviews.
- Contribute as a subject matter expert in multiple areas, constantly pushing yourself to be a better engineer. Assist other squad members to learn from your experience and expertise.
- Ability to juggle several tasks at a time and to adjust frequently to new or higher-priority tasks.
- Familiarity with software frameworks (JEE, Spring, React, Angular)
- Expertise with web application development technologies such as JavaScript, HTML5, and CSS.
- Experience with a modern RDBMS or NoSQL database, such as PostgreSQL, MySQL, Db2, Oracle, MongoDB, or Cloudant
- Knowledge of dark deployment techniques or feature toggles, and how to implement them.