What are the responsibilities and job description for the DevOps Site Reliability Engineer position at Rogue Fitness?
Overview
Job Description:
The DevOps Site Reliability Engineer (SRE) is responsible for designing, implementing, and maintaining Rogue’s application infrastructure. This includes direct support for cutting-edge software solutions, ranging from systems for manufacturing and planning to supply chain and fulfillment.
The DevOps Site Reliability Engineer is a fully onsite role in Columbus, Ohio. Remote work is not available.
By applying to Rogue, regardless of the platform you choose to use, you are agreeing to Rogue's preferred methods of communication (i.e. text message). Submitting an application, through whatever online forum is ultimately used, constitutes a knowing and voluntary agreement to send and receive text messages during the recruitment process.
Responsibilities
- Design, implement, and maintain our infrastructure and applications to ensure they are highly available, scalable, and reliable
- Collaborate with our development and operations teams to implement automation, monitor performance, and identify and resolve issues before they affect our customers
- Implement best practices for application deployment, configuration, management, and security
- Plan and coordinate deployment processes for infrastructure upgrades with minimal downtime
- Monitor and analyze system performance metrics to identify and address issues
- Develop and maintain infrastructure as code using tools like Terraform and Kubernetes
- Troubleshoot issues, determine their root cause, and conduct postmortem analysis
- Implement and maintain CI/CD pipelines for our applications
- Support disaster recovery and business continuity planning
- Provide coverage to respond to production issues and incidents
- Assist application teams with Docker, build tools, and local development environments for Windows, macOS, and Linux
Qualifications
- Bachelor of Science degree in Computer Science, Information Systems, Computer Engineering, or a related area
- 3 years of expert-level experience with containerization and orchestration tools like Docker, Kubernetes, and Helm
- 3 years of experience writing and executing scripts using Bash, PowerShell, or both
- 3 years of experience with Git-based version control platforms, such as Git, Bitbucket, Azure DevOps, or a similar platform
- 3 years of demonstrated experience with HTTP/HTTPS, certificates, PPK, and other encryption strategies
- 3 years of headless Linux administration and management experience
- 1 year of system and network architecture experience with a cloud provider, such as Azure, GCP, or similar
- 1 year of experience utilizing monitoring tools, such as Prometheus, Grafana, Application Insights, or GCP Cloud Monitoring
- 1 year of demonstrated experience applying programming skills to automation tools and processes, such as Azure DevOps, Terraform, Jenkins, or similar tools
- 1 year of strong networking experience with firewalls, load balancing, and reverse proxy products
- Alternatively, a master’s degree and 1 year of experience in each of the above is acceptable