What are the responsibilities and job description for the Site Reliability Engineer (On-Prem) position at QA Wolf?
🐺 QA Wolf gets engineering teams to 80% automated E2E coverage fast, and keeps it there.
We're growing fast and need a Site Reliability Engineer to keep our Oklahoma City data center running smoothly while helping our infrastructure team improve on-prem observability. If you have IT or hardware experience and want to expand into reliability engineering, this is a great place to start.
In this role, you will:
- Maintain and troubleshoot hardware in the data center to ensure everything runs efficiently.
- Respond to on-call hardware issues: jump in, fix it, and keep things online.
- Work with the infrastructure team to improve observability and prevent issues before they happen.
- Help deploy and configure monitoring tools to increase visibility.
- Document hardware configurations, fixes, and incidents so the team can stay aligned.
What makes you a great fit?
- Experience in IT support or hardware maintenance (data center experience is a plus).
- Strong problem-solving skills and a focus on keeping things running smoothly.
- Ability to work well with a team and communicate effectively.
- Comfortable being on-call to resolve hardware issues when needed.
- Experience with on-prem infrastructure, Kubernetes, VPNs (IPSec, OpenVPN), provisioning bare metal servers, and hybrid cloud connectivity (not required, but helpful).
This is a hands-on role with real impact. You'll gain experience in infrastructure and reliability engineering while working with a fast-moving team. Ready to jump in? Let's talk.