What are the responsibilities and job description for the Site Reliability Engineer position at Wheeler Staffing Partners?
Site Reliability Engineer
Location: Fully Remote (Priority to Candidates in TX, Open to VA, NV, FL, PA, NJ, MO, NC)
Employment Type: Direct Hire
Salary: Up to $150,000 (Flexible Based on Experience)
Sponsorship: Client cannot sponsor or work with C2C
About the Role
Wheeler Staffing Partners is seeking a Site Reliability Engineer (SRE) for our client. This role is critical to maintaining and enhancing the reliability, scalability, and performance of mission-critical systems. The ideal candidate has hands-on experience with AWS cloud environments, Infrastructure as Code (IaC), automation tools, containerized workloads, and monitoring systems.
The SRE will work closely with Engineering teams to identify and resolve performance bottlenecks, optimize system reliability, and drive automation for infrastructure operations.
This is a fully remote role, but priority will be given to candidates located in Texas. Candidates in VA, NV, FL, PA, NJ, MO, and NC will also be considered.
Key Responsibilities
Ensure System Reliability & Performance – Maintain and improve the uptime, performance, and scalability of cloud and on-premise infrastructure.
Develop & Automate Processes – Build and implement automation tools for monitoring, deployment, and incident response to reduce manual interventions.
Monitor & Troubleshoot Issues – Use observability tools to proactively detect, diagnose, and resolve infrastructure and application performance issues.
Optimize Cloud & On-Prem Infrastructure – Manage and optimize AWS cloud environments, ensuring cost efficiency and security.
Enhance Disaster Recovery & Resilience – Implement backup strategies, failover systems, and incident response protocols to minimize downtime and data loss.
Collaborate with DevOps & Engineering Teams – Work closely with software engineers, data scientists, and IT teams to enhance system architecture and streamline deployment pipelines.
Security & Compliance – Ensure system security, data integrity, and compliance with regulatory requirements.
Capacity Planning & Scaling – Analyze system performance trends and plan for future scalability needs.
Incident Management & Post-Mortems – Lead incident response efforts, document root causes, and implement preventive measures.
Continuous Improvement – Identify bottlenecks and inefficiencies in infrastructure and implement best practices to enhance reliability.
Required Qualifications
Bachelor’s degree in Computer Science, Information Technology, or a related field (equivalent experience may be considered).
3 years of experience in a Site Reliability Engineering (SRE), DevOps, or Infrastructure Engineering role.
Strong expertise in AWS cloud services and Infrastructure as Code (IaC) tools, including:
AWS Cloud Development Kit (CDK)
AWS CloudFormation
Experience with CI/CD tools, such as:
Jenkins, GitHub Actions
Proficiency in containerization and orchestration tools like:
Docker
Strong understanding of:
Load balancers, REST APIs, networking (IP management, subnetting), HA architecture
Serverless cloud computing models
Proficiency in cloud monitoring and observability tools, such as:
AWS CloudWatch, EFK Stack, OpenTelemetry, Datadog, Grafana, New Relic
Ability to define and track golden metrics and establish meaningful alerting thresholds.
Strong analytical skills with experience in root cause analysis and incident management.
Excellent communication and collaboration skills to work across teams.
Preferred Qualifications
Cloud-related certifications such as:
AWS Certified DevOps Engineer
Certified Kubernetes Administrator (CKA)
Experience with Agile methodology, or a willingness to learn it.
Why This Role?
Fully remote opportunity with priority given to Texas-based candidates.
High-impact role in maintaining and scaling mission-critical infrastructure.
Competitive salary with flexibility based on experience.
Opportunity to work with cutting-edge cloud technologies, automation tools, and monitoring platforms.
Collaborate with engineering teams to drive innovation in system reliability.