What are the responsibilities and job description for the Sr Site Reliability Engineer position at Talent Groups?
Job Details
Contract to Hire | Hybrid (McKinney, TX) | U.S. Citizenship Required
We're seeking a Senior Site Reliability Engineer (SRE) to join our client's team and help design, implement, and maintain highly reliable, scalable, secure, and cost-effective infrastructure solutions. In this role, you'll play a critical part in improving system stability, observability, and overall performance across our platforms.
As a Senior SRE, you'll serve as a bridge between development and operations, applying a software engineering mindset to infrastructure and systems management. You'll proactively identify areas for optimization, build robust infrastructure, and foster a culture of operational excellence throughout the organization.
Key Responsibilities
- System Administration: Install, configure, and maintain Linux environments and container orchestration platforms to ensure high availability and performance. Responsibilities include kernel tuning, user permissions, and troubleshooting both hardware and software issues.
- Network Administration: Design, monitor, and troubleshoot network systems and protocols (e.g., DNS, DHCP, VPN). Secure networks through segmentation and access control.
- Monitoring & Observability: Implement comprehensive observability solutions using tools like Prometheus and Grafana. Set up alerting systems for proactive issue detection and resolution.
- Automation: Leverage Infrastructure as Code (IaC) tools such as Terraform and Ansible to automate provisioning and configuration tasks, ensuring consistency across environments.
- Security: Apply best practices to secure systems and networks, including firewalling, intrusion detection systems (IDS), vulnerability management, and encryption protocols.
- Incident Response: Participate in on-call rotations to troubleshoot and resolve production incidents quickly and effectively.
- Documentation & Collaboration: Create clear documentation for systems, processes, and architectures while fostering strong cross-functional team relationships.
Non-Technical Qualifications
- Leads through influence, mentoring, and empowering peers
- Balances tactical and strategic needs to address both short- and long-term organizational priorities based on articulated team and company goals
- Demonstrates intrinsic motivation
- Writes clear, concise, and meaningful documentation
- Develops and leverages collaborative, empathetic relationships across the organization
- Makes and explains thoughtful decisions based on sound, logical, analytical, data-driven reasoning
Technical Qualifications
- Expertise with container management (Kubernetes, ECS, Docker, Helm)
- Expertise with configuration management (Ansible, Chef, Puppet)
- Expertise with infrastructure as code (Terraform, OpenTofu, Pulumi)
- Expertise with monitoring and alerting systems (CloudWatch, Datadog, New Relic, Site24x7, Dynatrace)
- Expertise with Linux systems deployment, management, performance tuning, and debugging
- Expertise with computer networking
- Experience with version control systems (VCS) and hosting providers (Git, Mercurial, GitHub, SourceHut)
- Experience with CI/CD systems (GitHub Actions, CircleCI, Argo)
- Experience with ticket management systems (Jira, Shortcut, Azure DevOps)
Salary: $70–$75