What are the responsibilities and job description for the REMOTE Site Reliability Engineer position at TEKsystems?
Our client is searching for a Site Reliability Engineer for a full-time position that can be worked remotely.
*Description*
The Senior Site Reliability Engineer (SRE) will oversee the operations of the applications that run for the enterprise. You will work closely with other technology teams in architecting a platform that will best suit the needs of the business. The primary focus is developing and implementing tooling to make that possible. You will be expected to understand, operate, and automate within the entire stack and all environments.
You must be capable of working independently, collaboratively within the team, and collaboratively with other teams across the company.
We are seeking a Senior Site Reliability Engineer (SRE) to lead by example by addressing production issues and technically mentoring and coaching other team members to instill a culture of continuous improvement and optimization.
The SRE should have a passion for tackling challenges across a diverse platform of integrated technologies. This role combines software and systems engineering disciplines with operations to build high-performance, scalable, and secure application systems.
SREs are team players who embed themselves within both product and engineering teams as needed to advance the architecture and performance of software systems and train their peers on continuously improving performance and the overall customer experience.
Job Responsibilities:
*Develop and report metrics describing production system availability, uptime, and responsiveness.
*Develop playbooks to address and solve platform issues.
*Develop and implement tools and processes to increase monitoring of production systems and applications.
*Proven success in managing 24/7 operational environments with cloud, container, and IaC technologies and services such as Kubernetes and Terraform.
*Software development, systems administration, or scripting skills along with knowledge of IaC to improve reliability, observability, and availability of production systems and applications.
*An understanding of configuration management standard methodologies and how they enable reliable production systems.
*Ability to swiftly resolve production issues in real time, while establishing actions and plans to implement strategic fixes that address root problems.
*Experience performing and detailing root-cause analyses.
*Builds, improves, and runs critical backend services as well as tooling and automation to allow multiple product teams to release and scale their software reliably and predictably.
*Assist in establishing requirements, methods, and procedures for routine maintenance
*Troubleshoot existing systems to identify errors or deficiencies and develop solutions
*Implement monitoring and alerting schemes to detect and notify when performance thresholds are not being met
*Tune server and application-level performance monitoring and alerting
*Release Engineering
  o Continuous Integration
  o Tools for testing
  o Continuous Deployment and zero downtime deployments
  o Develop services and tooling that facilitate high quality releases
*Major Incident Response
*Disaster Recovery Planning
*Excellent written and verbal communication skills
Culture:
*Diligence around learning and sharing best practices around troubleshooting and the resolution of issues
*A driven team player focused on building relationships and mentoring and growing the skills of your team members
*Works closely with development teams to understand, evaluate and propose solutions to meet current and anticipated future growth challenges
*Collaborative team player willing to be involved at various stages of the SDLC
Qualifications and Requirements:
*3-5 years of software development experience required
*2-3 years general Application/Systems Administration experience required
*Experience with Azure required
*Experience developing in Node.js, Python, PHP, React, and JavaScript required
*Experience with AD and Azure AD required
*Experience working in a high traffic enterprise environment required
*SSL and certificate management, cloud-based storage required
*Experience with Kubernetes required
*Advanced knowledge of web-based, service-oriented applications, and testing tools required
*Understanding of software load balancing, feature switching, service discovery required
*Advanced knowledge of caching systems and techniques
*Advanced experience with CI/CD systems
*Understanding of complex web hosting configuration components, including firewalls, load balancers, CDNs, web and database servers
*Experienced in server-side scripting languages
*Pay and Benefits*
The pay range for this position is $125000.00 - $135000.00/yr.
BBQGuys, Blaze, and PCM are proud to offer a comprehensive benefits package to all eligible employees:
* Medical Benefits - BBQGuys, Blaze, and PCM pay a portion towards the employee medical premium. Plans are administered through Gravie.
* Dental & Vision Benefits - Voluntary plans administered by Reliance Matrix.
* Basic Life and AD&D - 100% for $15,000 for each Full Time Employee.
* Voluntary Life and AD&D - Voluntary plan administered by Mutual of Omaha.
* Voluntary STD and LTD - Voluntary plan administered by Mutual of Omaha.
* Voluntary Critical Illness and Accident plans administered by Reliance Matrix.
* 401K Contribution - BBQGuys, Blaze, and PCM make a discretionary match of employee deferrals.
Employee Perks
* Flexible Paid Time Off
* Volunteer Time Off (Up to 8 hours per quarter)
* Parental Leave
* Remote/Hybrid Work Options (for certain positions)
* EAP for mental health, support, and counseling
* Opportunities for community involvement
* Employee Purchase Program
* Employee Referral Program
* Health & Wellness Programs
* Employee Recognition and Rewards
* Open Door Policy
* Professional Development Opportunities
*Workplace Type*
This is a fully remote position.
*Application Deadline*
This position is anticipated to close on Mar 27, 2025.
About TEKsystems:
We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.
The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.