What are the responsibilities and job description for the Site Reliability Engineer -W2 position at eTek IT Services, Inc.?
Job Details
Job Description
Overview
The Site Reliability Engineer will ensure the reliability, scalability, and performance of our infrastructure and applications. The role is central to maintaining high uptime and system efficiency, improving the user experience, and enabling the organization to meet its objectives.
Key Responsibilities
- Design and implement monitoring and alerting systems to ensure high availability and performance of services
- Develop automation tools for system provisioning, configuration management, and application deployment
- Collaborate with cross-functional teams to ensure that new software and systems are production-ready
- Perform capacity planning and manage infrastructure capacity efficiently
- Conduct root cause analysis of production issues and implement preventive measures
- Participate in on-call rotations and respond to system emergencies
- Ensure compliance with security and regulatory standards in all aspects of the infrastructure
- Contribute to the continuous improvement of the reliability and performance of systems and applications
- Implement best practices for cloud infrastructure and services
- Lead initiatives to optimize system performance and stability
- Conduct periodic testing of disaster recovery and failover systems
- Document system configurations, processes, and procedures
- Assist in evaluating new technologies and methods to improve reliability and performance
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field
- 3 years of experience in a site reliability engineering role
- Proficiency in Linux system administration and troubleshooting
- Strong programming skills in Python, Shell scripting, or other scripting languages
- Experience with cloud platforms such as AWS, GCP, or Azure
- Expertise in building and maintaining scalable, high-performance systems
- Knowledge of containerization and orchestration technologies (Docker, Kubernetes)
- Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK)
- Ability to design and implement automated solutions for infrastructure and application deployment
- Excellent troubleshooting and problem-solving skills
- Understanding of networking concepts and protocols
- Strong communication and collaboration skills
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer) a plus