What are the responsibilities and job description for the Site Reliability Engineer - Hybrid position at Volitiion IIT Inc?
Job Details
Job ID 38655
Position 2: Site Reliability Engineer
Location: Reston, VA (Hybrid onsite)
Duration: 1 year with possible extension
Interview process: Panel interview with 4 members
About the Role:
We are seeking a Site Reliability Engineer (SRE) to join our Enterprise Technology Operations (ETO) team. This role follows the YBYO (You Build, You Operate) model, meaning you will be responsible for both building and supporting the product. The ideal candidate has a strong problem-solving mindset, experience in incident management, and expertise in AWS infrastructure.
Key Responsibilities:
Design, develop, and maintain automation solutions to enhance system reliability and efficiency.
Implement monitoring and observability solutions to ensure visibility into system performance and health.
Apply SRE principles and best practices, automating repetitive tasks to improve system resilience.
Manage incident response and troubleshooting, ensuring rapid issue resolution.
Optimize and support AWS services, including EC2, Lambda, ECS, Batch, S3, RDS, and CloudWatch.
Conduct resiliency testing, identifying and mitigating potential system weaknesses.
Provide after-hours support during critical incidents when necessary.
Collaborate with cross-functional teams and stakeholders to ensure seamless system operations.
Mentor junior team members and drive continuous improvement initiatives.
Adapt to new tools and methodologies as the cloud landscape evolves.
Required Skills & Experience:
Strong SRE experience in AWS with hands-on expertise in cloud infrastructure.
Proficiency in automation, monitoring & observability, and resiliency testing.
Experience with AWS services such as EC2, Lambda, ECS, Batch, S3, RDS, and CloudWatch.
Knowledge of incident management and ability to troubleshoot complex system issues.
Experience in automating repetitive tasks to improve system reliability.
Strong communication skills to interact with portfolio teams and stakeholders.
Ability to take ownership, manage multiple projects, and meet deadlines.
Nice-to-Have Skills:
Exposure to chaos engineering.
Familiarity with new tools and methodologies in cloud and infrastructure automation.
Experience mentoring junior team members and contributing to a culture of continuous improvement.