
Site Reliability Engineer - Hybrid

Volitiion IIT Inc
Columbia, DC Full Time
POSTED ON 4/22/2025
AVAILABLE BEFORE 6/22/2025

Job Details

Title: Site Reliability Engineer V
Location: Reston, VA (Hybrid onsite - 3 days a week from day 1)
Assignment duration: 24 months with possibility of extension
Interview process: 2 rounds. The first round is a video interview; the second round is an in-person interview.
Manager's call notes
  • This is an SRE role. The SRE sits on a shared services team within Fannie Mae that works with different application teams, so multitasking is required.
  • In technical terms, we need expertise with AWS ECS, EC2, RDS, Redshift, EMR, Lambda, Route 53, Step Functions, etc.
  • Programming experience in Java or Python is required. We are not looking for a full-fledged developer but for someone who can take existing code and modify it as needed to create small automations.
  • Exposure to DevOps is required. GitLab, Terraform and Jenkins would be preferred.
  • Experience with observability using tools such as AWS CloudWatch, Splunk/SignalFx, Dynatrace, and OpenTelemetry would be helpful.
  • Experience in release engineering, production support, or performance engineering would be a bonus. This is not a showstopper, though.
  • AWS, programming, and DevOps are must-haves.
  • The candidate has to come to the office 3 days a week in Reston, VA.
  • The SRE at Fannie Mae doesn't work 24x7; SREs are scheduled on a rotation basis. About 20% of the job is production support. The other 80% is spent working with application teams: studying applications, understanding their architecture, suggesting improvements, and reviewing resiliency patterns to see whether the application is resilient enough. Observability is part of this too. The SRE identifies gaps and weak points, works with the architecture team to resolve them, and reviews code scans and alarms to see whether they are good enough.
  • The SRE may not get full access to every application, but the expectation is to identify gaps and weak points and tell the application team to fix them.
  • On-call rotation schedule: one day a week, every week.
  • AI/ML: We have certain machine learning projects that the SRE interacts with, so AI/ML experience is a plus.
  • Previous Fannie Mae experience is a plus.
Job Description
Overall years of experience:
8 years of related experience in their specific area with experience leading teams on projects with similar scope and complexity.
Bachelor's or master's degree in computer science or equivalent.
Certifications: AWS Solutions Architect, Agile Certified Practitioner (ACP), or relevant cloud certifications.

We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in cloud platforms, DevOps practices, and modern software development frameworks. The SRE will play a critical role in designing, building, and maintaining highly scalable, fault-tolerant, and secure cloud infrastructure while ensuring operational excellence, high availability, and reliability.
Key Responsibilities:
1. Cloud Infrastructure & Automation:
Design, implement, and manage cloud-based infrastructure using platforms like AWS, Azure, or Google Cloud Platform.
Utilize Infrastructure-as-Code (IaC) tools such as Terraform, CloudFormation, and Ansible to automate deployments and configurations.
Create robust automation targeted at anomaly detection, toil reduction, recovery processes, and self-healing mechanisms, and optimize cloud costs.

2. DevSecOps & CI/CD:
Apply a deep understanding of DevSecOps principles and CI/CD pipelines using tools like GitLab, Jenkins, SonarQube, Nexus/Artifactory, and Docker.
Implement security best practices, including IAM roles, RBAC, vulnerability remediation, and SAST/DAST/SCA tools.

3. Observability & Incident Management:
Design and implement monitoring, logging, and distributed tracing solutions using tools like AWS CloudWatch, Splunk/SignalFx, Dynatrace, and OpenTelemetry.
Lead root cause analysis, blameless postmortems, and proactive incident management to minimize MTTR and MTTD.
Define and monitor SLOs, SLIs, and error budgets to ensure system reliability.

4. Microservices & API Management:
Architect and manage microservices, serverless computing, and RESTful APIs.
Ensure fault tolerance and resilience using design patterns like Circuit Breaker, Retry, Timeout, and Bulkhead.

5. Chaos Engineering & Resiliency:
Conduct chaos engineering experiments using tools like AWS FIS and Chaos Toolkit.
Perform resiliency assessments using Resilience Hub and implement self-healing solutions.

6. Database & Application Support:
Manage and optimize database technologies such as PostgreSQL, MongoDB, DynamoDB, Oracle, and Redshift.
Provide production support, including incident response, problem management, and runbook creation. Participate in on-call rotations.

7. Collaboration & Communication:
Collaborate with cross-functional teams to implement shift-left testing practices (BDD, TDD, Unit, Regression).
Create and maintain architecture diagrams, knowledge articles, and disaster recovery plans.
Communicate effectively with stakeholders and demonstrate strong relationship management skills.

Required Skills & Qualifications:
Expertise in cloud platforms (AWS, Azure, or Google Cloud Platform) and container orchestration.
Proficiency in programming/scripting languages such as Python, Java, Node.js, Bash, and PowerShell.
Strong knowledge of database technologies (e.g., PostgreSQL, MongoDB, DynamoDB, Oracle, Redshift).
Experience with DevOps tools (Jenkins, Docker, Nexus/Artifactory) and build tools (Maven, Gradle).
Familiarity with AI/ML integrations, event-driven architectures, and distributed systems.
Expertise in observability, logging, and monitoring tools (AWS CloudWatch, Splunk, Dynatrace, OpenTelemetry).
Strong understanding of security practices, including IAM, RBAC, and vulnerability management.
Experience with chaos engineering, resiliency assessments, and disaster recovery planning.
Proficiency in performance testing tools (JMeter, LoadRunner) and capacity planning.
Excellent verbal and written communication skills, with the ability to collaborate across teams.

Preferred Qualifications:
Experience with AI/ML libraries (e.g., NLTK, Transformers, spaCy, SciPy), Amazon SageMaker, and GenAI tools.
Familiarity with project management tools like Jira, Confluence, and ServiceNow.
Knowledge of utilities like AWS CLI, Postman, and curl.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.



