
Need IT Engineer V (Senior AWS Engineer for Incident Triage & Monitoring)

Cyber Resource Provider LLC
Reston, VA Full Time
POSTED ON 4/17/2025
AVAILABLE BEFORE 6/17/2025

Job Details

Role: IT Engineer V (Senior AWS Engineer for Incident Triage & Monitoring)

Location: Reston, VA (5 days onsite)

Interview type: In person

Work mode: Onsite

Experience: 10 years

Fannie Mae has a new role for a Sr. AWS Engineer for Incident Triage & Monitoring under the Central Command team. See the job description below. This is an urgent position, and they are looking to interview ASAP. All candidates must be local and able to come onsite for an in-person interview.

Description

In this incident management function, manage incidents to resolution in a 24/7/365 environment using the Fannie Mae incident management processes, effectively guide incident and triage calls from a technical perspective, share technical details obtained from monitoring tools and dashboards to aid troubleshooting, outline details of resolution activities, recommend and implement improved processes, provide timely status updates to stakeholders, assist with postmortem-related activities, and support various efforts related to operational improvements. Manage efforts to maintain applications in production, including troubleshooting stoppages, repairing bugs, documenting application performance, and coordinating with technology infrastructure management.

KEY JOB FUNCTIONS

1. Excellent communicator who can manage IT incidents to resolution in a 24/7/365 environment using the Fannie Mae incident management processes and communicate incident status, impact, and resolution actions to management.

2. Hands-on experience managing and monitoring applications deployed on Amazon Web Services (AWS).

3. Troubleshooting and resolving incidents on the AWS cloud infrastructure.

4. Experience building tools for monitoring and troubleshooting system resources in an AWS environment. Ability to triage AWS-related incidents using monitoring tools on AWS Cloud.

5. Experience with performance engineering of AWS Cloud applications.

6. Hands-on experience working with AWS tools like EC2, ELB, RDS, Redshift, DynamoDB, Aurora, Route53, ECS, Lambda, S3, Batch, CloudWatch, CloudTrail, WAF, etc.

7. Hands-on experience with transaction-level monitoring using Dynatrace and Splunk.

8. Ability to perform transaction-level monitoring and troubleshooting on the AWS cloud platform.

9. Eyes-on-glass monitoring of the health of applications as well as the underlying infrastructure.

10. Monitoring experience with tools like ExtraHop, SolarWinds, Netcool suite, Catchpoint, Moogsoft.

11. Ability to analyze dashboards and reporting/monitoring tools to look at trends and patterns in application health and performance.

12. Proactively looking for hardware, software, and environmental alerts or malfunctions.

13. Effectively lead and guide incident triage calls from a technical perspective, analyzing different components of the infrastructure and application environment using a variety of monitoring tools and processes.

14. Troubleshoot incidents and identify the root cause quickly using operations, wire data analytics, application performance management, and event correlation monitoring tools.

15. Perform analysis of data, evaluating multiple application protocols including web, database, storage, and supporting infrastructure such as AWS, UNIX, DNS, LDAP, SSL, SMTP, and FTP.

16. Influence other technical teams on the calls and articulate troubleshooting steps effectively.

17. Lead required technical follow-up calls for critical incidents.

18. Assist with documentation of Root Cause Analysis (RCA) or Correction of Errors (COE) and data quality for all ECC communicated incidents.

19. Ensure appropriate functional and management escalation takes place as per the standards and procedures.

20. Follow up on items that could negatively impact production operations, assist with postmortem-related activities, and support various efforts related to operational improvements.

21. Based on recommendations from management, implement new and improved processes, change processes, perform new tasks, create reports, and address ad-hoc requests.

22. Participate in on-call rotation. Ability to work any shift as needed, including weekends and nights.

23. Ability to report incident details and metrics to senior leadership.

EDUCATION

Bachelor's Degree or equivalent required.

MINIMUM EXPERIENCE

10 years of related experience

SPECIALIZED KNOWLEDGE & SKILLS

1. 10 years of working experience with different IT infrastructure components such as Unix/Linux servers, Wintel servers, AWS, networks, firewalls, routers, load balancers, VPN, Apache, WebLogic, LDAP, Active Directory, Exchange, Oracle/MS SQL databases, SAN, virtualization, email systems, enterprise monitoring, and access management solutions for single sign-on. Subject matter expertise is not required, and experience with at least eight of the above is preferred.

2. Senior level hands-on working experience with Amazon Web Services (AWS).

3. Understanding of the different layers of the AWS infrastructure, e.g., WAF, R53, CloudFront, load balancing, HA features.

4. Proven methodical approach to problem identification, monitoring, problem solving and resolution.

5. Ability to analyze different components of the infrastructure and application environments during Incident triage calls.

6. Ability to trace transaction failures and debug the root cause in various layers of the AWS infrastructure and services.

7. Aptitude to influence other technical teams on incident calls & articulate troubleshooting steps effectively.

8. Experience & confidence working with all levels of management; excellent written and verbal skills.

9. Able to communicate with senior management quickly and concisely on technical issues in non-technical terms and to run large conference calls during Incident calls with a wide range of personnel and management levels.

10. Strong relationship management skills and aptitude to multi-task and work well in a high-stress environment, both within teams and independently.

11. AWS Solutions Architect Associate or higher certification.

12. Monitoring and observability experience.

13. Experience with monitoring dashboards for incident detection and alerting.

14. Perform end-to-end analysis of transactions under an observability environment.

15. Troubleshoot incidents and identify the root cause quickly using wire data analytics, application performance management, and event correlation monitoring tools.

16. Diagnose and resolve incidents by providing factual data from the various monitoring and instrumentation systems.

17. Monitor applications and infrastructure using tools like Splunk, Dynatrace, OpenTelemetry, Catchpoint, xMatters, SignalFx, SolarWinds, ExtraHop, etc.

Preferred Qualifications:

1. Management and troubleshooting of middleware products in UNIX and Linux environments. Knowledge of Service Oriented Architecture (SOA), Java, etc.

2. Prior Fannie Mae or financial industry experience.

3. Experience with OpenTelemetry.

What we are seeking:

We are looking for candidates with expert-level knowledge of AWS infrastructure and various AWS services. The key focus areas for this position are application triage, transaction tracing across the various layers of AWS infrastructure, log analysis, diagnosis of issues using the AWS Console and other monitoring tools, sharing analysis with a wider audience during incident calls, and strong communication. We are not looking for candidates with only development or deployment experience in AWS.

The candidate should write a paragraph about the hands-on triage work done on AWS Cloud and submit it along with the resume.

