
Incident Management Specialist

Datum Technologies Group
Reston, VA Contractor
POSTED ON 2/5/2025
AVAILABLE BEFORE 3/6/2025

Role: Incident Management Specialist – AWS Infrastructure

Location: Reston, VA (Hybrid)

Availability: On-call rotation, including weekends and night shifts



Summary:

Experienced IT professional specializing in incident management and application triage within a 24/7/365 environment. Skilled in troubleshooting, diagnosing, and resolving production incidents in AWS infrastructure, with expertise in performance monitoring, root cause analysis, and incident resolution. Proven ability to work effectively with cross-functional teams, manage incident status and impact, and lead technical triage calls. Strong communication and relationship management skills, capable of delivering clear updates to both technical and non-technical stakeholders.

What We Are Seeking:

We are seeking a candidate with expert-level knowledge in AWS infrastructure and services, particularly in the context of application triage and incident management. This role focuses on transaction tracing, log analysis, and incident resolution using AWS Console and other monitoring tools. We are not looking for candidates with solely development or deployment experience in AWS, but rather those with hands-on experience in diagnosing and resolving incidents in a cloud environment.

Key Responsibilities

  • Incident Management: Lead and manage IT production incidents to resolution, ensuring minimal downtime and effective communication of incident status, impact, and resolution.
  • AWS Expertise: Hands-on experience with Amazon Web Services (AWS), including EC2, ELB, RDS, Redshift, DynamoDB, Aurora, Route 53, ECS, Lambda, S3, CloudWatch, CloudTrail, WAF, and more.
  • Cloud Monitoring: Build and leverage tools for monitoring and troubleshooting system resources in AWS, using platforms like Dynatrace, Splunk, SolarWinds, and Moogsoft.
  • Root Cause Analysis: Perform detailed transaction-level monitoring and troubleshooting of AWS infrastructure, including web, database, storage, and network layers.
  • Incident Triage: Lead technical incident triage calls, analyze system performance, and resolve incidents swiftly using monitoring tools and diagnostics.
  • Process Improvement: Proactively identify opportunities to improve operational processes, implement recommendations, and contribute to postmortem analysis for continuous improvement.
  • Collaboration: Work closely with other technical teams to influence incident resolution and share insights during follow-up calls and root cause analysis.
  • Stakeholder Communication: Provide timely updates and detailed reports on incident status and post-resolution metrics to senior leadership.

Core Skills & Technologies

  • AWS Services: EC2, ELB, RDS, Redshift, DynamoDB, Aurora, Route 53, ECS, Lambda, S3, CloudWatch, CloudTrail, WAF
  • Incident Management: Hands-on management of IT incidents, triage, and resolution
  • Monitoring Tools: Dynatrace, Splunk, SolarWinds, Moogsoft, ExtraHop, Catchpoint
  • Root Cause Analysis: Incident troubleshooting, transaction tracing, and diagnostics
  • Cloud Infrastructure: Performance engineering, resource monitoring, and cloud operations
  • Communication: Strong written and verbal skills, including executive-level reporting and cross-functional collaboration
  • Technical Areas: AWS, Unix/Linux servers, Wintel servers, networks, databases (Oracle, MS SQL), SAN, virtualization

Qualifications:

  • Manage and resolve complex incidents within AWS infrastructure, providing timely updates to stakeholders and ensuring minimal production downtime.
  • Lead incident triage calls, analyzing application and infrastructure health using AWS and third-party monitoring tools (e.g., Dynatrace, Splunk).
  • Collaborate with cross-functional teams to diagnose root causes and implement corrective actions, ensuring a quick resolution for high-priority incidents.
  • Design and improve incident management processes, proactively recommending changes to minimize recurring issues and enhance system stability.
  • Conduct postmortem analysis for critical incidents, documenting root cause, corrective actions, and lessons learned to improve future performance.
  • Experience as an AWS Cloud Operations Specialist, including:
  • Providing hands-on support for AWS-based applications, including incident monitoring, root cause analysis, and performance troubleshooting.
  • Implementing tools and dashboards for monitoring AWS infrastructure performance, improving incident detection and response times.
  • Working closely with development and operations teams to resolve complex production issues, ensuring timely and effective solutions.
  • Supporting the transition of legacy systems to AWS, optimizing application performance and operational efficiency.
  • Bachelor's degree in information technology
  • AWS Certified Solutions Architect – Associate
  • AWS Certified DevOps Engineer – Professional (Optional)
  • Certified Incident Management Professional (Optional)

Preferred Skills & Experience:

  • Experience with Service-Oriented Architecture (SOA) and Middleware management in UNIX/Linux environments.
  • Prior experience in the financial industry or with high-transaction applications.
  • Familiarity with OpenTelemetry and advanced transaction monitoring tools.
