Reliability Engineer

Ardent Services
Virginia, VA Full Time
POSTED ON 3/4/2025
AVAILABLE BEFORE 6/4/2025

Why do you need to choose between doing important work and having a fulfilling life? At Ardent, we have both. Ardent employees are committed to solving our customers' most difficult problems, and we are committed to the well-being, personal goals, and professional development of our employees. We are "All In." We put forth our strongest possible effort to accomplish the mission, and we do it together. We respect the skills and experience you bring to the Ardent team, and we provide a rewarding environment to help you succeed.

We offer highly competitive benefits, professional development opportunities, and an exceptional culture that embraces flexibility, innovation, collaboration, and career growth. A collective service mindset underpins our work, and a shared camaraderie in serving clients, colleagues, and our communities sets us apart. Our full commitment to being "All In" for our employees and our clients is not just our approach; it is our standard. If this sounds like the perfect fit for you, choose Ardent and make a difference with us.

Ardent is seeking a Reliability Engineer to join our team.

This is an onsite role in Ashburn, VA.

Position Description:

We are seeking a skilled Reliability Engineer to support our client's mission by enhancing Production Monitoring and ensuring optimal service delivery for their applications. This role involves proactive issue identification, incident resolution, and system health optimization within a 24x7x365 operational environment. The ideal candidate will lead monitoring solutions, manage ITIL engineers, automate processes, and collaborate across IT and business teams to improve service reliability. Expertise in AWS environments, root cause analysis, and technical troubleshooting is essential, along with strong communication and leadership skills to drive continuous improvement.

Requirements:

  • Experience in Production Monitoring & Support within a 24x7x365 operational environment.
  • Strong expertise in incident management, root cause analysis, and problem resolution for cloud-based applications.
  • Hands-on experience with Amazon Web Services (AWS) and cloud-based monitoring tools.
  • Proficiency in ITIL processes and managing ITIL engineers for efficient service delivery.
  • Ability to build and implement monitoring solutions, automate manual processes, and create alerts to ensure system stability.
  • Experience with system health monitoring, performance optimization, and troubleshooting production issues.
  • Strong leadership skills to collaborate with IT, business, and infrastructure teams to improve production support processes.
  • Effective communication skills to provide updates, incident reports, and status updates to leadership and stakeholders.
  • Ability to develop and maintain technical documentation and knowledge base resources for production support.
  • Experience in triaging and resolving production incidents, assessing severity, and properly escalating issues.

Responsibilities and Duties:

  • Proactive and early notification of potential and actual issues impacting service delivery.
  • Frequent and succinct communication to PSPD leadership during and after incidents.
  • Identification of trends and corrective measures.
  • Provide needed metrics to PSPD leadership team.
  • The enhanced Production Monitoring Services Branch will provide resources to staff the operation 24x7x365. The resources should provide additional technical support and diagnosis.
Customer Facing:

  • Build monitoring and production support solutions that give customers visibility into our services.
  • Manage ITIL engineers.
  • Triage and resolve production incidents related to the cloud platform and participate in root cause analysis and postmortem discussions.
  • Function as a solution manager in support of the Manager, Production Support by leading the implementation of short-term and long-term solutions, automating manual processes, and building alerts to monitor the operation of services.
  • Assess initial severity, gather impacts, create tickets, engage support teams, and escalate issues properly as they arrive.

Optimizes Work Processes:

  • Participate in the creation and maintenance of technical and knowledge base documentation.
  • Troubleshoot production issues and collaborate in developing simple technical solutions.
  • Use diagnostic tools to maintain, troubleshoot and restore standard service or data to systems.
  • Lead implementation of production support activities in an Amazon Web Services environment.
  • Lead technical and design discussions with IT to help enterprises speed their adoption of new technologies and practices.
  • Perform system health monitoring and optimize performance.
  • Define and establish processes and tooling for monitoring and routine system health checks to ensure the optimization and stability of the application.

Collaborates:

  • Work as a technical leader alongside business, development, and infrastructure teams.
  • Work effectively with IT and business teams, as well as external customers, to lead the resolution of production incidents and provide communication during outages.
  • Collaborate with other members of IT and business in streamlining production support processes.
  • Work closely with other teams and recommend solutions to improve current production support processes in ways that reflect the business needs, security, and SLAs of our production services.
  • Work closely with the Infrastructure team and other support staff to identify and resolve incidents and to create and implement long-term remediation techniques and fixes.
  • Provide support and coach other members of the Production Support team.
Communicates Effectively:

  • Communicate clearly and effectively across IT, business process owners, and customers at all levels of the organization.
  • Communicate progress and any challenges to management.
  • Communicate overall status and health of the application to business and application support teams.
  • Active CBP / BI or Top Secret clearance is highly desired. Must be open to working 2nd or 3rd shift in a 24x7x365 environment.

    Due to the nature of the work we support, all candidates in consideration for this role must be U.S. Citizens willing to undergo the government issued background investigation process.

    Ardent is an equal opportunity employer. We will not discriminate and will take affirmative action measures to ensure against discrimination in employment, recruitment, advertisements for employment, compensation, termination, upgrading, promotions, and other conditions of employment against any employee or job applicant on the basis of race, color, gender, national origin, age, religion, creed, disability, veteran's status, sexual orientation, gender identity, or gender expression.




