Site Reliability Engineer

F2Onsite
Charlotte, NC Contractor
POSTED ON 1/31/2025
AVAILABLE BEFORE 3/2/2025
Remote Site Reliability Engineer

Job Duties

  1. Monitoring and Alert Response
  2. Traffic Management Responsibilities
  3. Infrastructure Management
  4. Vendor Support/Escalation
  5. Change Management
  6. Runbook Management and Updates
  7. Incident Management
  8. In-House SRE Projects
  9. Knowledge Sharing and Mentorship
  10. Weekly, Monthly and Yearly Reports

  • Alert Monitoring:
    • Continuously monitor PagerDuty alerts, perform initial triage, and escalate major issues using predefined SRE runbooks and SOPs.
    • Monitor and respond to alerts triggered over Slack and email.
    • Acknowledge partner or vendor maintenance alerts and plan accordingly.
  • System Health Checks:
    • Actively track system health using tools such as Nagios, Pingdom, Grafana, Prometheus, QAMC, and Splunk.
  • Ticket Management:
    • Track new SRE tickets, resolve existing ones, and follow up to ensure closure.
  • Assisting Cross-Functional Teams During Maintenance:
    • Collaborate with various teams (ProdOps) to safely redirect traffic away from the data center during maintenance activities, ensuring uninterrupted service.
  • Mitigating Production Issues:
    • In the event of production problems, promptly reroute traffic from the affected data center to maintain service continuity.
  • Facilitating Scheduled Hardware Maintenance:
    • Proactively manage traffic flow adjustments to accommodate planned hardware maintenance, minimizing potential disruptions.
  • On-Prem Server Management:
    • Manage and configure on-premises servers, including onboarding and configuring new hardware.
    • Onboard servers to monitoring tools and set up alerts.
  • Collaborative Infrastructure Tasks:
    • Work closely with Network Engineering on major hardware and infrastructure changes and upgrades.
    • Coordinate with the Dev/DevOps and Platform teams during on-premises-to-cloud migrations.
    • Troubleshoot deployment issues during the CI/CD deployment process.
  • Coordinate with vendors such as:
    • Dell: For hardware-related issues.
    • ISPs (CenturyLink, Lumen, Zayo, Level3): For data center internet-related issues.
    • VMware: For on-premises virtualization support.
    • GCP: For Google Cloud support.
  • Change Monitoring:
    • Review all production change tickets to ensure proper procedures are followed.
    • Prevent unauthorized production changes.
  • Critical Change Support:
    • Work closely with cross-functional teams (NetEng, DevOps, Dev, and Platform) during production changes.
    • Help execute critical changes during maintenance windows, ensuring minimal disruption.
    • Monitor and validate the impact of changes post-deployment.
    • Maintain and improve existing SRE runbooks by adding new troubleshooting steps and solutions.
    • Ensure SRE tasks have clear and detailed documentation for consistency.
    • Incident Response:
      • Act as the first responder during outages and provide updates in the Incident Management Slack channel.
      • Offer timely updates to stakeholders during ongoing incidents.
    • Incident Documentation:
      • Record incident details, actions taken, and outcomes in SRE incident tracking tickets.
    • Incident Resolution and RCA:
      • Conduct root cause analysis (RCA) for incidents and document findings.
      • Lead incident bridge calls and coordinate with stakeholders for resolution.
    • Post-Incident Management:
      • Conduct retrospectives/postmortems to evaluate incident handling and identify areas for improvement.
      • Document incident timelines, resolution steps, and follow-up actions.
      • Ensure completion of action items to prevent recurrence.
      • Participate in weekly project sync-ups to plan and execute initiatives for system scalability and reliability.
      • Optimize existing tools and applications.
      • Conduct POCs and onboard new tools to enhance capabilities.
      • Automate repetitive tasks to improve efficiency and reduce manual efforts.
      • Meghana is currently contributing to an in-house SRE project focused on developing a Slack bot using Python to collect data from Google Cloud Platform (GCP).
      • Create and update detailed SOPs, runbooks, and troubleshooting guides.
      • Train and mentor new SRE members to enhance their technical skills.
      • Share insights and lessons during team meetings and knowledge-sharing sessions.
      • Meghana consistently ensures the timely delivery of monthly, yearly, and postmortem reports, demonstrating her commitment to transparency and continuous improvement in incident management.
