What are the responsibilities and job description for the Site Reliability Engineer position at VDart?
Position : Site Reliability Engineer
Location : Mount Laurel, NJ - 2 to 3 days onsite
Duration : 6-month contract to hire
Job Description :
Essential Job Functions
Set up monitoring and alerting systems to detect issues, respond to incidents, and minimize downtime.
Analyze system performance and make improvements to ensure high availability, low latency, and efficiency.
Develop automation tools for routine tasks like deployment, scaling, and testing to improve reliability and reduce human error.
Forecast resource needs and plan for future growth to ensure the system can handle increased demand.
Work with development and operations teams to identify and resolve issues, as well as implement fixes to prevent recurrence.
Create and maintain clear documentation on system architecture, processes, and incident management.
Perform miscellaneous duties as assigned by management.
Qualifications
Must have proficiency with Windows systems, Linux/Unix systems, cloud platforms (AWS, Azure, GCP), and CI/CD pipelines.
Must have strong skills in scripting (PowerShell, Python, Bash, Perl, etc.) and experience with Infrastructure as Code (IaC) (Terraform, ARM, Bicep, JSON).
Must have strong skills in network protocols, DNS, load balancing, and security principles.
Must have proficiency in enterprise IAM solutions (Microsoft Entra ID, Active Directory Domain Services), including Single Sign-On federation using SAML and OIDC/OAuth 2.0.
Must have hands-on experience deploying monitoring, alerting, and reporting solutions for business-critical infrastructure and applications using tools like Prometheus, Grafana, Power BI, Azure Monitor (Log Analytics, Event Hubs), and Nagios.
Experience with Git and other version control systems.
Certifications in cloud platforms (AWS Certified SysOps Administrator, Azure Administrator Associate, Azure DevOps Engineer Expert, Google Professional Cloud DevOps Engineer)
Must have strong analytical and product management skills, including a thorough understanding of how to interpret customer business needs and translate them into application and operational requirements.
Must be able to maintain confidentiality.
Must demonstrate exceptional communication skills by conveying necessary information accurately, listening effectively and asking questions where clarification is needed.
Must be able to analyze problems involving multiple interrelated causes and, where necessary, gather information and apply complex concepts or methods to generate an effective solution.
Must have knowledge of computer software and technical troubleshooting skills.
Must possess the ability to manage and direct change, delays, or unexpected events appropriately.
Ability to follow all company policies and procedures in effect at time of hire and as they may change or be added from time to time.