What are the responsibilities and job description for the Senior Site Reliability Engineer position at New York Technology Partners?
Job Title: Site Reliability Engineer
Location: Whippany, NJ
Position Type: Long term contract
Job Description:
- Designing, implementing, deploying and running highly available, fault-tolerant, auto-scaling and auto-healing systems.
- Deep expertise in AWS, Azure, and GCP, including Kubernetes (EKS, GKE), container services (ECS, Fargate), and serverless architectures.
- Implementing advanced monitoring (Prometheus, Grafana, Datadog, ELK), tracing, logging and automated alerting solutions.
- Scaling distributed systems, optimising compute/storage efficiency, and cost management.
- Designing failure simulations to improve system robustness and incident response.
- Expertise in AWS CLI, CloudFormation, Ansible, Helm, and GitOps for automated infrastructure provisioning.
- Driving reliability best practices across engineering teams, embedding SRE principles into the DevSecOps lifecycle.
- Partnering with engineering, security, and product teams to balance reliability and feature velocity.
- Expertise in CIAM and the ForgeRock stack (PingGateway, PingAM, PingIDM, PingDS), with certification or proof of completion of ForgeRock Deep-Dive 400 training.
- Building and mentoring high-performing SRE teams, fostering a culture of automation and innovation.
- Defining and enforcing reliability metrics to balance innovation with system stability.
- Optimising deployment pipelines for high-frequency, zero-downtime releases.
- Leveraging machine learning for anomaly detection, predictive scaling, and automated remediation.
Salary: $70 - $75