What are the responsibilities and job description for the Sr. SRE w IAM & Cloud Consultant position (onsite in Whippany, NJ) at Sysmind, LLC?
Job Details
Position: Sr. SRE w IAM & Cloud Consultant
Location: Whippany, NJ Onsite
Position Type: Long Term Contract Position
Key Responsibilities:
- Designing, implementing, deploying and running highly available, fault-tolerant, auto-scaling and auto-healing systems.
- Deep expertise in AWS, Azure, and Google Cloud Platform, including Kubernetes and container services (EKS, ECS, Fargate, GKE) and serverless architectures.
- Implementing advanced monitoring (Prometheus, Grafana, Datadog, ELK), tracing, logging and automated alerting solutions.
- Scaling distributed systems, optimizing compute/storage efficiency, and managing cost.
- Designing failure simulations to improve system robustness and incident response.
- Expert in AWS CLI, CloudFormation, Ansible, Helm, and GitOps for automated infrastructure provisioning.
- Driving reliability best practices across engineering teams, embedding SRE principles into the DevSecOps lifecycle.
- Partnering with engineering, security, and product teams to balance reliability and feature velocity.
- Expertise in CIAM and the ForgeRock stack (PingGateway, PingAM, PingIDM, PingDS), with certification or proof of completion of ForgeRock Deep-Dive 400 training.
- Building and mentoring high-performing SRE teams, fostering a culture of automation and innovation.
- Defining and enforcing reliability metrics to balance innovation with system stability.
- Optimizing deployment pipelines for high-frequency, zero-downtime releases.
- Leveraging machine learning for anomaly detection, predictive scaling, and automated remediation.
Required Skills:
- 5 years' hands-on experience configuring, deploying and running ForgeRock COTS-based IAM solutions (PingGateway, PingAM, PingIDM, PingDS) with automated GitOps CI/CD pipelines using GitLab.
- Design and hands-on implementation of GitOps CI/CD pipelines, automated failover, and data backup and restore solutions.
- Automating telemetry and dashboards.
- 10 years' experience running disaster recovery and zero-downtime deployment solutions.
- Designing and implementing continuous delivery.
- Hands-on coding in Python, Bash and JSON/YAML (CaC).
- Supporting large-scale, distributed, cloud-based microservice and API service solutions with 99.9% uptime.
SYSMIND LLC is an Equal Employment Opportunity employer. All qualified applicants will receive consideration for employment without any discrimination. We promote and support a diverse workforce at all levels in the company. All job offers are contingent upon completion of a satisfactory background check and reference checks. Additionally, passing a drug test may be required. All contractors intending to work on SYSMIND's W2 are "at will" employees.