What are the responsibilities and job description for the SRE Lead position at The Wolf Works LLC?
Position Overview:
We are seeking a highly skilled Site Reliability Engineering (SRE) Lead to drive transformational initiatives within IT operations and development. This role requires a technical leader passionate about designing and implementing reliable, scalable, and high-performing systems with a strong focus on operational excellence.
Key Responsibilities:
System Design and Architecture
- Design and architect reliable, scalable systems and services with a focus on operational excellence, availability, and performance.
Observability and Monitoring
- Use telemetry and observability tools, including Dynatrace APM, SolarWinds, Prometheus, Grafana, Kibana, Splunk, and other AIOps tools.
- Develop and maintain correlation mechanisms and dashboards for comprehensive visibility across internal and external application requests.
Authentication Mechanisms
- Implement and manage login authentication technologies like Ping, ForgeRock, and SiteMinder, including session and cookie management.
SRE Practices and Evangelism
- Promote SRE principles, including incident management, monitoring, alerting, and automation.
- Collaborate with development teams to ensure operational reliability and resilience.
- Define and implement best practices for SRE within the organization.
Automation and Operational Excellence
- Automate operational tasks to streamline workflows and improve efficiency.
- Optimize resource utilization and lead capacity planning initiatives.
Incident and Problem Management
- Drive incident response processes, perform root cause analyses, and implement preventive measures to improve system reliability.
Security and Compliance
- Align SRE practices with security and compliance requirements.
- Implement measures to safeguard systems and data integrity.
Team Leadership and Collaboration
- Mentor SRE teams and foster skill development.
- Build strong relationships with operational teams to drive organizational improvements.
Continuous Learning
- Stay updated with industry trends, technologies, and advancements in SRE to enhance organizational capabilities.
Qualifications:
Experience
- 10–12 years of hands-on experience in SRE, cloud technologies, development, observability tools, and automation.
Technical Expertise
- Observability Tools: Dynatrace, SolarWinds, Prometheus, Grafana, Kibana, Splunk, and AIOps tools.
- Authentication Technologies: Ping, ForgeRock, SiteMinder.
- Cloud Platforms: AWS (Control Tower, Project Setup, Creating Accounts, RDS, SSO).
- Containerization: Docker, Kubernetes.
- Automation: GitLab CI/CD, Terraform, Ansible, or equivalent scripting tools.
- Programming Languages and Architecture: Groovy DSL, Java, Python, YAML; microservices architecture.
- Messaging Systems: MQ, Kafka.
- Databases: Oracle, MySQL.
Additional Skills
- Implementation of observability frameworks with programmatic SLI/SLO blueprints.
- Proficient in Linux commands and systems.
- Hands-on experience with APM tools like Datadog, AppDynamics, or Dynatrace.
Job Type: Full-time
Pay: $130,000.00 - $140,000.00 per year
Benefits:
- 401(k)
- Dental insurance
- Health insurance
- Paid time off
- Vision insurance
Application Question(s):
- Are you proficient in observability tools such as Dynatrace, SolarWinds, Prometheus, Grafana, Kibana, Splunk, or similar AIOps tools? Please specify which tools you have used and your role in implementing them.
- Do you have experience with login authentication technologies like Ping, ForgeRock, and SiteMinder? Can you describe your expertise in session and cookie management with these tools?
- Are you experienced with cloud platforms such as AWS (Control Tower, Project Setup, Creating Accounts, RDS, SSO)? Have you worked with containerization technologies like Docker and Kubernetes? Provide specific examples of your work.
- Do you have experience with automation tools such as GitLab CI/CD, Terraform, Ansible, or equivalent scripting tools? Can you share an example of how you automated a complex workflow?
- Are you proficient in languages such as Groovy DSL, Java, Python, and YAML, and experienced with microservices architecture? Please provide examples of projects where you used these skills.
- Do you require any type of work sponsorship to work in the United States?
- Please share your LinkedIn profile. (Mandatory Requirement)
Experience:
- SRE Lead: 10 years (Required)
Work Location: Hybrid remote in Fort Mill, SC 29707