What are the responsibilities and job description for the Site Reliability Engineer (JOB ID 1207) position at OneSparQ?
Job Details
OneSparQ is looking for a Site Reliability Engineer for a company located in Alpharetta, GA.
Responsibilities:
- Implement, maintain, and improve monitoring and alerting solutions to ensure system reliability and performance.
- Collect, process, and analyze logs (application, system, audit) and system metrics (CPU, memory usage).
- Design and build robust data pipelines to streamline the flow of logs and metrics into storage containers in AWS.
- Create dashboards and visualizations using tools like Grafana or Datadog to provide real-time and historical insights into system performance.
- Architect, deploy, and maintain systems on AWS, including Kubernetes clusters, Lambda functions, and associated services.
- Develop and maintain infrastructure as code using Terraform (or similar tools) to ensure consistency and scalability.
- Automate operational processes using tools like Ansible to improve efficiency and reduce manual intervention.
Required Skills:
- 5 years of technology experience.
- 2 years in an SRE role.
- In-depth knowledge of AWS architecture, including system design, infrastructure management, and storage solutions.
- Experience managing and optimizing Kubernetes clusters.
- Proficiency with Terraform or similar tools.
- Hands-on experience with Grafana, Datadog, or similar tools for monitoring and visualization.
- Working knowledge of Ansible or similar automation tools.