What are the responsibilities and job description for the Site Reliability Engineer position at Forward Progress Staffing?
Job Details
No corp-to-corp or sponsorships please!
Site Reliability Engineer (SRE) - Observability Specialist
Las Vegas, NV - hybrid, onsite 3 days a week
6-month contract with possible extension (W2)
Responsibilities:
• Observability Solutions: Design and integrate tools for monitoring, logging, and tracing (e.g., Prometheus, Grafana, Elasticsearch, Datadog).
• Monitoring & Alerting: Define KPIs, SLOs, and SLIs; implement actionable alerts to ensure reliability.
• System Reliability: Analyze observability metrics to identify risks and collaborate on mitigations.
• Collaboration: Partner with teams to embed observability into the software lifecycle and advocate best practices.
• Automation: Streamline observability processes like dashboard creation and log parsing.
• Documentation: Maintain documentation for observability tools and processes, ensuring visibility for stakeholders.
Requirements:
• Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
• Experience with observability platforms (Prometheus, Grafana, Splunk, OpenTelemetry).
• Proficiency in programming/scripting languages (Python, Go, Bash).
• Knowledge of distributed systems, cloud platforms (AWS, Azure, Google Cloud Platform), and containerization (Kubernetes, Docker).
• Familiarity with KPIs, SLOs, and SLIs for monitoring and reporting.
Preferred:
• Certifications in observability tools or cloud platforms.
• Experience with Infrastructure as Code (e.g., Ansible, Terraform).