What are the responsibilities and job description for the SRE Engineer position at Darwin Resources?
Location: Seattle, WA / Kansas City, KS (Hybrid)
Duration: Long Term
Job Description
We are seeking a highly skilled Site Reliability Engineer (SRE) with a strong software engineering background to join our team. This role is central to building and operating reliable, scalable, and performant systems while applying software development practices to infrastructure and operations. As an SRE, you will collaborate closely with engineering teams to design and develop solutions that improve reliability, accelerate delivery, and reduce manual effort.
Roles & Responsibilities
Design and develop software solutions to improve system reliability, scalability, and maintainability.
Write and maintain high-quality, testable code in languages like Python, Go, Java, or similar.
Build self-healing systems and automation tools to minimize human intervention.
Collaborate with software engineers to design robust, scalable systems with reliability in mind.
Ensure systems meet SLOs and SLAs through thoughtful architecture and automation.
Participate in capacity planning and system design reviews.
Create and maintain tools to automate manual processes, streamline deployments, and enhance monitoring.
Continuously innovate by applying software engineering practices to operational tasks.
Debug and resolve complex production issues using software debugging techniques.
Conduct postmortems and implement long-term fixes for recurring issues.
Develop robust observability solutions, including logging, monitoring, and alerting.
Build dashboards and reporting tools to provide actionable insights into system performance and customer-centric KPIs.
Work closely with development teams to incorporate reliability best practices into the software lifecycle.
Advocate for engineering solutions that enhance both product features and system reliability.
Required Qualifications
Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
10 years of professional experience as a software engineer or SRE with a focus on software development.
Strong programming skills in one or more languages, such as Python, Go, Java, C++, or similar.
Experience designing and implementing distributed systems and high-performance applications.
Proficiency with infrastructure-as-code tools (e.g., Terraform, Ansible) and cloud platforms (AWS, GCP, or Azure).
Expertise in containerization and orchestration tools (e.g., Docker, Kubernetes).
Nice to have
Gen AI Scripting experience
Should have experience using these platforms -- --
We want SREs to write code that automates mundane, repetitive tasks, rather than just writing alerts and monitors.