What are the responsibilities and job description for the Site Reliability Engineer position at EVONA?
Site Reliability Engineer (SRE)
About the Role
As a Site Reliability Engineer (SRE), you will deploy and maintain a SaaS product and rapidly resolve issues for its customers and internal users. The role involves supporting nominal releases and fixes, leading on-call support and incident remediation, and contributing to software and infrastructure code development alongside engineering teams.
You will collaborate daily with software development, testing, and leadership teams throughout project lifecycles to shape deployment and maintenance strategies, minimize architectural disconnects, and implement monitoring and maintenance systems. Your expertise will span software engineering, platform engineering, systems engineering, and software testing.
Additionally, you will engage directly with customers and internal product teams to understand pain points and drive improvements in both current and future development. This position is part of an engineering team working within an agile product development environment.
Key Responsibilities
- Deploy, monitor, and maintain cloud-based SaaS infrastructure
- Lead incident response efforts, ensuring rapid resolution of critical issues
- Collaborate with development teams to optimize deployments and reliability strategies
- Build and maintain monitoring, alerting, and logging systems
- Work closely with customers and internal stakeholders to address operational pain points
- Contribute to the development of automation, CI/CD pipelines, and infrastructure as code
Required Technical Skills
- Programming: Proficiency in Python and experience with distributed microservices
- Cloud Expertise: Strong experience with AWS services and troubleshooting cloud-native deployments
- Containerization & Orchestration: Hands-on experience with Kubernetes, containerized applications, and serverless architectures
- Monitoring & Logging: Familiarity with tools such as DataDog, Splunk, and AWS CloudWatch
- Infrastructure as Code: Knowledge of IaC tools and best practices
- API & Databases: Experience developing and troubleshooting API services, distributed NoSQL and relational databases, caching systems, and event-driven architectures
- CI/CD & Automation: Strong background in building CI/CD pipelines (preferably with GitHub Actions) and task automation
- Linux & Version Control: Understanding of Unix/Linux systems and experience with Git, with a strong focus on Git hygiene and release management
Required Soft Skills
- Strong troubleshooting and analytical skills for diagnosing and resolving incidents
- Proactive and results-driven mindset with a sense of urgency in incident management
- Self-starter who can work independently and deliver projects with minimal supervision
- Excellent communication skills to collaborate across development teams, leadership, and customers
Background & Qualifications
- Bachelor’s degree in Computer Science (or related field)
- 5 years of professional experience in software development, platform engineering, or reliability engineering
- U.S. citizenship required; ability to obtain a security clearance
- Compliance with U.S. Government ITAR regulations
Why Join?
- Remote-first workplace with hybrid work options in Seattle and Denver
- Work-life balance with required time off (minimum 15 days) and unlimited PTO
- Health & wellness benefits, including mental health support and comprehensive health insurance (100% covered for employees)
- Retirement benefits with 4% 401(k) matching
- Quarterly company offsites to exciting locations across the U.S.
- Impactful work shaping the future of space technology and SaaS reliability
Interview Process
- Screening Call (30 min) – Discuss experience, role expectations, and alignment (75% behavioral, 25% technical).
- Technical Interview (45 min) – Focused on cloud-native deployments and high-availability systems.
- Coding/Design Challenge (Offline) – Short take-home task to assess problem-solving and design skills.
- Final Technical Interview (45 min) – Focus on real-world troubleshooting and working alongside development teams.
- Reference Check – Two professional references from the past five years.
- Offer Stage – Verbal and formal offer within 24 hours of final interview.
About the Company
This company is revolutionizing spacecraft operations by modernizing satellite management and enabling broader access to space technology. The team includes experts from top aerospace and tech firms, bringing deep experience in satellite operations, broadband constellations, and cloud computing.
If you are passionate about reliability engineering and working on cutting-edge technology in a fast-paced environment, this is the role for you.
Salary: $180,000