What are the responsibilities and job description for the Staff Site Reliability Engineer position at Perchwell?
Who We Are
Perchwell is the modern data and workflow platform for real estate professionals and consumers. Based on the industry's foundational data, Perchwell builds a modern software suite to empower real estate professionals to do their best work, provide differentiated service to their clients, and grow their businesses.
Backed by Lux Capital, Founders Fund, and some of the country's leading Multiple Listing Services (MLSs), Perchwell builds next-generation workflow software and data products for the multi-trillion-dollar residential real estate industry. Perchwell is the first new entrant to come to market in decades and is currently scaling its best-in-class platform.
What We're Looking For:
As a Staff SRE, you'll be a founding member of the SRE team and will own and help improve the technical foundations of Perchwell while exemplifying engineering rigor and excellence across our engineering culture, strategy, and execution. This is a hands-on role in which you will be a thought leader within the engineering organization, driving large strategic technical initiatives both within the SRE domain and across all other areas of engineering.
You have deep expertise in infrastructure and architectural design and have been responsible for large production systems. You've owned observability, or hold strongly opinionated views on what good observability is, and you help other teams see the light. You are rigorous and meticulous in your work and help others understand and keep to high engineering standards. Performance engineering and incident response and management are two sides of the same coin for you, and you are excited about helping foster a culture of continuous learning and improvement. You believe the best way to help others is to provide them with the tools they need to get their jobs done safely and efficiently, and you have a demonstrated history of enabling product teams to innovate and iterate quickly.
You will work closely with the VP of Engineering and other senior leaders to tackle and remediate our current problem set while also building net new capabilities. You will build important relationships and partner deeply with our product and QA organizations. The SRE team will be responsible for building the ability to innovate faster in a safe and reliable way. Reliability, resiliency and adaptability will be our north stars. This role has an on-call requirement.
We believe that in an ever-changing, innovative environment, we do our best work when we are working as a team in person. In this role, you'll work out of our New York City HQ in SoHo, Manhattan at least 3 days per week.
What You'll Work On:
- Lead major core initiatives around performance, introducing an event-driven architecture, and safely supporting services alongside a monolith
- Design and build scalable processes and solutions to fundamental engineering challenges
- Design and manage scalable, secure AWS infrastructure
- Partner with the Quality Team to own the CI/CD processes and enable fast, safe, and frequent deployments
- Own our Kubernetes infrastructure and strategy
- Build and manage safe self-service methods for our teams to manage their infra via Terraform and other automation tools
- Be a champion of observability (o11y) by owning our o11y systems and establishing and enforcing best practices throughout the engineering organization around performance and service monitoring
- Lead incident management and disaster recovery processes
- Be a thought leader and help foster a culture of ownership by mentoring teams on SRE principles and practices around resiliency, reliability, availability, and complex distributed systems
- Partner with FinOps and feature teams to manage infrastructure spending by identifying and optimizing cost-saving strategies
Required for the Role:
Bonus Points for Any of the Following:
Note: At this time, we are only considering candidates who are authorized to work in the U.S.