What are the responsibilities and job description for the Staff Site Reliability Engineer position at Perchwell?

Who We Are

Perchwell is the modern data and workflow platform for real estate professionals and consumers. Based on the industry's foundational data, Perchwell builds a modern software suite to empower real estate professionals to do their best work, provide differentiated service to their clients, and grow their businesses.

Backed by Lux Capital, Founders Fund, and some of the country's leading Multiple Listing Services (MLSs), Perchwell builds next-generation workflow software and data products for the multi-trillion-dollar residential real estate industry. Perchwell is the first new entrant to come to market in decades and is currently scaling its best-in-class platform.

What We're Looking For:

As a Staff SRE, you'll be a founding member of the SRE team and will own and help improve the technical foundations of Perchwell while exemplifying engineering rigor and excellence across our engineering culture, strategy, and execution. This is a hands-on role in which you will be a thought leader within the engineering organization, driving large strategic technical initiatives both within the SRE domain and across all other areas of engineering.

You have deep expertise in infrastructure and architectural design and have been responsible for large production systems. You've owned observability, or have strongly held views on what good observability looks like, and you help other teams see the light. You are rigorous and meticulous in your work and help others understand and keep to high engineering standards. Performance engineering and incident response and management are two sides of the same coin for you, and you are excited about helping foster a culture of continuous learning and improvement. You believe the best way to help others is to provide them the tools they need to get their job done in a safe and efficient manner, and you have a demonstrated history of success enabling product teams to quickly innovate and iterate.

You will work closely with the VP of Engineering and other senior leaders to tackle and remediate our current problem set while also building net new capabilities. You will build important relationships and partner deeply with our product and QA organizations. The SRE team will be responsible for building the ability to innovate faster in a safe and reliable way. Reliability, resiliency and adaptability will be our north stars. This role has an on-call requirement.

We believe that in an ever-changing, innovative environment, we do our best work when we are working as a team in person. In this role, you'll work out of our New York City HQ in SoHo, Manhattan, at least 3 days per week.

What You'll Work On:

Lead major core initiatives around performance, introducing an event-driven architecture, and safely supporting services alongside a monolith
Design and build scalable processes and solutions to fundamental engineering challenges
Design and manage scalable, secure AWS infrastructure
Partner with the Quality Team to own the CI/CD processes and enable fast, safe and frequent deployments
Own our Kubernetes infrastructure and strategy
Build and manage safe self-service methods for our teams to manage their infra via Terraform and other automation tools
Be a champion of observability (o11y) by owning our o11y systems and establishing and enforcing best practices throughout the engineering organization around performance and service monitoring
Lead incident management and disaster recovery processes
Be a thought leader and help foster a culture of ownership by mentoring teams on SRE principles and practices around resiliency, reliability, availability, and complex distributed systems
Partner with FinOps and feature teams to manage infrastructure spending by identifying and optimizing cost-saving strategies

Required For the Role:

BS or MS in Computer Science, related technical field, or equivalent experience

Distributed systems experience

Deep experience with AWS cloud services such as EC2, RDS, EKS, CloudFront, ECR, S3, IAM, CodeBuild, Lambda, and Route 53

In-depth knowledge of Kubernetes, including experience with deploying, managing, scaling, and orchestrating clusters, and with automating these operations and exposing them to other teams

You've built tools and/or enabled automation for your team or others

You've used at least one programming language (e.g., Python, Go, Rust) to solve problems and close feature gaps in your tools

Demonstrated pattern of systems thinking leading to organizational impact and strategic problem solving

The ability to go deep across service, database and infrastructure boundaries for design and performance related work

You've owned or influenced observability from implementation to best practices

5 years of experience in a dedicated SRE role

Bonus Points for Any of the Following:

AWS Well-Architected Framework experience

Experience with event-driven system designs and the required messaging technologies and patterns (Kafka, NATS, Aeron)

Experience with security frameworks and tools for hardening environments and systems

Experience with different types of databases and their architectures

Experience with the Ruby on Rails ecosystem, and understanding of Rails-specific quirks

Experience with the management and scaling of Elasticsearch

Note: At this time, we are only considering candidates who are authorized to work in the U.S.
