Site Reliability Engineer

Tbwa Chiat/Day Inc
San Francisco, CA Full Time
POSTED ON 4/1/2025
AVAILABLE BEFORE 4/17/2025

Nearly every company in the world runs on custom software: Gartner estimates that up to 50% of all code is written for internal use. This is the operational software for refunding orders, underwriting loans, onboarding employees, analyzing transactions, and providing customer support. But most companies don’t have adequate resources to properly invest in these tools, leading to a lot of old and clunky internal software or, even worse, users still stuck in manual and spreadsheet flows.

At Retool, we’re on a mission to bring good software to everyone. We’re building a new type of development platform that combines the benefits of traditional software development with a drag-and-drop UI editor and AI, making it dramatically faster to build internal tools. We believe that the future of software development lies in abstracting away the tedious and repetitive tasks developers waste time on, while creating reusable components that act as a force multiplier for future developers and projects. The result is not just productivity, but good software by default. And that’s a mission worth striving for.

Today, our customers span from small startups building their first operational tools to Fortune 500 companies building mission-critical apps for thousands of users across their business. Interested in joining us? Let us know!

WHY WE'RE LOOKING FOR YOU

As one of our first Site Reliability Engineers, you will be instrumental in defining and shaping the processes and practices for a pivotal new business offering. You will play a crucial role in ensuring the reliability, scalability, and performance of our services while collaborating closely with our product and GTM teams. This is a unique opportunity to significantly impact the direction and success of a key initiative within our company.

Reducing friction in deploying Retool is one of the largest levers for us to grow efficiently as a business. You’ll be figuring out how to productize a scalable deployment solution that is both effective and delightful for our customers. This role requires a blend of deep technical expertise in site reliability engineering and a keen product sense to create solutions that not only perform well but also provide an exceptional developer experience.

IN THIS ROLE YOU'LL

  • Infrastructure Management: Design, implement, and manage scalable and resilient infrastructure using AWS, Kubernetes, and Terraform.
  • Process Shaping: Define and implement processes and practices that will support our new business offering, ensuring they are robust, scalable, and aligned with industry best practices.
  • Automation: Automate deployment and maintenance tasks to improve efficiency and scalability of this offering.
  • Documentation & Knowledge Sharing: Create and maintain comprehensive documentation for systems, processes, and procedures. Mentor and guide other team members on best practices.
  • Monitoring & Alerting: Leverage existing observability systems to build new products that ensure the health and performance of our services.

THE SKILLSET YOU'LL BRING

  • Technical Expertise:
      • Strong experience with AWS and Kubernetes.
      • Proficiency in managing PostgreSQL databases.
      • Extensive experience with infrastructure as code (IaC) using Terraform.
  • Operational Experience:
      • Previous experience in a similar SRE or DevOps role, ideally within a SaaS environment.
      • Strong background in monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Datadog).
  • Programming Skills:
      • Proficiency in one or more programming languages (e.g., Python, Go, Java).
  • Problem-Solving Skills:
      • Excellent problem-solving skills and the ability to troubleshoot complex issues.
      • Strong interpersonal and communication skills, with the ability to work effectively in a team-oriented environment.

NICE TO HAVE

  • Experience with CI/CD pipelines and tools (e.g., Buildkite, GitLab CI).
  • Knowledge of security best practices and tools.

For candidates based in San Francisco, the pay range for this role is listed below and represents the base salary range for non-commissionable roles or on-target earnings (OTE) for commissionable roles. This salary range may be inclusive of several career levels at Retool and will be narrowed during the interview process based on a number of factors such as (but not limited to) scope and responsibilities, the candidate’s experience and qualifications, and location.

Additional compensation in the form of equity and/or commission/bonuses is dependent on the position offered. Retool provides a comprehensive benefit plan, including medical, dental, vision, and 401(k). Pay and benefits are subject to change at any time, consistent with the terms of any applicable compensation or benefit plans.

$124,100 - $193,600 USD

Retool offers generous benefits to all employees and a hybrid work location. For more information, please visit the benefits and perks section of our careers page!

Retool is currently set up to employ all roles in the US and specific roles in the UK. To find roles that can be employed in the UK, please refer to our careers page and review the indicated locations.