What are the responsibilities and job description for the Director - Site Reliability Engineering (SRE) - Observability Platform & Tools position at Toyota Motor Sales, U.S.A., Inc.?
Overview
Who we are
Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world's most admired brands, Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We're looking for diverse, talented team members who want to Dream. Do. Grow. with us.
An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company, delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create a best-in-class customer experience in an innovative, collaborative environment.
To save you time when applying, please note that Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time.
This position is an onsite role based in Plano, TX.
Who we're looking for
Toyota Financial Services is launching a new Site Reliability Engineering (SRE) team, and we are seeking a director to spearhead this initiative. As the director, you will be responsible for building the SRE team from the ground up and establishing robust processes to ensure the reliability, performance, and scalability of our systems and applications.
What you'll be doing
- Support engineers with hands-on coding, debugging, and implementation of automation to create a more stable and robust application environment.
- Foster a collaborative team culture and support professional development.
- Define and implement strategies for system reliability, performance, and scalability.
- Develop Service Level Objectives (SLOs) and Service Level Agreements (SLAs) aligned with business goals.
- Design and deploy monitoring, alerting, and incident management systems.
- Implement and refine disaster recovery and business continuity plans.
- Lead major incident responses and coordinate with stakeholders for resolution.
- Conduct post-incident reviews and drive continuous improvement.
- Identify and implement automation opportunities to streamline operations.
- Oversee the development and implementation of monitoring and incident management tools.
- Work with engineering, product, and infrastructure teams on reliability goals.
- Participate in architectural reviews, providing input on reliability and scalability.
- Recruit, build, and lead the new SRE team with clear objectives and metrics.
What you bring
What we'll bring
During your interview process, our team can fill you in on all the details of our industry-leading benefits and career development opportunities. A few highlights include:
Belonging at Toyota
Our success begins and ends with our people. We embrace diverse perspectives and value unique human experiences. Respect for all is our North Star. Toyota is proud to have 10 different Business Partnering Groups across 100 different North American chapter locations that support team members' efforts to dream, do and grow without questioning that they belong.
Applicants for our positions are considered without regard to race, ethnicity, national origin, sex, sexual orientation, gender identity or expression, age, disability, religion, military or veteran status, or any other characteristics protected by law.
Have a question, need assistance with your application or do you require any special accommodations? Please send an email to talent.acquisition@toyota.com.