Senior Site Reliability Engineer

Microsoft Power Platform Community
Redmond, WA Full Time
POSTED ON 1/28/2025
AVAILABLE BEFORE 2/26/2025
Overview

Are you looking to be at the forefront of Microsoft’s cloud computing transformation? Are you looking to work in an agile environment that ships frequently while maintaining a focus on long-term bets? Do you want to work with state-of-the-art distributed systems that perform near-real-time detection on petabyte-scale telemetry, using machine learning and traditional software to deliver on Cloud Availability and Safety goals? Do you want to make an impact on a team of talented engineers delivering world-class software solutions?

Microsoft Cloud Operations & Innovation (CO+I) is the engine that powers Microsoft cloud services through the operation of our unified global datacenters, enabling ~30% of Microsoft revenue through Commercial Cloud ($38 billion in FY20 Q1). The Cloud Infrastructure Health (CIH) team in CO+IE is focused on improving customer availability, datacenter safety, and capacity, and on helping optimize the utilization of datacenter resources using telemetry and insights. Our systems analyze petabyte-scale telemetry data from datacenter critical environments and secondary signals, both in near real time and offline, to produce time-sensitive insights that directly impact Cloud Operations.

Our team is looking for an experienced, competent, and motivated Senior SRE. The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for application architecture, system design, and end-to-end implementation. As a Senior Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. Collaboration skills are required to work closely with other engineering teams to ensure services and systems are highly stable and performant, meeting the expectations of our customers and users.

As a Site Reliability Engineer, you will be primarily responsible for keeping our data services reliable and scalable and for participating in design reviews. You will also develop code, scripts, systems, and/or tools that reduce operational burden by automating complex and repetitive tasks, such as onboarding system capabilities to newer datacenters and maintaining those capabilities at existing sites. The SRE enables feature teams to increase the velocity at which they can safely deploy changes to production and monitor the effects of those changes across the footprint. The SRE analyzes telemetry data to develop capacity planning models, identify patterns and trends that drive continuous improvement, and highlight opportunities to deploy automation to monitor and manage CIH services across sites. The SRE also participates in on-call rotations to resolve live site incidents, minimize customer impact, and document solutions and insights that inform ongoing improvements to infrastructure, code, tools, and/or processes that prevent the recurrence of similar issues.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Responsibilities

  • Own deployment, availability, reliability, performance and customer escalation targets for Critical Environment Telemetry solutions.
  • Design, develop, and maintain data pipelines and back-end services for real-time decisioning, reporting, optimization, data collection, and related functions.
  • Write high quality, maintainable and high-performance code following demonstrated development principles. Manage automated unit and integration test suites.
  • Work with Project Managers and business stakeholders to design and deliver new features, collaborating with partner teams across the org to ensure successful launches.
  • Identify opportunities and drive the implementation of monitoring, self-healing, and automation capabilities to improve service manageability and reliability.
  • Investigate and resolve Customer Reported Incidents, continually looking for ways to minimize or eliminate future incidents and improve customer experiences.

Qualifications

Required Qualifications:

  • 6 years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3 years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 2 years technical experience in software engineering, network engineering, or systems administration.
  • 2 years of experience with system uptime, performance, service monitoring, and capacity planning.
Other Requirements

  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Additional Qualifications

  • 7 years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4 years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 3 years technical experience in software engineering, network engineering, or systems administration
    • OR Doctorate Degree in Computer Science, Information Technology, or related field.
Site Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $153,600 - $250,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until February 10, 2025.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

#COICareers

#COIEngCareers

#COIE_DPXEcareers

Salary: $117,200 - $250,200



