
Video Cloud Site Reliability Engineer (SRE)

Alibaba Cloud
Bellevue, WA Full Time
POSTED ON 4/4/2025
AVAILABLE BEFORE 5/3/2025

We are committed to providing intelligent, high-quality, high-performance, ultra-low-latency, flexible and professional live streaming and media services to hundreds of millions of users through cutting-edge innovation. Join us and you will help shape the future of the next generation of intelligent video cloud services.


We are seeking a passionate and technically skilled Site Reliability Engineer (SRE) to join Alibaba Cloud’s Video Cloud team. You will play a critical role in building, deploying, and maintaining highly available, high-performance systems for live streaming, media services, video-on-demand, and media SDKs. Your responsibilities will span application deployment, system reliability, customer issue resolution, and automation to ensure the stability and scalability of Video Cloud's infrastructure.



Key Responsibilities

1. Application Deployment

* Deploy applications/services written in multiple languages (C++/Java/Go/Python) following operational guidelines.

* Monitor deployment metrics and logs to proactively identify risks, defects, or deviations from expected outcomes.

* Validate program functionality post-deployment and ensure results align with performance and reliability expectations.

2. System Reliability & Observability

* Oversee monitoring, alerting, and incident response for live streaming, media services, video-on-demand, and SDK systems, ensuring SLA compliance.

* Diagnose and resolve failures across diverse components: networks, databases, caches, message queues, operating systems, hardware, and third-party software.

* Design and optimize monitoring metrics, log collection, and alerting strategies to enhance system observability and uptime.

* Lead emergency response for critical incidents, conduct root cause analysis (RCA), and implement long-term solutions to prevent recurrence.

3. Customer Issue Resolution

* Investigate and resolve customer-reported issues related to live streaming quality (e.g., latency, buffering, visual anomalies) and media services quality (e.g., video transcoding or editing effects, visual or audio anomalies, media processing speed), collaborating with development teams to identify flaws in data centers, edge networks, or customers' mobile device environments.

4. Automation & Continuous Improvement

* Develop tools and scripts (Java/Go/Python/C++) to automate deployment, scaling, fault recovery, and other operational workflows.

* Build automated diagnostic toolchains to accelerate issue resolution and improve customer satisfaction.


Minimum qualifications:

- 3 years of experience in SRE, DevOps, or backend development, with expertise in distributed system operations. Experience in cloud computing, streaming, or video technologies is a plus.

- Experience programming with at least one modern language such as Python, Golang, Java, or C++.

- Strong ability to work under pressure, manage critical incidents, and participate in an on-call rotation.

- Experience with CI/CD pipelines and build processes.

- Fluency in both Chinese and English for daily communication.


Preferred qualifications:

- Familiarity with live streaming protocols (e.g., RTMP, HLS, DASH) or video technologies such as video capture, encoding and processing, editing, storage and delivery, and playback.

- Familiarity with Linux systems, network protocols (TCP/HTTP), Docker, databases, Redis, and message queues (MQ).

- Experience with distributed systems at scale and with deploying monitoring systems.




The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience.


If hired, employee will be in an “at-will position” and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.


Alibaba U.S.-based full-time regular employees have access to medical, dental, and vision insurance, a 401(k) plan and basic life insurance, and wellbeing benefits like FSA, subject to the terms and conditions of the applicable plans then in effect. U.S.-based employees are also eligible to receive up to 12 paid holidays, accrue up to 15 paid vacation days for this position, and receive up to 72 hours of paid sick time (front-loaded) per calendar year.

Salary: $104,400 - $171,000


