What are the responsibilities and job description for the Senior Systems Monitoring Engineer position at BloKchain Talent?
Company Description
Our client is an award-winning workplace. They have been recognized by Comparably for #1 CEO, Company Happiness, Benefits, Compensation, Diversity, and more. They were also named by Glassdoor as the 2nd Best US Workplace and Best Large Company US CEO in 2018, and have been recognized by Wealthfront and Business Insider. Their culture focuses on delivering happiness, a commitment to transparency, and the tangible benefits they provide to their employees and customers.
Job Description
POSITION TITLE: Senior Systems Monitoring Engineer
LOCATION: Phoenix, AZ
SALARY: Based on Experience
SPONSORSHIP: No
Job description:
- Monitor and alert on applications in the production environment of Zoom's global real-time online conferencing system, which has a high-availability target of 99.99%; continuously find and fix problems to ensure the stable operation of the business.
- Develop monitoring plans and systems, and carry out maintenance, in accordance with the company's product system architecture and business logic as required.
- Have excellent self-learning ability, be able to read English documentation related to products and technologies, follow developments in open source software, and be able to withstand high work pressure.
- Participate in building and continuously improving Zoom's global operations and maintenance system and processes, and be proficient in writing the related operations and maintenance technical documentation.
Job Requirements:
- Bachelor’s degree or above in a computer-related major, with at least three years of experience in large-scale website system operation and maintenance.
- Strong skills in some of the popular monitoring systems, such as Prometheus / Kubernetes / Grafana / Filebeat / ELK (Elasticsearch, Logstash, Kibana) / Zabbix.
- Proficiency in at least one programming language: Shell, Python, Java, etc.
- Experience with open source software such as Nginx, Tomcat, Apache, Memcached, Redis, and MySQL, and with system high availability, failover mechanisms, and load balancing.
- Experience with Amazon service components such as the AWS CLI, S3, EC2, Route 53, RDS, CloudWatch, DynamoDB, etc.
- Familiarity with and mastery of automated operation and maintenance tools such as Ansible, Jenkins, etc.; hands-on experience operating servers at large scale (1000+) is preferred.
- Language requirement: English; Mandarin is a plus.
Additional Information
All your information will be kept confidential according to EEO guidelines.