What are the responsibilities and job description for the Senior Cloud Operations Engineer position at Oracle?
The Company
Oracle is the world’s leading provider of business software. With a presence in over 175 countries, we are one of the biggest technology companies on the planet. We're using innovative emerging technologies to tackle real-world problems today. From advancing energy efficiency to re-imagining online commerce, the work we do is not only transforming the world of business; it is also helping advance governments, power nonprofits, and give billions of people the tools they need to outpace change. For more information about Oracle (NYSE:ORCL), visit us at oracle.com.
Oracle’s commitment to R&D is a driving factor in the development of technologies that have kept Oracle at the forefront of the computer industry. If you are passionate about advanced development and working on the next-generation large-scale distributed systems for the most popular open source database in the world, which is optimized for the cloud providing the best performance, we would like to talk with you.
What you will do:
HeatWave is a fully managed database service powered by an integrated in-memory query accelerator. It is the only cloud-native database service that combines transactions, analytics, and machine learning in a single service, delivering real-time, secure analytics without the complexity, latency, and cost of ETL duplication. It also includes HeatWave Lakehouse, which allows users to query data stored in object storage in a variety of file formats. It is developed, managed, and supported by the MySQL team at Oracle. Join us to help further develop this amazing technology.
This cutting-edge technology serves critical business needs and is changing the way data transactions function all over the world. You will make a technical impact on the world with the work you do.
Join a fun and flexible workplace where you’ll enhance your skills and build a solid professional foundation. As a Senior Cloud Operations Engineer on Oracle's HeatWave Service team, you will contribute to an exciting team working on one of the hottest cloud services. You will use your skills to learn how to continuously deliver and improve this cloud service. Operations work will include troubleshooting production issues and handling requests for upgrades, patches, or modifications. When not working on operations, you will work on software engineering tasks such as reviewing incidents to drive improvements to services, tools, or runbooks, increasing reliability and scalability and reducing operational overhead through automation, training, documentation, service enhancement, or process changes. This position offers the opportunity to learn the ins and outs of current cloud service architecture, deployment, monitoring, and operational technologies. Many useful and desirable skills will be acquired on the job if not already present; see below for the many current technologies in play. The ideal candidate has some of these skills, but the key is the motivation and ability to learn quickly, as well as a passion for an excellent customer experience. Learn more at https://www.mysql.com/products/heatwave/
Career Level - IC3
Qualifications:
Range and benefit information provided in this posting are specific to the stated locations only
CA: Hiring Range in CAD from: $66,800 to $145,900 per annum.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle’s differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Responsibilities:
Engineers will:
- Improve monitoring, notifications, and configuration of our HeatWave services.
- Perform proactive service checks and monitor/triage and address incoming system/application alerts, email and phone calls to ensure appropriate priority and response SLAs are met.
- Triage and troubleshoot service impacting events from multiple signals including phone, email, service telemetry and alerting while making sure the response SLAs are met.
- Participate in activities for services such as upgrades and patching.
- Act as a buddy/mentor for peers in the team.
- Contribute to a healthy, supportive and inclusive team culture.
- Identify opportunities for automation, signal noise reduction, elimination of recurring issues, and other actions that reduce the time to mitigate service-impacting events and increase the productivity of cloud operations and development resources, and work with engineering to implement them.
- Provide feedback to development teams about the functionality and UIs of operations administration dashboards.
- Coordinate, document and track critical incidents ensuring rapid and complete issue resolution and an appropriate closed loop to customers and other key stakeholders.
- Improve the availability, scalability, latency, ease of use, and efficiency of service control planes and operational tooling.
- Participate in service capacity planning and demand forecasting, software performance analysis and system tuning.
- Up-skill through continuous learning of new features being delivered as part of the product roadmap.
- Support the secondary HeatWave on AWS cloud service as business requirements dictate.
- Potentially participate in regular rotations as a central part of the 24x7 operations team. This includes rotational work on weekends and public holidays, and US East/West time zone shifts.
- Need to be reliable in terms of working scheduled hours.
- Need to be motivated quick learners.
Desired skills include
Cloud specific skills which are not strictly limited to:
- Experience with OCI or equivalent cloud services (e.g., IAM, Compute, Load Balancer, Object Storage, Health Monitor).
General skills for working in this operational role:
- Experience with and understanding of serverless cloud architectures.
- Experience with Python programming, bash scripting & Git.
- Familiarity with MySQL database, SQL query interface, general database concepts.
- Good knowledge of Linux system administration, with experience in Linux troubleshooting and familiarity with Linux internals.
- Good knowledge of networking concepts and the DevOps model.
- Work productively in a fast-paced, team-oriented agile environment.
- Contribute to operational activities such as writing runbooks, troubleshooting, operations automation, and instrumentation for metrics and events.
- Good technical writing and communication skills. Engineers will need to be able to clearly write descriptions of operational issues and corrective actions for incidents.
- Experience with Agile methodology (Scrum or Kanban).
- Very strong analytical skills to identify problem root causes.
- Experience in collaborating with cross-functional teams like Development, QA, Product management, etc.
- Systematic problem-solving approach, combined with a strong sense of ownership and drive in resolving operations issues.
- Experience working under pressure to mitigate customer issues affecting service reliability, data integrity, and overall customer experience.
- Monitoring, management, analysis and troubleshooting of large-scale, distributed systems.
- BS/BE or MS/MTech degree in Computer Science, Electrical/Hardware Engineering, or a related field.
- 4 years of experience delivering and operating large-scale, highly available distributed systems.
- 4 years of work experience as a software, site reliability, cloud operations, or customer support engineer.