What are the responsibilities and job description for the Site Reliability Developer 3 position at Oracle?
This position requires U.S. Citizenship and an active TS/SCI w/Poly Security Clearance.
Work with a SaaS Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission critical stack, with a focus on security, resiliency, scale, and performance. Authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Utilize a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.
Qualifications:
A BS or MS in Computer Science, or equivalent. Knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, and technology security and compliance. Experience running large-scale, customer-facing web services. Understanding of load balancing technologies and experience with development in programming languages, databases and big data stores, and container technologies. Work involves defining and documenting the technical architecture of complex and highly scalable products. A minimum of 5 years of experience running large-scale, customer-facing web services.
Responsibilities:
- Team members will engage in operational activities including monitoring, installation, patching, and upgrading of Oracle Fusion Middleware, Fusion Applications, and Oracle database implementations.
- Work on the SaaS Engineering Operations team with full stack deployment and support responsibilities.
- Address extremely complex, critical customer system issues on a routine basis and document technical solutions.
- Work directly with customers, product development, and product support to resolve critical customer application or performance issues in a timely manner.
- Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
- Authority for end-to-end performance and operability.
- Utilize a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations.
- Collaborate with team members to improve the team's engineering tools, systems, procedures, and security.
- Work across development and operational environments.
- Stay abreast of software testing technologies and best practices.
- Lead implementation of tools and processes using DevOps methodologies to innovate, design, develop, secure, build, release and support fully automated CI/CD pipelines for the development and maintenance of Cloud Infrastructure Services and relevant applications.
- Manage and operate the infrastructure and configuration of microservices with a focus on end-to-end automation and infrastructure as code.
- Perform code development, SaaS ONSR region builds, monthly release deployments, operations engineering, and promotion to testing and production environments to support Sprint activities.
- Participate in developing, implementing, and enforcing policies to enhance cloud system security.
- Recommend technical policies and consult with development teams on remediation.
- Apply DevOps methodologies in assisting to determine the direction of current and future development projects.
- Evangelize security and application solutions and controls by creating and communicating presentations both internally and externally.
- Demonstrate flexibility and resilience in response to changing or ambiguous situations.
Minimum Qualifications:
- U.S. Citizenship and possess/maintain TS/SCI w/Poly security clearance.
- Able to work as part of a shift in a global 24x7x365 DevOps team. Must be willing to work non-standard shifts (though the primary standard shift is US daytime), including holidays and weekends, on a rotation basis with your colleagues.
- BS degree in Computer Science or related technical field involving coding or equivalent practical experience.
- Proficient with writing services/task automation in Python, Bash, Ruby, Perl, JavaScript, or Java
- Proficient with communication skills (writing, organization, learning exchange)
- Familiarity with core protocols (DNS, DHCP, HTTP, TCP)
- Deep knowledge of Linux internals and host-based networking
- Expert Linux/Unix performance and stability troubleshooting skills
- Familiarity with configuration management solutions such as Chef, Puppet, etc.
- Experience with devising, managing, and extending monitoring solutions for large scale environments.
- Experience in database management (Oracle DB, MySQL, Postgres)
- Experience in shared file systems (Gluster, ZFS, etc.)
- Systematic problem-solving approach, strong communication skills, a sense of ownership and drive
- Deep understanding of service metrics and alarms through the development of dashboards, service KPIs, and alarming systems
- Experience working in an operational environment with mission critical tier one services with associated pager duty
- 3 years managing large scale, highly distributed, services infrastructures
- 2 years managing host virtualization technologies (KVM, Containers, Docker, etc.)
- 3 years of experience in production software development with Agile methodologies
- 3 years managing host, network, or storage virtualization technologies
- Expert troubleshooting skills
- Expertise in fleet automation and management solutions
- Knowledge of Fusion Applications is a big plus.
- Experience with Jira and Confluence is a plus.