What are the responsibilities and job description for the Site Reliability Engineer position at hireVouch?
Senior Site Reliability Engineer
Position Overview
We are a mid-size entertainment company delivering captivating digital experiences to millions of customers worldwide. Our IT organization powers the infrastructure and systems behind our cutting-edge payroll and accounting applications. We are seeking a Senior Site Reliability Engineer (SRE) to enhance the performance, scalability, and reliability of our infrastructure and help bring our next-generation solutions to life.
As a Senior Site Reliability Engineer, you will ensure the reliability and scalability of our infrastructure. You will apply your skills in cloud technologies, infrastructure operations, Kubernetes orchestration, application development, database administration, and Oracle E-Business Suite (EBS) to maintain robust infrastructure that supports business-critical platforms. This role also involves collaborating with cross-functional teams to implement engineering best practices, monitoring, and automation while exploring opportunities to enhance operations with emerging AI technologies.
Key Responsibilities
- Infrastructure as Code: Develop and maintain automated infrastructure provisioning with Terraform for hybrid cloud environments.
- Cloud Expertise: Design and manage robust multi-cloud environments using AWS and Azure, with a focus on optimizing Kubernetes clusters (EKS and AKS).
- Oracle E-Business Suite (EBS): Support, optimize, and ensure the reliability of Oracle EBS deployments, integrating them with other IT systems to maintain smooth business operations.
- Operating Systems Management: Administer and optimize Linux (RHEL) and Windows Server environments to ensure high availability and security.
- Application Performance: Collaborate with development teams to enhance applications built on React, Node.js, .NET, C#, and Java for reliability and performance.
- Networking & Security: Leverage advanced AWS networking skills to implement secure and scalable architectures, including VPC design, load balancing, and advanced routing.
- Database Optimization: Monitor and tune database performance and manage relational and NoSQL databases to support high-traffic entertainment services.
- Monitoring & Troubleshooting: Implement observability tools and proactively address performance issues using platforms like Prometheus, Grafana, Splunk, or CloudWatch.
- Incident Response & Automation: Lead incident management, postmortem reviews, and automation efforts to prevent recurrence and improve overall resilience.
- Cross-Team Collaboration: Work closely with developers, system administrators, and security teams to align infrastructure needs with business and technical goals.
Qualifications
Required Technical Skills
Desired Soft Skills
Nice-to-Have Skills