What are the responsibilities and job description for the Site Reliability Engineer/SRE position at Shrive Technologies?
Title: Site Reliability Engineer/SRE
Location: Indiana (Remote)
50% Support/Operations
- Runtime production operations support for Sev 0 and Sev 1 incidents
- "Super T-shaped" role that can float between squads, with a focus on continuous process improvement
- Automation of repetitive tasks
- SREs are focused on building and monitoring anything in production that improves service resiliency
- Hands-on experience in various stages of the IT infrastructure management lifecycle
- Experience in client relationships, service integration, team building, and process and people management
- Experience in successfully managing cloud operations and resources to deliver client satisfaction
- Experience building, integrating, deploying, and provisioning cloud services
- IaC (Infrastructure as Code): implemented large-scale infrastructure using ARM, CF, or Terraform templates
- Experienced in scripting languages such as PowerShell, Python and Shell
- Experience with configuration management tools (Chef, Puppet, Ansible)
- Experience with collaboration tools such as Atlassian (Jira, Confluence)
- Successfully governed DC consolidation and migration projects
- Optimize on-premises and cloud infrastructure and participate in design reviews
- Led multiple implementations of infrastructure monitoring using native and third-party monitoring tools
- Capacity planning and management: create, use, and maintain a capacity model for on-prem and cloud workloads
- Certified in Cloud Architecture, Operations and Engineering
- Certified in ITIL and project management
- Resolve critical and complex technical issues in a global support delivery team. Combine technical expertise and customer requirements to solve complex business challenges.
- Quickly identify customer issues with cloud services and conduct in-depth diagnostics on the cloud platform and services.
- Perform RCA of critical incidents. Analyze and eliminate top issues impacting customer experience.
- Create documentation (SOPs and TSGs) to help L1/L2 teams support operations.
- Work with leadership on process improvement and strategic initiatives
- Serve as the SME for selecting technology candidates and self-healing capabilities for future service development
- Perform large-scale automation, combining independent processes into robust behavior
- Provide architectural reviews and sign-offs on a service based on its ability to achieve availability targets
- Accept or reject services based on their ability to achieve SLAs
- Validate scalability testing results and test the limits of hardware and software