What are the responsibilities and job description for the ITOC Engineer - Night Shift position at Chewy?
Our Opportunity:
We are seeking a highly motivated IT Operations Center (ITOC) Engineer to join our team in either Richardson, TX, or Plantation, FL. The ITOC Engineer plays a vital role in keeping our technology environment healthy for our customers. The Chewy ITOC Engineer I performs advanced troubleshooting and operational tasks to support the Chewy IT environment. This role is responsible for managing and improving all monitoring, alerting, and ticketing tools, along with their documentation. The engineer is also engaged in the Incident Management process: troubleshooting service-impacting issues, communicating with the business, and proactively assisting in Major Incident mitigation efforts. This is a night-shift role, 7:00 p.m. to 7:30 a.m. EST (working days still to be determined).
What You'll Do:
- Monitor the Chewy enterprise environment as part of a 24/7/365 operation, remaining reachable and able to address alerts and incidents 100% of the time.
- Respond to alerts by performing Tier 1 or Tier 2 operations, depending on the issue.
- Configure monitoring tools, including Datadog (APM, Synthetics, RUM), AWS CloudWatch, Splunk, and others, for optimal observability of Chewy's e-commerce environment.
- Serve as the ITOC subject-matter expert (SME) for our monitoring infrastructure, helping teams onboard and develop monitoring for new and existing services.
- Assist teams in deploying production changes to our AWS environment via Jenkins and other tools.
- Develop training material and conduct team training to keep the ITOC current on the latest technology and best practices.
- Participate in and develop projects that address the observability needs of internal Chewy teams, while identifying and creating opportunities to improve our processes and procedures and further raise the bar.
- Troubleshoot advanced issues impacting platform functionality.
- Act as the primary point of contact (POC) with third-party vendors and internal teams to maintain a highly efficient platform and improve ROI.
- Automate manual tasks using the tools in our environment, such as Terraform, Jenkins, and AWS technologies.
- Other duties as assigned.
What You'll Need:
- At least 3 years of experience in an IT Operations Center or similar environment.
- System administration experience with Datadog, Splunk, or similar tools.
- Experience with AWS cloud infrastructure.
- Advanced knowledge of microservice-based e-commerce systems.
- Extensive experience with Linux CLI.
- Experience writing Splunk search queries.
- Excellent organizational and troubleshooting skills.
- Ability to handle multiple tasks in a fast-paced environment.
- Effective communicator and collaborative worker at all levels of the organization.
- Application development knowledge and experience with associated tools such as Ansible, Terraform, and Jenkins.
- Ability to compose succinct status notifications during high-pressure situations for a wide audience.
- Position may require travel.
Bonus:
- AWS or similar cloud services certification.
- Linux certification.
- ITIL 4 certification.
- Splunk Core certification.
- Scripting in Python, Bash, PowerShell, or similar.
- Bachelor's degree in a related field.