What are the responsibilities and job description for the W-Onsite-Cincinnati, OH :: Senior Site Reliability Engineer/DevOps Specialist position at Bitsoft International, Inc.?
Job Details
Senior Site Reliability Engineer/DevOps Specialist
1 Year
Cincinnati, OH (Onsite, 5 Days a Week)
Role Summary:
We are looking for a seasoned Site Reliability Engineer/DevOps Specialist to elevate the robustness and efficiency of Kroger's Fulfillment Technology platforms, which are vital to our expansive omni-channel strategy. In this role, the successful candidate will partner closely with teams across development, QA, and operations to architect, deploy, and sustain high-performing and reliable solutions within our live environments. With a focus on troubleshooting and resolving emergent issues, this role demands adept use of modern monitoring, logging, automation, and incident resolution toolsets. You will play a pivotal role in refining our DevOps methodologies, advocating for best practices in continuous integration/delivery, configuration management, infrastructure as code, and cloud services. A combination of deep software engineering expertise, command of system and network administration, and proficiency in cloud infrastructure is expected for this position, along with strong interpersonal skills and a zeal for continuous learning and tackling intricate challenges.
Key Responsibilities:
- Forge strong partnerships with in-house application engineering, monitoring teams, business operations, and external entities to prioritize, troubleshoot, and rectify issues that influence customer pickup or delivery services.
- Champion thorough root-cause investigations of significant operational and business disruptions and endorse appropriate corrective measures.
- Spearhead Crisis Management for significant incidents within the Pickup Fulfillment space, communicating effectively with stakeholders regarding resolution progress.
- Enhance the engineering team's ability to deliver reliable builds quickly through continuous integration improvements.
- Advance automation to improve quality and operational efficiency.
- Ensure that system behavior is traceable and observable, and that historical logs are maintained.
- Construct effective monitoring, logging, and alerting architectures that help pinpoint performance issues and optimize system operations across various platforms, including cloud, on-site, and in-store.
- Develop well-structured design documents, playbooks, and technical manuals.
- Participate in an on-call rotation outside standard work hours and undertake scheduled tasks during planned maintenance intervals.
Requirements for the Position:
- A Bachelor's degree in Computer Science, Engineering, or a related field.
- A solid background with at least 4 years of experience in SRE/DevOps/Infrastructure roles.
- Proficiency with database management, web and event-driven applications, messaging architectures, RESTful APIs and integrations, cloud services, support tools, system monitoring, and containerization frameworks.
- Proficiency in Java programming and an understanding of Spring Boot, microservices, Kafka, Cassandra, and SQL Server.
- Skills in scripting with Python or Shell.
- Minimum of 1 year of experience with system observability tools such as Dynatrace, ELK, PagerDuty, Datadog, Azure Monitor, Grafana, or similar.
- Experience with GitHub Actions for comprehensive CI/CD automation.
- Deep knowledge of Linux systems including architectural understanding, security best practices, performance tuning, troubleshooting, and operational management.
- Proven effectiveness in Agile project environments.
- Experience collaborating with distributed and international teams.
- Strategic thinking capabilities.
- In-depth knowledge of eCommerce, Fulfillment, or Retail Technology infrastructure.
- Excellent communication, documentation, and public speaking skills.
Preferred Qualifications:
- Advanced degree such as a Master's or Ph.D. in Computer Science or a similar field.
- 4 years of experience with high-traffic eCommerce application development.
- 2 years of experience managing cloud infrastructure on platforms like Azure, AWS, or Google Cloud Platform.
- 1 year of experience with tools and technologies such as Apache Kafka, Azure Cosmos DB, Apache Cassandra, Ansible, Terraform, Docker, and Kubernetes.