What are the responsibilities and job description for the Senior Kubernetes Admin Systems Engineer EngProd position at Arista Networks?
Who You'll Work With
Arista Networks is looking for world-class, Kubernetes-aware engineers passionate about driving systems reliability and scalability to provide the best possible development experience for our 1400-person engineering team. You will be part of a fast-paced, high-caliber team building the internal systems and infrastructure used to build the routing and switching products driving the industry's largest data center networks.
Arista's Software Engineering team runs at a scale rarely found: TBs of source control, 60GB work trees with 1000s of developer branches in flight at any given time, over 400K daily build/test jobs, and over 150 homegrown and cloud-native services running on a 100-node on-prem bare-metal Kubernetes cluster. Operating these systems takes vigilance, responsiveness to alerts, and a steady stream of updates and bug fixes to keep things running smoothly and efficiently, as well as to increase our ability to monitor, understand, and visualize them. The role will cover all aspects of our Kubernetes infrastructure and may include monitoring, responding to, and enhancing alerts; working to unify and standardize our alerts; fine-tuning code for scalability and performance; debugging problems; and simplifying and securing the developer experience with k8s. You will own your projects from definition to deployment, including developer and vendor interactions, and you will be responsible for the quality of everything you deliver.
What You'll Do
Working in the Engineering Productivity (EngProd) group, you will collaborate with other engineers to design, build, scale, and operate the systems that the rest of Arista's development teams use. The EngProd team uses industry-standard systems like Ansible, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, Elasticsearch, Google Cloud, and Varnish, as well as internal systems that we've built from the ground up to automate CI/CD, testing, analysis, and visualization.
Responsibilities
- Work with the existing k8s admin team to own different aspects of managing a production k8s cluster (e.g., upgrades, monitoring, capacity planning, security, developer experience)
- Proactively monitor, respond to, and enhance alerts, and set up automated alert handling where applicable
- Create and maintain incident response runbooks, working with the service dev teams
- Debug and resolve issues impacting developer user experience and infrastructure stability around the k8s platform
- Adopt current best practices in k8s cluster management. Evaluate and adopt OSS projects that simplify k8s cluster management.
- Set up guidelines and paved paths for service dev teams, improving developer experience around the k8s platform.
- Work with Arista's software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure around k8s, and provide fixes for those problems.
- Engage with third-party vendor support as part of triage
Qualifications:
Additional Information:
Arista Networks is an equal opportunity employer. Arista makes all hiring and employment-related decisions in a nondiscriminatory manner, without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other factor determined to be unlawful under applicable federal, state, or local law. All your information will be kept confidential according to EEO guidelines.
Remote Work:
Employment Type: Full-time