What are the responsibilities and job description for the DevOps/SRE Engineer position at the Internet Archive?
The Internet Archive is looking for an expert DevOps / SRE engineer to join the UX Team, working remotely.
You will be one of the primary engineers responsible for the Archive.org website (a Top 250 website) and related services. You will be in charge of maintaining and developing the mostly Ansible-managed production cluster, provisioning and configuring servers, maintaining applications, setting up monitoring and alerts, and generally helping keep things running smoothly. There is also the possibility of contributing to front-end development and participating in other UX-related activities. This is a rare opportunity to become a critical member of a small team making a huge impact in the world.
Responsibilities:
- Operationally maintaining Archive.org servers and services
- Maintaining and evolving the Ansible-based provisioning and configuration infrastructure
- Collaboratively managing the deployment architecture of our staging and production apps
- Setting up and maintaining monitoring and alerts
- Identifying and triaging problems when they arise; researching, building consensus around, and implementing solutions
- Responding to external stakeholders who have apps hosted in our server cluster
- Working with other DevOps engineers, both on the UX Team and on other teams
- Communicating effectively with stakeholders
- Reducing technical debt
- Being a role model for effective and collaborative engineering practices
- Maintaining the blog and other WordPress sites
Requirements:
- 3 years of relevant work experience in a collaborative software development environment
- Strong Linux system administration skills
- Expertise in maintaining and optimizing a server cluster over time
- Experience setting up monitoring and alerting at all levels within a system
- Excellent problem-solving and debugging skills
- Excellent verbal and written communication skills
- Familiarity with website and server security
- Comfort working in a loosely structured environment requiring individual autonomy and initiative within one's scope of responsibilities
- Willingness to learn, adapt, and reach compromise with others
- Comfort with remote work, with occasional optional on-site visits
Preferred Skills:
- Automated server provisioning with Ansible (or similar tooling)
- Web servers, load-balancing, and caching (e.g. nginx, HAProxy)
- Network & DNS configuration
- Containerization and clustering (e.g. Docker, Nomad, Consul)
- Monitoring and observability (e.g. Grafana, Prometheus, Loki, Sentry)
- Git, GitLab
- JIRA, Agile-ish software development
About Us:
We are a 501(c)(3) non-profit digital research library with a bold mission: to provide universal access to all knowledge, including the books, music, images, audio, television, websites, and software that form our shared human culture. Our dedicated team of engineers, archivists, librarians, and other professionals has created one of the world’s top 300 websites, archive.org. The Internet Archive digitizes thousands of books each day and captures hundreds of millions of web pages each week. Over the past 25 years, we have built one of the largest digital libraries in existence, serving millions of people worldwide. This achievement is made possible through collaborations with hundreds of libraries, archives, museums, universities, and non-profits across the globe.
Benefits & Perks:
The Internet Archive provides a comprehensive benefits package, including PTO, paid holidays, medical, dental, vision, FSA, commuter benefits, short-term disability (STD), long-term disability (LTD), and 401(k)/Roth accounts. Work-life balance is important to us. For engineers located near HQ, we offer catered Friday lunches.
Internet Archive is an Equal Opportunity Employer M/F/D/V/L/G/B/T and will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the Fair Chance Ordinance.
Salary: $140,000 - $180,000