What are the responsibilities and job description for the Infrastructure Engineer, Data Acquisition position at OpenAI?
About the DAQ Team
The Data Acquisition (DAQ) organization is responsible for building and operating the data pipelines, crawling, storage systems, and post-processing platforms that fuel OpenAI’s research and product development. Our Infrastructure team’s mission is to enable DAQ developers to move fast with minimal friction, providing “just enough” reliability, observability, and tooling in a highly dynamic environment. We value broad impact and rapid iteration over polished perfection. As our data needs grow by orders of magnitude, this small but scrappy team ensures that our foundational services can keep pace.
We operate across a wide range of infrastructure concerns, including:
Scaling our web crawler (from modest usage to many multiples of that in the coming year)
Managing storage and compute for large-scale indexing, embedding, and search workloads
Evolving observability (metrics, logs, traces) and automation in a flexible, 80/20 manner
Adopting best-of-breed tooling and security from other parts of the organization (e.g., Terraform stacks, cloud platform practices)
Rather than enforcing rigid SLAs or designing monolithic infrastructure, we focus on empowering DAQ teams to build, deploy, and run their own services. Our environment can be fluid and ad hoc: if something solves the problem quickly and reliably enough, that is usually the correct approach.
About the Role
We’re looking for a hands-on Infrastructure Engineer with a strong bias toward action. You’ll be part of a small group of generalists responsible for everything from ad hoc shell scripts to cluster provisioning automation. You will both design and implement systems: we do create architecture, but rarely in the form of a lengthy design doc that goes stale. Instead, we value prototyping, iterating, and shipping quickly.
In this role, you will:
Scale and maintain our data pipelines and compute clusters as DAQ grows by large multiples in the next year
Build out “just enough” observability (metrics, logs, tracing) to support developer troubleshooting and performance insights
Help design on-call processes for the infra we own, balancing developer velocity with service reliability
Collaborate directly with DAQ teams on deployment approaches, ephemeral or ad hoc workloads, network/security integrations, and more
Prototype and implement solutions for caching, load balancing, job scheduling, and cluster scaling, with an emphasis on iteration speed
Improve developer productivity by reducing friction—through better tooling, automated provisioning, and simplified environment setups
Adopt and integrate Infrastructure-as-Code (IaC), CI, and security best practices from other teams, tailoring them to DAQ’s dynamic needs
You might thrive in this role if you are a broad generalist who enjoys a scrappy, results‑oriented culture, can dive into anything from container orchestration to networking, and loves to unblock fellow engineers by building reliable infrastructure that supports massive growth. You’re comfortable with cloud infrastructure, automation, and observability, and you’re eager to work in a dynamic, fast‑moving team environment.
Qualifications
Proven experience in an infrastructure or backend role, ideally in a fast-paced environment
Comfort with at least one major cloud platform and its associated tooling
Familiarity with containerization and orchestration technologies (Kubernetes or similar)
Some background with IaC (e.g., Terraform, CloudFormation)
Hands-on experience with monitoring/observability stacks
Strong scripting/coding ability to build and automate solutions (your language of choice)
Ability to balance scrappiness and speed with robust design when needed
Effective communication and collaboration skills; you enjoy enabling others
We are a small team tackling big scaling challenges with lean resources—this role is pivotal to enabling the next wave of DAQ’s growth. If you’re excited by a high-impact, hands-on position where you’ll have broad autonomy and creative freedom to shape infrastructure, we’d love to hear from you!