Software Engineer, Build Systems / CI
full-time
senior
Posted 1 day ago
ABOUT THE ROLE
The Engineering Acceleration team builds and operates the foundational systems that engineers use to build, test, and ship ChatGPT, the API, and OpenAI's infrastructure.
We are looking for an engineer to help evolve OpenAI's build and continuous integration systems for a fast-growing engineering organization. This role sits at the intersection of developer productivity, build systems, distributed infrastructure, and software quality. You will work on the systems that determine how quickly and confidently engineers can move: Bazel-based builds, Buildkite pipelines, test selection, remote caching and execution, CI observability, and tooling that helps engineers understand and fix failures quickly.
Our mission is to make OpenAI one of the most productive engineering organizations in the world while preserving a high bar for correctness, reliability, and safety. The best version of this work is invisible when it succeeds: builds are fast, tests are trusted, CI failures are understandable, and engineers can focus on shipping useful systems instead of fighting infrastructure.
IN THIS ROLE, YOU WILL
- Own and evolve Bazel-based build and test workflows across a large, polyglot monorepo.
- Design and maintain Starlark rules, macros, toolchains, and integrations that make builds reproducible, hermetic, and easy for product teams to adopt.
- Improve CI performance and reliability across Buildkite pipelines, including queue time, build time, cache hit rates, test sharding, retry behavior, and flake isolation.
- Build systems that reduce unnecessary CI work through affected-target detection, dependency graph analysis, test selection, caching, batching, and smarter scheduling.
- Improve local development workflows so engineers can reproduce CI behavior, debug build failures, and iterate quickly without learning every detail of the build stack.
- Operate and optimize build infrastructure across Docker/OCI images, Kubernetes-based runners, cloud resources, and remote cache/execution systems.
- Instrument build and CI systems with metrics, logs, traces, dashboards, and analytics so we can measure speed, reliability, cost, and developer impact.
- Partner directly with product, infrastructure, and research engineering teams to understand pain points, onboard projects, debug hard build issues, and remove systemic bottlenecks.
- Use modern AI tools to rethink CI failure analysis, flaky test debugging, PR triage, automatic remediation, and developer-facing explanations.
- Own the reliability of the systems you build, including participating in an on-call rotation for critical developer infrastructure.
TECHNOLOGIES COMMONLY USED IN THIS ENVIRONMENT INCLUDE
- Bazel and Starlark for build and test workflows
- Buildkite for CI orchestration
- Docker and OCI images for build and runtime packaging
- Kubernetes for CI runners and infrastructure orchestration
- Python, Go, TypeScript, Rust, C++, and other languages in a large monorepo
- Terraform for infrastructure as code
- Remote caching, remote execution, artifact storage, and build telemetry systems
- Postgres, Kafka, and internal services used to power engineering platforms
YOU MAY BE A STRONG FIT IF YOU
- Have 5+ years of software engineering experience, including significant experience building infrastructure or tooling for developers.
- Have hands-on experience with Bazel, Buck, Pants, Gradle, or similar build systems, and understand the tradeoffs of hermetic builds, dependency graphs, caching, sandboxing, and remote execution.
- Have built or operated CI systems at scale, especially in environments where build time, queue time, test flakiness, and developer trust materially affect engineering velocity.
- Are comfortable writing production software for internal platforms, not just configuring tools. We expect this role to involve code, design, debugging, operations, and long-term ownership.
- Can debug distributed build and CI failures across source control, dependency management, containers, runners, remote caches, test frameworks, and service infrastructure.
- Care deeply about developer experience and have empathy for the small sources of friction that slow teams down or create operational toil.
- Are pragmatic about platform adoption: you know how to build paved paths that teams want to use because they are faster, clearer, and more reliable.
- Communicate clearly across teams and can turn ambiguous productivity problems into concrete technical plans.
- Are excited to apply AI to developer infrastructure in ways that make engineers faster without weakening quality, reliability, or safety.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool th