Build Systems Engineer (Bazel)
Full-time
Principal
Posted 4 days ago
About this role
What MatX Is Building
MatX builds custom AI accelerator silicon. The build system is the backbone: it wires together RTL, a stack of commercial EDA tools (simulation, synthesis, place-and-route, lint/CDC), and a Rust/Python software stack into one hermetic, reproducible pipeline. We run on Bazel with bzlmod, RBE, custom rules, and a small but tight set of platform-level abstractions.
You'll join a small group that owns the build graph, the toolchains, the rules that wrap each EDA tool, and the CI infrastructure that keeps thousands of targets green.
What You’ll Do Here
New EDA tool integrations. Wrap a closed-source tool in a hermetic Bazel rule with proper providers, runfiles, and execution constraints. Add a new front-end stage to an existing toolchain; add a rule for test variants that share configuration; wire a third-party generator into our Verilog graph as a first-class dep
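To make that first bullet concrete, here is a minimal sketch of the shape such a wrapper takes. Everything in it is illustrative: `eda_synth`, `SynthInfo`, the toolchain label, and the tool's flags are invented for the example, not our actual rules.

```starlark
# Illustrative sketch of a hermetic rule wrapping a closed-source synthesis tool.
SynthInfo = provider(fields = ["netlist", "reports"])

def _eda_synth_impl(ctx):
    tc = ctx.toolchains["//tools/eda:synth_toolchain_type"]  # hypothetical toolchain
    netlist = ctx.actions.declare_file(ctx.label.name + ".v")
    reports = ctx.actions.declare_directory(ctx.label.name + ".reports")
    ctx.actions.run(
        executable = tc.synth_tool,
        arguments = [src.path for src in ctx.files.srcs] +
                    ["-o", netlist.path, "-reports", reports.path],
        inputs = depset(ctx.files.srcs, transitive = [tc.runtime_files]),
        outputs = [netlist, reports],
        mnemonic = "EdaSynth",
        # Constraints live on the action itself, not in a README of flags.
        execution_requirements = {"no-sandbox": "1"},
    )
    return [
        DefaultInfo(files = depset([netlist]), runfiles = ctx.runfiles([netlist])),
        SynthInfo(netlist = netlist, reports = reports),
    ]

eda_synth = rule(
    implementation = _eda_synth_impl,
    attrs = {"srcs": attr.label_list(allow_files = [".v", ".sv"])},
    toolchains = ["//tools/eda:synth_toolchain_type"],
)
```

The provider is what lets downstream rules consume the netlist without knowing which vendor tool produced it.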
Bazel version migrations. Lead upgrades (8.x → 9.x) and the bzlmod / MODULE.bazel housekeeping that comes with them
Hermeticity work. Hunt down the implicit assumptions: system Python, system gcc, leaked /usr/bin deps, host-state in tests. Replace them with hermetic toolchains and tracked inputs
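A minimal sketch of what "replace them with hermetic toolchains" looks like in MODULE.bazel, assuming rules_python and toolchains_llvm; the version strings are placeholders, not our pins.

```starlark
# Sketch only: versions are placeholders, not actual pins.
bazel_dep(name = "rules_python", version = "0.31.0")
bazel_dep(name = "toolchains_llvm", version = "1.0.0")

# Hermetic Python: no more /usr/bin/python leaking into actions.
python = use_extension("@rules_python//python/extensions:python.bzl", "python")
python.toolchain(python_version = "3.11", is_default = True)

# Hermetic C/C++: a downloaded LLVM instead of the system gcc.
llvm = use_extension("@toolchains_llvm//toolchain/extensions:llvm.bzl", "llvm")
llvm.toolchain(llvm_version = "17.0.6")
use_repo(llvm, "llvm_toolchain")
register_toolchains("@llvm_toolchain//:all")
```

Once these resolve, leaked host deps show up as broken actions instead of silent nondeterminism, which is exactly where you want them.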
Refactors that delete code. Rewrite a fragmented family of test macros in terms of one shared rule. Remove a homegrown wrapper rule once upstream covers the case. Extract a common aspect helper used by three places that duplicated it. The good PRs net negative
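As a schematic example of that kind of refactor (all names invented for illustration): three near-identical test macros collapse into one entry point whose variants differ only in a data table.

```starlark
# Before (schematic): unit_sim_test, gate_sim_test, and power_sim_test each
# re-implemented the same wiring. After: one macro, one table of differences.
_VARIANT_DEFINES = {
    "unit": ["SIM_UNIT"],
    "gate": ["SIM_GATE", "SDF_ANNOTATE"],
    "power": ["SIM_GATE", "POWER_TRACE"],
}

def sim_test(name, srcs, variant = "unit", **kwargs):
    """Single entry point; the variant table is the only place variants differ."""
    native.sh_test(  # stand-in for the real simulation test rule
        name = name,
        srcs = srcs,
        env = {"DEFINES": ",".join(_VARIANT_DEFINES[variant])},
        **kwargs
    )
```

The diff that lands this deletes two macro definitions and every call-site divergence they had accumulated.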
Build performance. Persistent workers for slow tools, RBE configs, action graph hygiene, cache-key debugging when something silently rebuilds
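Persistent workers follow a fixed Bazel protocol: tag the action with `supports-workers` and spill its arguments into a params file that the worker reads per request. A hedged sketch, with `_lint_tool` and the `LintCheck` mnemonic invented for the example:

```starlark
def _lint_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".log")
    args = ctx.actions.args()
    args.add(ctx.file.src)
    args.add("--out", out)
    # Always spill args to a params file; the worker consumes @file per request.
    args.use_param_file("@%s", use_always = True)
    args.set_param_file_format("multiline")
    ctx.actions.run(
        executable = ctx.executable._lint_tool,
        arguments = [args],
        inputs = [ctx.file.src],
        outputs = [out],
        mnemonic = "LintCheck",
        execution_requirements = {"supports-workers": "1"},
    )
    return [DefaultInfo(files = depset([out]))]

lint = rule(
    implementation = _lint_impl,
    attrs = {
        "src": attr.label(allow_single_file = True),
        "_lint_tool": attr.label(
            default = "//tools:lint_worker",  # hypothetical worker binary
            executable = True,
            cfg = "exec",
        ),
    },
)
```

The payoff is amortizing a slow tool's startup (JVM, license checkout, model load) across thousands of actions.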
CI infrastructure. GitHub Actions self-hosted runners on GCE COS, Buildbarn workers, monitoring, rolling upgrades
PRs are small and frequent. Median is +50/-30. Big refactors arrive as a series of mechanical commits, each individually reviewable
Reviews are real. We comment, ask questions, request changes. Reviews are how we share the build system across the team — not rubber-stamping
Negative diffs are celebrated. "Remove unused X" and "Replace ad-hoc Y with Z" are first-class contributions
You'll teach the rest of the team Bazel. Half the company writes RTL or Rust, not Starlark. Good rules let them stay in their domain. Good docstrings (and stardoc) keep them self-serve
You'll work tightly with at least one of us. Most non-trivial changes are pair-designed before code. Fast feedback loops, whiteboard sessions, no async-only collaboration
Lean on AI, but stay persnickety. We use Claude Code and similar tools heavily — for prototypes, refactors, scripts, even rule scaffolding. We also reject most of what they produce on the first pass. You'll steer the model hard toward your taste, push back on the easy answer, and review every line you commit as if you wrote it. Auto-generated PRs that pass tests but miss the point are not what we want
Who You Are
Deep build-system fluency. Rules, providers (or equivalent), aspects, toolchains, platforms, configuration/select, transitions, query. You can read a build-system file — .bzl, Buck2 BUCK, Shake Rules.hs, whatever — and predict what its action graph will look like. Bazel-native is a plus, but we hire on build-system fluency, not Bazel keyword-matching: we'll trade six weeks of Starlark ramp for the right taste. If you've done equivalent work in Buck2, Shake/Hadrian, Pants, Nix, or a homegrown Blaze-shaped system, read these bullets as concepts — Bazel is what you'll write here, but the principles port. Just be honest about your ramp on Starlark and bzlmod
bzlmod / MODULE.bazel. Module extensions, lockfile management, vendoring third-party deps cleanly
Remote execution. RBE, Buck2 RE, BuildBuddy, BuildBarn, your own — they all teach the same lessons. Cache-key debugging, Build without the Bytes, diagnosing "works locally, fails remote." If you've owned one end-to-end, the next one is a port
Comfort in Rust / Python / shell / Starlark. You might read all four in any given week
Bonus Points If You Share These Principles
Build graph is the source of truth. "If two things must stay in sync, make one depend on the other." Allergic to parallel lists in workflow YAML, Python arrays, and .bzl dicts that drift
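In practice that principle usually means one Starlark table that everything else derives from, instead of copies in CI YAML and BUILD files. A sketch with invented names:

```starlark
# tools.bzl -- the one table. BUILD files, the CI matrix generator, and the
# docs target all load this instead of keeping their own drifting copies.
EDA_TOOLS = {
    "synth": "//tools/eda:synth",
    "lint": "//tools/eda:lint",
    "cdc": "//tools/eda:cdc",
}

def all_tool_smoke_tests():
    """Materialize one smoke test per tool from the single table."""
    for short_name, label in EDA_TOOLS.items():
        native.sh_test(
            name = short_name + "_smoke",
            srcs = ["smoke.sh"],
            args = ["$(location %s)" % label],
            data = [label],
        )
```

Adding a tool means editing one dict; everything downstream picks it up because it depends on the table rather than mirroring it.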
Don't parse what you can generate. If a tool has the structured data internally, have it write structured output. Parsing human-readable reports is a temporary bridge, not a design
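Concretely, that means the rule asks the tool for machine-readable output as a declared file instead of scraping its human-readable log. The flag names below are hypothetical; the shape is the point.

```starlark
def _timing_impl(ctx):
    report_json = ctx.actions.declare_file(ctx.label.name + ".timing.json")
    ctx.actions.run(
        executable = ctx.executable._sta_tool,  # hypothetical STA binary
        arguments = [
            ctx.file.netlist.path,
            "--report-format=json",   # invented flags: "emit JSON" is the idea
            "--report-out=" + report_json.path,
        ],
        inputs = [ctx.file.netlist],
        outputs = [report_json],
        mnemonic = "StaReport",
    )
    # Downstream consumers get structured JSON, not a .log file to regex.
    return [DefaultInfo(files = depset([report_json]))]
```

When the vendor tool truly can't emit structured output, the regex lives in exactly one adapter rule, clearly marked as a bridge.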
Split build from check. A rule that produces artifacts always succeeds; a separate _test target gates on quality. Empty dashboards because the build broke are unacceptable
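In BUILD terms the pattern splits into two targets; both rule names below are hypothetical.

```starlark
# load(...) of the hypothetical rules omitted.

eda_lint_report(          # runs the linter, always succeeds, emits the report
    name = "core_lint",
    srcs = [":core_rtl"],
)

lint_gate_test(           # separate _test target: fails on unwaived violations
    name = "core_lint_test",
    report = ":core_lint",
    waivers = "lint_waivers.txt",
)
```

Dashboards read `:core_lint`'s artifact, so they stay populated even on the day `:core_lint_test` goes red.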
Let Bazel parallelize, not the orchestrator. One bazel build --keep_going over N matrix jobs that each warm up Bazel
Encode execution constraints in the rule, not the invocation. No README accumulating per-tool --strategy=..., --remote_download_outputs=..., --sandbox_debug incantations. execution_requirements belongs on the action
Compose at the boundary. Dev and prod differ only in where things run, not in how they're built