Cluster Deployment Engineer
full-time
lead
Posted 1 hour ago
About this role
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
ABOUT THE ROLE
As a Cluster Deployment Engineer, you will own how large-scale AI compute clusters physically come together inside our datacenter fleet. You will set the deployment-engineering strategy for cluster build-out — how racks are organized into pods, halls, and sites; how compute, network, power, and cooling systems interface at the rack boundary; and how deployment scope flows cleanly from hardware specification to facility delivery to a running cluster. This role is focused on deployment engineering, not on datacenter network or systems design — your scope is making sure clusters land cleanly and predictably, not designing the fabrics or facilities themselves.
This is a senior individual-contributor role with broad technical influence. You will work across hardware, networking, facilities, supply chain, and construction to ensure that every generation of accelerator we deploy lands in a datacenter that is ready for it — on schedule, at full density, and with every piece of required infrastructure accounted for. You will be the person who sees around corners: anticipating how next-generation rack designs will stress our facilities, where our deployment model will break at scale, and what needs to change now so that the next cluster turn-up is faster and more predictable than the last.
You will operate at the intersection of engineering strategy and execution discipline, partnering with internal research and systems teams, external developers, engineering firms, and OEM partners to deliver cluster capacity at the speed the frontier demands.
RESPONSIBILITIES
Own cluster-level deployment strategy — define how AI compute clusters are organized across the floor, how racks interconnect, and how cluster topology requirements translate into facility and deployment scope across a portfolio of sites.
Set rack interface standards spanning power, network, mechanical, thermal, and spatial domains, and ensure that every deployment includes the complete set of infrastructure required to bring a cluster online.
Drive multi-threaded cluster bring-up programs across hardware, networking, power, and cooling — owning plans, dependencies, and critical paths from hardware specification through energization and turn-up.
Partner with internal engineering teams — research, systems, networking, and hardware — to translate cluster requirements into deployable facility scope, and to derisk onboarding of new hardware platforms well ahead of delivery.
Lead external partner execution with developers, engineering firms, OEMs, and construction teams, driving technical reviews, deviation management, and handoffs that keep deployments on schedule and within specification.
Improve cluster turn-up reliability and repeatability — identify systemic gaps in deployment scope, tooling, and partner interfaces, and drive durable fixes that reduce time-to-serve for new capacity.
Define and track deployment KPIs — cluster readiness, schedule adherence, scope completeness, time-to-first-packet — and use historical trends to forecast risk and inform capacity planning.
Coordinate cross-functional readiness across supply chain, security, operations, and construction to ship production-ready compute capacity.
Provide crisp executive visibility on deployment progress, tradeoffs, and risks across a portfolio of concurrent cluster programs.
Design cluster interfaces for durability — define rack and cluster-level interfaces that remain robust across hardware generations, so that facility scope and deployment models do not need to be reinvented every time the underlying hardware changes.
Build cluster layout and BOM tooling — create and maintain the tools, templates, and data models that turn cluster topology and rack specifications into accurate floor layouts, deployment sequences, and complete bills of materials, replacing one-off spreadsheets with repeatable, auditable workflows.
YOU MAY BE A GOOD FIT IF YOU
Have 10+ years of experience in hyperscale datacenter environments, with senior-level responsibility for cluster deployment, large-scale IT integration, or equivalent infrastructure programs.
Have delivered AI, HPC, or high-density compute clusters at scale and developed a strong intuition for the constraints that govern cluster deployment — interconnect reach, adjacency, power density, and thermal limits.
Can operate fluently across the boundary between IT hardware and facility infrastructure, and have set interface standards that held up across multiple hardware generations and sites.
Have led cross-functional progr
Similar Jobs
Related searches:
Get jobs like this delivered weekly
Free AI jobs newsletter. No spam.