Staff Platform Engineer, Voice AI

Together AI · San Francisco, CA · $220k - $280k

Full-time · Lead · Posted 1 month ago

agents mlops api-design distributed-systems speech platform

About this role

Together AI is defining the infrastructure layer for the next generation of voice applications. Our Voice AI platform powers production-grade, real-time voice agents at scale, and we're looking for a Staff Platform Engineer to own the architecture that makes it possible.

This isn't a role about maintaining what exists. You'll set the technical direction for how developers interact with Together's voice platform: from the real-time API primitives they build on, to the autoscaling systems that keep latency SLOs intact under unpredictable load, to the multi-provider abstraction layer that makes our platform uniquely powerful. Voice infrastructure is categorically harder than text inference. Bidirectional audio streams, stateful long-lived connections, millisecond latency requirements, and complex multi-model routing don't forgive architectural shortcuts. You'll bring the judgment to get this right the first time, at scale.

This is a foundational hire on a small, high-conviction team. The decisions you make in this role will define the platform architecture for years.

Responsibilities

Own the architecture and reliability of Together's real-time API layer: set the technical direction for the WebSocket and HTTP streaming APIs powering STT and TTS at scale, and establish the reliability bar for connection lifecycle, backpressure, graceful degradation, and reconnection that production voice agents depend on, whether in contact centers, AI agents, or communication platforms.

Lead autoscaling architecture for latency-sensitive voice workloads: design and ship orchestration systems that handle bursty, real-time traffic across tens of thousands of GPUs, and solve the hard problems at the intersection of concurrent connection limits, streaming state, and hard latency ceilings that generic autoscalers weren't built for.

Define the voice API feature surface: make the architectural calls on word-level alignment, real-time speaker diarization, audio format support (G.711 μ-law, PCM, WebRTC), pronunciation controls, and multi-context WebSocket, with a clear view of what unlocks the next category of developer use cases.

Build the observability platform for voice infrastructure: design the latency-breakdown pipelines, audio-quality signal collection, and customer-facing dashboards that give both the team and developers the instrumentation they need to operate at production quality, and make debugging voice issues fast and systematic.

Own the multi-provider abstraction layer: architect the normalization layer across model partners (Cartesia, Deepgram, Rime, and others) that delivers consistent, provider-agnostic API behavior. Your design should absorb upstream variability without exposing it to developers.

Drive the interface between API and ML serving: partner closely with ML engineering leadership to define the contract between the API layer and the model-serving stack. Your decisions here directly affect end-to-end latency and reliability SLAs.

Raise the bar for developer experience across the platform: lead API design reviews, shape documentation strategy, and define integration patterns and cookbooks. The voice developer experience should be something the industry references, not merely adequate.

Architect for the product surface that doesn't exist yet: build systems with enough foresight that they can become the foundation for multiple new voice products. Your platform decisions should expand what's possible, not constrain it.

Requirements

8+ years of experience building large-scale, real-time distributed systems, with clear ownership of systems that carried production traffic at meaningful scale. You can speak to the architectural decisions you made and defend the tradeoffs.

Deep, battle-tested expertise in real-time streaming infrastructure (WebSocket server architecture, SSE, bidirectional streaming, connection multiplexing, stateful protocol design). You've debugged production failures in these systems and come out with durable architectural improvements.

Expert-level TypeScript and Python, with strong systems-level thinking. Rust experience is a meaningful advantage at this level, given where voice infrastructure is heading.

Senior distributed systems judgment: load balancing, autoscaling, rate limiting, and traffic shaping for latency-sensitive workloads aren't concepts you reference; they're problems you've solved under pressure.

Deep Kubernetes expertise: custom autoscalers, resource management, and health checking for stateful, streaming services. You've built Kubernetes automation that handled edge cases the off-the-shelf tooling couldn't.

Strong technical leadership: you set direction, influence across teams without authority, bring clarity to ambiguous problems, and leave systems and teams meaningfully better than you found them.

Sharp product intuition for developer platforms: you have genuine opinions about API ergonom