Senior Software Engineer, Voice AI
full-time
senior
Posted 21 hours ago
Role Description
This is a high-autonomy, high-agency position for a voice AI engineer who thrives at the intersection of real-time systems, conversational AI, and healthcare. You'll own the architecture and delivery of Natera's Voice AI platform, a production system that handles thousands of patient calls daily and provides automated test status updates, identity verification, billing support, and intelligent routing to human agents.
You'll work across the full voice AI stack: telephony, speech-to-text, LLM orchestration, text-to-speech, and analytics — building agentic conversational systems that directly improve patient access to their genetic testing results. This role requires deep understanding of the intricacies unique to voice AI: real-time audio streaming, turn-taking, interruption handling, latency optimization, and the orchestration challenges that distinguish voice from text-based AI systems.
Your work will span two critical domains:
1. Voice AI Platform Engineering
Design, build, and operate Natera's production voice AI system. This includes multi-agent orchestration, real-time WebSocket audio pipelines, telephony integration, and the voice-specific challenges of latency management, VAD tuning, barge-in handling, and ASR accuracy for medical terminology.
2. Agentic Conversational Architecture
Architect and implement autonomous agent workflows that handle complex patient interactions end-to-end — identity verification, OTP validation, personalized test status delivery, billing inquiries, and intelligent escalation. You'll design tool-calling patterns, agent handoff logic, state management across conversation turns, and the analytics infrastructure needed to measure and improve call efficacy.
What You'll Do
Own the end-to-end voice AI architecture — from Twilio media streams through LLM orchestration to TTS output and call disposition
Design and implement multi-agent systems using tool calling, agent handoffs, and shared conversation state for complex patient workflows
Build and optimize real-time audio pipelines — WebSocket streaming, codec handling (mulaw/PCM), VAD configuration, and interruption management
Architect analytics and observability infrastructure for voice-specific metrics: per-segment latency (STT/LLM/TTS), call efficacy, disposition accuracy, and ASR error rates
Solve voice-specific challenges: turn-taking timing, silence detection thresholds, barge-in recovery, medical term recognition, and end-to-end latency optimization
Integrate voice agents with internal services via secure authenticated APIs
Drive platform reliability — eliminate single points of failure, implement multi-provider LLM failover, and design graceful degradation paths
Collaborate with product and clinical operations to improve self-serve efficacy rates and reduce call escalations
Mentor team members on voice AI best practices and contribute to architectural decisions
What We're Looking For
5+ years of software engineering experience, with at least 2 years building production voice AI or conversational AI systems
Deep experience with voice AI pipelines — you understand the end-to-end flow from telephony through STT, LLM processing, TTS, and back to the caller, and you've solved real problems at each stage
Production experience with agentic architectures — multi-agent orchestration, tool calling, agent handoffs, memory/state management, and LLM-driven decision making in real-time conversation contexts
Strong understanding of voice-specific challenges: VAD tuning, turn-taking, interruption/barge-in handling, latency budgets, audio codec management, and the differences between voice and text-based AI UX
Hands-on experience with telephony systems: Twilio (media streams, SIP, IVR) or equivalent platforms with WebSocket-based audio streaming
Proficiency in TypeScript/Node.js with strong async programming patterns; experience with NestJS or similar frameworks
Experience with STT/TTS providers (Deepgram, OpenAI, ElevenLabs, Azure Speech) and understanding of ASR accuracy challenges (domain-specific vocabulary, noise handling)
Production experience with LLM APIs — OpenAI (especially Realtime API), Anthropic Claude, or equivalent; prompt engineering for conversational agents
High agency and autonomy — you don't wait for permission, detailed specs, or hand-holding. You unblock yourself, seek out the highest-impact work, and drive it to completion
Excellent communication — you can translate complex voice AI architecture decisions for product and clinical stakeholders
Preferred
Experience in healthcare, biotech, or regulated environments (HIPAA, PHI handling, zero-retention architectures, BAA compliance)
AWS infrastructure experience — ECS Fargate, Lambda, DynamoDB, Bedrock, Kafka/MSK, API Gateway, CDK
Background in real-time systems: WebSocket lifecycle management, connection resilience, streaming protocols
Experience build