Senior Software Engineer, Voice AI

Natera · Remote (US) · $125k - $156k

Full-time · Senior · Posted 1 month ago

speech · llm · payments · healthcare · embeddings · cloud · agents · api-design

About this role

Role Description

This is a high-autonomy, high-agency position for a voice AI engineer who thrives at the intersection of real-time systems, conversational AI, and healthcare. You'll own the architecture and delivery of Natera's Voice AI platform, a production system handling thousands of patient calls daily that provides automated test status, identity verification, billing support, and intelligent routing to human agents.

You'll work across the full voice AI stack (telephony, speech-to-text, LLM orchestration, text-to-speech, and analytics), building agentic conversational systems that directly improve patient access to their genetic testing results. This role requires deep understanding of the intricacies unique to voice AI: real-time audio streaming, turn-taking, interruption handling, latency optimization, and the orchestration challenges that distinguish voice from text-based AI systems.

Your work will span two critical domains:

1. Voice AI Platform Engineering
Design, build, and operate Natera's production voice AI system. This includes multi-agent orchestration, real-time WebSocket audio pipelines, telephony integration, and the voice-specific challenges of latency management, VAD tuning, barge-in handling, and ASR accuracy for medical terminology.

2. Agentic Conversational Architecture
Architect and implement autonomous agent workflows that handle complex patient interactions end-to-end: identity verification, OTP validation, personalized test status delivery, billing inquiries, and intelligent escalation. You'll design tool-calling patterns, agent handoff logic, state management across conversation turns, and the analytics infrastructure needed to measure and improve call efficacy.

What You'll Do

- Own the end-to-end voice AI architecture, from Twilio media streams through LLM orchestration to TTS output and call disposition
- Design and implement multi-agent systems using tool calling, agent handoffs, and shared conversation state for complex patient workflows
- Build and optimize real-time audio pipelines: WebSocket streaming, codec handling (mulaw/PCM), VAD configuration, and interruption management
- Architect analytics and observability infrastructure for voice-specific metrics: per-segment latency (STT/LLM/TTS), call efficacy, disposition accuracy, and ASR error rates
- Solve voice-specific challenges: turn-taking timing, silence detection thresholds, barge-in recovery, medical term recognition, and end-to-end latency optimization
- Integrate voice agents with internal services via secure authenticated APIs
- Drive platform reliability: eliminate single points of failure, implement multi-provider LLM failover, and design graceful degradation paths
- Collaborate with product and clinical operations to improve self-serve efficacy rates and reduce call escalations
- Mentor team members on voice AI best practices and contribute to architectural decisions

What We're Looking For

- 5+ years of software engineering experience, with at least 2 years building production voice AI or conversational AI systems
- Deep experience with voice AI pipelines: you understand the end-to-end flow from telephony through STT, LLM processing, TTS, and back to the caller, and you've solved real problems at each stage
- Production experience with agentic architectures: multi-agent orchestration, tool calling, agent handoffs, memory/state management, and LLM-driven decision making in real-time conversation contexts
- Strong understanding of voice-specific challenges: VAD tuning, turn-taking, interruption/barge-in handling, latency budgets, audio codec management, and the differences between voice and text-based AI UX
- Hands-on experience with telephony systems: Twilio (media streams, SIP, IVR), or equivalent platforms with WebSocket-based audio streaming
- Proficiency in TypeScript/Node.js with strong async programming patterns; experience with NestJS or similar frameworks
- Experience with STT/TTS providers (Deepgram, OpenAI, ElevenLabs, Azure Speech) and understanding of ASR accuracy challenges (domain-specific vocabulary, noise handling)
- Production experience with LLM APIs: OpenAI (especially Realtime API), Anthropic Claude, or equivalent; prompt engineering for conversational agents
- High agency and autonomy: you don't wait for permission, detailed specs, or hand-holding. You unblock yourself, seek out the highest-impact work, and drive it to completion
- Excellent communication: you can translate complex voice AI architecture decisions for product and clinical stakeholders

Preferred

- Experience in healthcare, biotech, or regulated environments (HIPAA, PHI handling, zero-retention architectures, BAA compliance)
- AWS infrastructure experience: ECS Fargate, Lambda, DynamoDB, Bedrock, Kafka/MSK, API Gateway, CDK
- Background in real-time systems: WebSocket lifecycle m