Senior Software Engineer, Machine Learning Infrastructure - Generative AI

DoorDash · San Francisco, CA · $203k - $299k

Full-time · Senior · Posted 21 hours ago

About this role

About the Team

DoorDash’s GenAI Platform team sits within Machine Learning Platform and builds the shared infrastructure that helps DoorDash, Wolt, and Deliveroo teams safely bring GenAI-powered products, agents, automation, and personalization to production. Our mission is to increase the velocity of business impact from GenAI.

A central pillar of that work is running frontier open-weight LLMs and VLMs (such as GLM, Qwen, Kimi, and DeepSeek) ourselves: real-time GPU serving, high-throughput batch inference, and fine-tuning on autoscaling GPUs. This work has delivered large cost and latency wins (for example, a billion embeddings produced roughly 20× cheaper and visual models served roughly 72% cheaper). We also own core platform surfaces including the LLM Gateway, Agent Gateway, evals infrastructure, guardrails, and cost attribution.

About the Role

You will join a small, high-leverage team building production infrastructure for Generative AI at DoorDash, leading the design and architecture of our open-weights model platform spanning inference and fine-tuning: real-time GPU serving, high-throughput batch inference, and model fine-tuning. You’ll set technical direction across model serving and inference engines, fine-tuning and training pipelines, GPU autoscaling and utilization, batch pipelines, backend services, and observability, and mentor engineers as you go.

This role is ideal for a senior engineer who enjoys owning ambiguous, high-impact systems and pushing the cost/performance frontier of GPU inference and fine-tuning in a fast-moving technical area where product needs, model capabilities, vendor ecosystems, and cost/performance tradeoffs are evolving quickly.

You’re excited about this opportunity because you will…

Lead the design of infrastructure that helps DoorDash teams move GenAI ideas from prototype to production, increasing the velocity of business impact from AI across the company.
Own and evolve our open-weights serving stack (real-time GPU endpoints, high-throughput batch inference, and fine-tuning with SFT/DPO/LoRA) alongside the LLM Gateway, Agent Gateway, evals infrastructure, guardrails, and cost attribution.
Architect scalable, high-performance systems for model serving, batch inference, GPU autoscaling, and fine-tuning that power real customer and internal automation use cases.
Push the cost and latency frontier of GPU inference, turning batch jobs that took days into hours and cutting inference cost by multiples, while giving product teams a clean choice across open-weight and closed-source models with reliability, fallback, observability, and cost controls built in.
Build platforms that support rapid experimentation while meeting production standards for latency, scale, monitoring, SLOs, playbooks, and operational excellence.
Partner closely with, and raise the technical bar for, ML engineers, product engineers, data scientists, and platform teams across DoorDash, Wolt, and Deliveroo to turn emerging GenAI capabilities into durable platform primitives.
Set technical direction for the future of DoorDash’s centralized GenAI platform, including emerging directions such as reinforcement learning (RLHF/RLVR), agent optimization, and other post-training and agentic techniques, to enable the next generation of AI-powered products, agents, automation, and personalization.

We’re excited about you because…

B.S., M.S., or PhD in Computer Science or equivalent.
6+ years of industry experience in software engineering.
Deep backend engineering fundamentals, especially in Python and distributed systems.
Track record of designing and owning production services, APIs, data pipelines, or ML infrastructure at scale.
Experience operating systems in production, including observability, debugging, reliability, incident response, and performance/cost optimization.
Deep hands-on experience with LLM inference and/or fine-tuning of open-weight models in production: serving (latency, throughput, batching, autoscaling, GPU utilization) and/or fine-tuning (SFT/DPO/LoRA).
Demonstrated technical leadership: leading design across ambiguous, fast-moving technical areas, mentoring engineers, and turning customer use cases into reusable platform capabilities.
Proficiency in using AI coding tools (e.g., Claude Code, Codex, Cursor) across the full software development lifecycle, including designing, generating code, testing, monitoring, and releasing software.

Nice To Haves

Experience with LLM inference engines and serving frameworks (e.g., vLLM, SGLang, TensorRT-LLM) in production.
Experience with distributed/multi-node fine-tuning and training pipelines (SFT, DPO/RLHF, LoRA), including data preparation and evaluation.
GPU performance work: multi-node/distributed inference, KV-cache/memory optimization, quantization (FP8/INT8/AWQ/GPTQ), or cold-start/throughput tuning.
Experience with Kubernetes, cloud infrastructure (AWS/GCP), GPUs, serverless/elastic GPU