Senior/Staff Deep Reinforcement Learning Engineer

DoorDash · San Francisco, CA · $168k - $247k

Full-time · Lead · Posted 3 months ago

agents healthcare robotics fine-tuning cloud reinforcement-learning distributed-systems deep-learning

About this role

About the Team

Our DD Labs team builds real-time autonomous delivery systems. The Planning & Decision-Making group is investing heavily in deep reinforcement learning to move beyond classical planning, learning policies that generalize across novel driving scenarios, handle long-tail edge cases, and improve continuously from large-scale fleet data. Our models jointly handle prediction and planning in a single unified architecture.

Our stack is pure JAX end-to-end: the same code you train with is the code that runs on the robot. No C++ rewrites, no TensorRT export. A new policy goes from training to on-vehicle deployment in minutes.

About the Role

As a Senior/Staff Deep RL Engineer, you will design, train, and deploy deep reinforcement learning policies that make real-time driving decisions for our autonomous vehicles. You will own the full lifecycle, from problem formulation and reward design through large-scale distributed training to on-vehicle inference. You'll help define how learned components compose with the rest of the autonomy stack to produce robust, shippable behavior.

You’re excited about this opportunity because you will…

Formulate complex driving tasks as RL problems with well-shaped reward functions and expressive state/action representations.
Design and train model-based deep RL agents using GPU-accelerated simulation at massive scale, including improving the simulator itself.
Build and maintain distributed training infrastructure in JAX across large compute clusters.
Build agentic optimization systems that automatically improve code, run experiments, analyze metrics, and iterate on RL policies with minimal human intervention.

We’re excited about you because…

BS/MS/PhD in CS, EE, Robotics, or a related field, with a strong foundation in reinforcement learning and deep learning.
You have proficiency in using AI coding tools (e.g., Claude Code, Codex, Cursor) in the full software development lifecycle, including designing, generating code, testing, monitoring and releasing software.
Hands-on experience training RL agents at scale, ideally in robotics, autonomous driving, or other real-time decision-making domains.
Proficiency in JAX or a similar functional ML framework; comfort with JIT compilation, vectorized environments, and GPU-accelerated simulation.
Deep grasp of core RL concepts: policy gradients, value functions, exploration-exploitation, model-based RL, reward shaping, and sim-to-real transfer.
Data-driven mindset: comfortable building experiment pipelines, analyzing training runs, and letting metrics guide architectural decisions.

Nice to Have

Publications at top venues (NeurIPS, ICML, ICLR, CoRL, RSS, ICRA) on RL or learned planning.
Experience building or working with GPU-accelerated simulators for RL training.
Track record of shipping a learned component in a production robotics or autonomous vehicle stack.

Notice Regarding Use of AI and Automated Tools

To streamline our hiring process, DoorDash utilizes an automated recruitment tool called Gem.

How it works: Gem assists our recruiting team by evaluating job related qualifications and characteristics in connection with hiring. The tool is designed and used to support - rather than replace - human decision-making; trained personnel make final decisions with meaningful human review and oversight, and DoorDash does not use Gem or other AI-enabled tool in a manner that has the effect of subjecting applicants or employees to discrimination based on any protected characteristic or proxy or for engaging in any protected activity under applicable law.

Data Retention, Privacy & Bias Audit: Data collected during this process is retained in accordance with our Candidate Privacy Policy and applicable state laws.
In compliance with New York City Local Law 144, the independent bias audit summary for Gem is publicly available for review at our Careers Page.

Notice to Applicants for Jobs Located in NYC or Remote Jobs Associated With Office in NYC Only

We use Covey as part of our hiring and/or promotional process for jobs in NYC and certain features may qualify it as an AEDT in NYC. As part of the hiring and/or promotion process, we provide Covey with job requirements and candidate submitted applications. We began using Covey Scout for Inbound from August 21, 2023, through December 21, 2023, and resumed using Covey Scout for Inbound again on June 29, 2024. The Covey tool has been reviewed by an independent auditor. Results of the audit may be viewed here: Covey

Compensation

The successful candidate’s starting pay will fall within the pay range listed below and is determined based on job-related factors including, but not limited to, skills, experience, qualifications, work location, and market conditions. Base salary is localized according to an employee’s work location. Ranges are market-dependent and may be modified in the future. In addition to base salary, the compensation for