Staff AI research scientist

Writer · San Francisco, CA

Full-time · Lead · Posted 1 week ago

agents reinforcement-learning search nlp pre-training generative-ai llm pytorch

About this role

🚀 ABOUT WRITER

WRITER is where the world's leading enterprises orchestrate AI-powered work. Our vision is to expand human capacity through superintelligence. And we're proving it's possible through powerful, trustworthy AI that unites IT and business teams to unlock enterprise-wide transformation. With WRITER's end-to-end platform, hundreds of companies like Mars, Marriott, Uber, and Vanguard are building and deploying AI agents that are grounded in their company's data and fueled by WRITER's enterprise-grade LLMs. Valued at $1.9B and backed by industry-leading investors including Premji Invest, Radical Ventures, and ICONIQ Growth, WRITER is rapidly cementing its position as the leader in enterprise generative AI.

Founded in 2020, WRITER has office hubs in San Francisco, New York City, Austin, Chicago, and London. Our team thinks big and moves fast, and we're looking for smart, hardworking builders and scalers to join us on our journey to create a better future of work with AI.

📐 ABOUT THE ROLE

AI research at WRITER isn't just about publishing papers; it's about building the scientific foundation that powers some of the most ambitious enterprise AI deployments in the world. As a staff AI research scientist, you'll be at the center of that work. You'll drive a high-impact research agenda focused on large language models, agentic reasoning, and the system-level capabilities that make AI genuinely useful at enterprise scale. This is a rare opportunity to do research that matters twice over: advancing the field and shipping directly into products used by hundreds of thousands of people every day.

We're at an inflection point. Enterprises are moving from experimenting with AI to deeply embedding it across their operations, and WRITER's models are the engine making that possible. The work you do here, on post-training, planning, multi-step reasoning, and agentic workflows, will directly shape how the next generation of enterprise AI behaves, performs, and scales.
You'll have the resources, infrastructure, and cross-functional support to pursue ambitious ideas and bring them to life quickly. This role is hybrid, based out of our San Francisco or New York City hub. You'll report to our VP of AI research.

🦸🏻‍♀️ WHAT YOU'LL DO

- Lead an independent, high-impact research agenda on large language models and agentic systems, owning projects from early hypothesis through model training, evaluation, and production deployment
- Design and execute large-scale post-training experiments using supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), RLAIF, DPO, and emerging alignment techniques, with a focus on improving multi-step reasoning, planning, and tool use in enterprise agentic workflows
- Build novel evaluation benchmarks and methodologies that push beyond existing limitations, establishing rigorous measures of how well models perform on complex, real-world enterprise tasks
- Develop scalable data synthesis and curation pipelines that generate the high-quality training signal driving model improvement, including LLM-as-judge frameworks, synthetic data generation, and adversarial dataset construction
- Shape WRITER's model architecture and training roadmap by translating your research insights into concrete improvements to our enterprise-grade LLMs, working hand-in-hand with research engineering and product teams
- Publish and present original research at top-tier venues (NeurIPS, ICLR, ICML, ACL, and others), representing WRITER at the frontier of the field and contributing to the broader scientific community
- Mentor and uplevel fellow researchers and engineers on the team, helping set a high bar for scientific rigor, experimental design, and research culture

⭐️ WHAT YOU NEED

- 7+ years of hands-on ML research experience, with deep expertise in large language model pre-training and post-training: you've trained models at scale, debugged distributed jobs, and shipped improvements that made a measurable difference
- Expert-level knowledge of post-training methods including SFT, RLHF, RLAIF, DPO, GRPO, and related alignment and reasoning techniques, with a track record of applying them to real, production-grade systems
- Strong command of Python and PyTorch (or JAX), with the engineering depth to build and scale training pipelines, evaluation infrastructure, and data synthesis workflows yourself, not just direct others to do it
- A meaningful publication record at competitive ML/AI venues (NeurIPS, ICLR, ICML, ACL, EMNLP, or equivalent), evidencing your ability to originate ideas and execute a multi-month research agenda independently
- Hands-on experience designing or evaluating agentic systems (models that plan, reason through multi-step tasks, use tools, and recover gracefully from errors), with a nuanced understanding of where they break and how to fix them
- A Ph.D. in Compu