{"access":{"advertiser_pricing_url":"https://aidevboard.com/pricing","catalog_url":"https://aidevboard.com/api/v1/catalog","description":"Public read endpoints are open and free. API keys are optional for stable agent identity and keyed hourly throttling.","docs_url":"https://aidevboard.com/docs","mode":"open","register_url":"https://aidevboard.com/api/v1/register"},"degraded":false,"estimated":false,"has_next":true,"jobs":[{"id":"9a7b6e3f-9ca5-4d74-8149-e52a00eeffdc","company_id":"3da82454-107f-427f-88e7-01f315ef93fb","title":"Applied Research - Evals \u0026 Data","slug":"applied-research-evals-data-2b9e0702","description":"OWN YOUR INTELLIGENCE\n\n\n\nPrime Intellect is building the open superintelligence stack: the infrastructure frontier AI labs build internally, made available to every ambitious AI team.\n\n\n\nOur platform, Lab, unifies compute, environments, evaluations, secure sandboxes, high-performance training, and deployment into one full-stack system for post-training at frontier scale - from SFT and RL to tool use, agent workflows, and continuously improving production models. We are building open frontier AI: open-source models trained end to end for long-horizon tasks like autonomous research, and the full-stack platform our own research team uses to build them. The next generation of AI companies, enterprises, and research teams do not just need more GPUs. They need the ability to turn their own workflows, tools, data, and feedback loops into superintelligence they own.\n\n\n\nPrime Intellect has raised $150M in total funding from Founders Fund, Radical Ventures, NVIDIA, and exceptional AI, infrastructure, and enterprise operators — including Andrej Karpathy, Dwarkesh Patel, and leaders and founders from Ramp, Perplexity, Harvey, Mercor, Zapier, Datadog, Cognition, OpenAI, Thinking Machines, Together AI, SemiAnalysis, LangChain, Browserbase, Cloudflare, Sierra, Databricks, Airbnb, OpenRouter, Standard Intelligence, Fleet, Core Auto, and more. 
We are looking for people who want to build at the intersection of frontier research, real infrastructure, and go-to-market for a category that does not fully exist yet.\n\n\nRole Impact\n\nThis is a customer-facing role at the intersection of cutting-edge RL/post-training methods, applied data, and agent systems. You’ll have a direct impact on shaping how advanced models are aligned, evaluated, deployed, and used in the real world by:\n\n - Advancing Agent Capabilities: Designing and iterating on next-generation AI agents that tackle real workloads—workflow automation, reasoning-intensive tasks, and decision-making at scale. Working with applied data from real deployments to continuously refine policies, improve reasoning, and enhance reliability and safety.\n\n - Building Robust Infrastructure: Developing the distributed systems, evaluation pipelines, and coordination frameworks that enable these agents to operate reliably, efficiently, and at massive scale. Building data capture, processing, and versioning workflows for feedback, model traces, and reward signals.\n\n - Bridge Between Customers \u0026 Research: Translating customer needs and insights from applied data into clear technical requirements that guide product and research priorities. Collaborating closely with RL and eval teams to ensure real-world signals inform model alignment and reward shaping.\n\n - Prototype in the Field: Rapidly designing and deploying agents, evals, and harnesses alongside customers to validate solutions. 
Using applied evaluation data to iterate on model performance and discover new capabilities.\n\n\nCustomer-Facing Engineering\n\n - Work side-by-side with customers to deeply understand workflows, data sources, and bottlenecks.\n\n - Prototype agents, data pipelines, and eval harnesses tailored to real use cases, then hand off hardened systems to core teams.\n\n - Translate customer insights and evaluation results into roadmap and research direction.\n\n\nPost-training \u0026 Reinforcement Learning\n\n - Design and implement novel RL and post-training methods (RLHF, RLVR, GRPO, etc.) to align large models with domain-specific tasks.\n\n - Build evaluation harnesses and verifiers to measure reasoning, robustness, and agentic behavior in real-world workflows.\n\n - Integrate applied data collection and analytics into the post-training process to surface regressions, emergent skills, and alignment opportunities.\n\n - Prototype multi-agent and memory-augmented systems to expand capabilities for customer-facing solutions.\n\n\nAgent Development \u0026 Infrastructure\n\n - Rapidly prototype and iterate on AI agents for automation, workflow orchestration, and decision-making.\n\n - Extend and integrate with agent frameworks to support evolving feature requests and performance requirements.\n\n - Architect and maintain distributed training and inference pipelines, ensuring scalability and cost efficiency.\n\n - Develop observability and monitoring (Prometheus, Grafana, tracing) to ensure reliability and performance in production deployments.\n\n\nRequirements\n\n - Strong background in machine learning engineering, with experience in post-training, RL, or large-scale model alignment.\n\n - Experience with applied data workflows and evaluation frameworks for large models or agents (e.g., SWE-Bench, HELM, EvalFlow, internal eval pipelines).\n\n - Deep expertise in distributed training/inference frameworks (e.g., vLLM, sglang, Ray, Accelerate).\n\n - Experience deploying 
containerized systems at scale (Docker, Kubernetes, Terraform).\n\n - Track record of research contributions (publications, open-source contributions, benchmarks) in ML/RL.\n\n - Passion for advancing the ","salary_min":150000,"salary_max":300000,"location":"San Francisco, CA","workplace":"remote","remote_scope":"unknown","job_type":"full-time","experience_level":"senior","tags":["reinforcement-learning","agents","data-pipeline","distributed-systems","llm","research","evaluation"],"apply_url":"https://jobs.ashbyhq.com/PrimeIntellect/bbfe94a6-d1a8-47e9-86af-f117277cdacb/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-08T18:34:09.743Z","expires_at":"2026-08-15T14:10:47.08875Z","created_at":"2026-04-13T15:01:32.581029Z","updated_at":"2026-07-16T14:10:47.245952Z","company_name":"Prime Intellect","company_slug":"PrimeIntellect","company_logo_url":"https://www.google.com/s2/favicons?domain=primeintellect.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/9a7b6e3f-9ca5-4d74-8149-e52a00eeffdc"},{"id":"4aa05e90-1b2a-4e5a-b2e0-4035bd5fc1fe","company_id":"b459414f-fd43-42c4-a6e1-f07225286a75","title":"Evaluations - Member of Technical Staff","slug":"evaluations-member-of-technical-staff-8a4ec4b6","description":"ABOUT THE COMPANY\n\nPilots don't train with real passengers. Actors don't rehearse with real audiences. Yet, the most consequential decisions in society are often pushed straight to production.\n\n\n\nSimile is changing that. We have built the first AI simulation of society, populated by generative agents based on real humans. Our research pioneered the field of AI-based simulation, proving it is possible to model human behavior with high accuracy. 
Today, we are developing a Foundation Model to predict human behavior in any situation, at any scale.\n\n\n\nWe are backed by $100M in funding led by Index Ventures, with participation from Hanabi, A*, Bain Capital Ventures, and AI visionaries including Andrej Karpathy, Fei-Fei Li, Adam D'Angelo, and Guillermo Rauch.\n\n\n\n\n\n\n\nABOUT THE ROLE\n\nAs a Member of Technical Staff, Model Evaluations at Simile, you will build the measurement systems that determine whether our simulations of human behavior are accurate, trustworthy, and useful enough to guide real-world decisions. You will help shape what Simile measures, the quality bars we defend, and how evaluation evidence guides model, product, and customer decisions.\n\n\n\nEvaluation at Simile brings together model evals, statistics, behavioral science, research methodology, product quality, and human judgment. Our models simulate people, populations, markets, and groups, which means our evals must reason about distributions, noisy human ground truth, uncertainty, qualitative outputs, behavioral data, and customer decision-making. You will work with unusually rich data about human behavior, including surveys, long-form interviews, customer studies, qualitative research, and behavioral signals such as transactions, product interactions, and other real-world traces.\n\n\n\nWe are hiring across several forms of expertise. Some candidates may be deep in LLM evaluation, model training, and research engineering. Others may bring exceptional strength in statistics, behavioral science, survey methodology, human data, product evaluation, or experimentation. 
Across backgrounds, we are looking for people who can reason clearly, build quickly, use agentic coding tools fluently, and take hands-on ownership of ambiguous evaluation problems.\n\n\n\nThe core question for this role is simple: How do we know when a simulation of human behavior is good enough to trust?\n\n\n\n\nIN THIS ROLE, YOU WILL:\n\n - Build the measurement layer for behavioral simulation: Design evals, metrics, rubrics, datasets, dashboards, and workflows that measure whether Simile’s models are accurately predicting human behavior across customer use cases, populations, question types, and decision contexts.\n\n - Partner with modeling to improve models: Evaluate new model versions, diagnose regressions, identify priority areas for model-improvement cycles, and maintain stable eval suites that represent capabilities customers actually care about.\n\n - Contribute to product and applied evals: Build evals for qualitative responses, retrieval, survey generation, AI-generated research reports, customer-facing outputs, and other product surfaces where model quality directly shapes customer trust. Turn subjective quality concerns into concrete rubrics, labeled data, automated graders, release criteria, and model-improvement signals.\n\n - Make ground truth and uncertainty legible: Develop rigorous ways to compare simulated responses against human data, customer studies, Simile-collected ground truth, and behavioral datasets. Help the company reason about sampling error, uncertainty, calibration, margin of error, representativeness, and what “ground truth” means when human behavior is inherently noisy.\n\n - Automate evaluation workflows: Use modern agentic coding tools to rapidly build internal tools, inspect model outputs, create labeling workflows, validate evals, and turn fuzzy evaluation questions into working systems. 
We value people who can compress long, ambiguous projects into fast, useful prototypes without losing sight of rigor or reliability.\n\n - Help define the future of behavioral simulation evals: Prototype ways to evaluate behavioral predictions using diverse sources of data, including transaction or purchase behavior, product interactions, intervention response, first-party experiments, and eventually multi-agent group settings.\n\n\n\n\nREQUIREMENTS\n\n\nMUST HAVES\n\n - Evaluation Taste: You have strong intuition for what makes an eval meaningful, robust, and decision-relevant. You can explain what an eval measures, what it does not measure, how it can be gamed, and why it should or should not affect a model or product decision.\n\n - LLM and Model Fluency: You understand the basics of modern LLM training, post-training, model evaluation, and hill-climbing. You do not need to be a modeling specialist, but you can read model outputs, understand modeling team needs, and reason about whether a model change actually improved the thing we care about.\n\n - Statistical Judgment: You are comfortable r","salary_min":200000,"salary_max":400000,"location":"San Francisco, 
CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["llm","search","generative-ai","agents","evaluation","research"],"apply_url":"https://jobs.ashbyhq.com/simile/33d75074-c23b-4a1f-bfdb-129bcc5be662/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-30T22:51:51.838Z","expires_at":"2026-08-15T14:11:46.353112Z","created_at":"2026-07-01T14:10:44.230358Z","updated_at":"2026-07-16T14:11:46.552764Z","company_name":"Simile","company_slug":"simile","company_logo_url":"https://www.google.com/s2/favicons?domain=simile.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/4aa05e90-1b2a-4e5a-b2e0-4035bd5fc1fe"},{"id":"255321d4-5ae5-4873-9105-ae267dcc102c","company_id":"b6db41bc-ba14-4906-b2f7-a3ce9289a346","title":"User Researcher, AI Evaluations","slug":"user-researcher-ai-evaluations-f04d2a5f","description":"WHO WE ARE\n\nNotion is the collaborative AI workspace where teams and agents think together https://www.youtube.com/watch?v=vkpYpWfEK5s. We're building one place where your knowledge, projects, meetings, and AI tools live side by side, so work is faster, clearer, and less fragmented. Millions of individuals, small teams, and large companies run their work on Notion.\n\n\n\nNotinos (our employees) are customer zero in bringing this future of work to life. We care about craft, building things that last, and the belief that great work is still fundamentally human. Our goal isn’t to ship the next feature. Each and every team of Notinos is working to set the standard for how humans work together in the AI era. 
From building a business’s system of record to making and managing AI agents to automating away the busy work, we care deeply about giving our customers more time for their life’s work.\n\n\n\n\nABOUT THE ROLE:\n\nWe’re seeking an experienced UX Researcher to define and scale how we evaluate Notion’s AI-powered experiences—focusing on what “good” looks like not only for model output quality, but for the end-to-end product experience where people discover, set goals, delegate work, review results, and build trust over time with AI.\n\n\n\nThis role sits at the intersection of research craft and evaluation operations: you’ll run studies that uncover user mental models, expectations, and failure/recovery behaviors, then translate those insights into reusable rubrics, workflows, and measurement approaches that product, design, engineering, and data science can apply consistently.\n\n\n\nThis role can be based in either San Francisco or New York City. We work from our offices on Mondays, Tuesdays and Thursdays (our Anchor Days) because we do our best thinking and building together in person. We’re looking for someone who’s excited to work alongside the team during those days.\n\n\n\n\nWHAT YOU'LL ACHIEVE:\n\n - Define what “good” looks like (frameworks \u0026 rubrics): Establish clear, reusable evaluation criteria that reflect real user expectations—helpfulness, trust, tone, control, and transparency. You’ll translate qualitative insight into scoring guidance that can be applied consistently across teams and over time.\n\n - Run recurring evals (longitudinal \u0026 feature-specific): Run recurring longitudinal and feature-specific surveys and studies to measure experience quality over time against defined rubrics. Lead qualitative studies, side-by-side comparisons, and human-in-the-loop evaluation efforts to deepen understanding of where experiences break down and how they can improve. 
You’ll help teams spot regressions, benchmark improvements, and understand when expectations shift.\n\n - Anchor evaluation in real workflows (context \u003e isolated feedback): Ensure evals reflect jobs-to-be-done, user intent, and the full interaction journey (goal setting, delegation, review, iteration), not just decontextualized thumbs up/down. You’ll help teams understand who is evaluating, what they’re trying to do, and why outputs succeed or fail.\n\n - Identify failure modes \u0026 recovery behavior (guardrails): Uncover breakdowns, regressions, and edge cases across the system—from model behavior to UI and integrations—and study how people notice issues, correct them, and continue their work. You’ll turn these insights into actionable guidance for guardrails, fixes, and prioritization.\n\n - Operationalize evaluation with partners (process \u0026 tooling): Collaborate closely with Product, Design, Engineering, and Data Science to align on target use cases and build scalable evaluation loops (human-in-the-loop review, longitudinal studies, and calibration of automated/LLM-judge approaches against human judgment).\n\n\n\n\nSKILLS YOU'LL NEED TO BRING:\n\n - Ability to operationalize insight into measurement: You’re comfortable turning “soft” user expectations (trust, tone, usefulness, clarity) into concrete rubrics, scoring guidelines, and observable metrics.\n\n - AI fluency and systems thinking: You’re curious and hands-on with AI products, and can reason about how model behavior, uncertainty, and system constraints shape user experience. You also have experience evaluating AI-enabled products (LLMs, agents, generative UI/workflow automation) and working with Data Science/ML partners on measurement strategy and evaluation tooling.\n\n - Clear communication and impact orientation: You can align diverse partners around shared definitions of quality and create artifacts that enable teams to act consistently. 
You tailor storytelling to different audiences, connect research to business outcomes, and drive follow-through so insights translate into product change.\n\n - Strong UX research craft (quant + qual): You can choose the right methods for the question— interviews, benchmarking, surveys, experiments—and synthesize into actionable guidance. You also can prioritize ruthlessly, work through ambiguity, and balance scrappy iteration with deep d","salary_min":196000,"salary_max":230000,"location":"San Francisco, CA","workplace":"remote","remote_scope":"unknown","job_type":"full-time","experience_level":"senior","tags":["llm","agents","research","evaluation"],"apply_url":"https://jobs.ashbyhq.com/notion/0e9114bd-4603-4bdf-a86f-a7a4f390fae8/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-18T23:15:14.006Z","expires_at":"2026-08-15T14:03:39.52173Z","created_at":"2026-06-28T14:03:08.935234Z","updated_at":"2026-07-16T14:03:39.635762Z","company_name":"Notion","company_slug":"notion","company_logo_url":"https://www.google.com/s2/favicons?domain=notion.so\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/255321d4-5ae5-4873-9105-ae267dcc102c"},{"id":"edd81527-9709-4b01-8b95-c78ef07b4bc1","company_id":"6ce2d21e-b00f-4343-9bd0-5ac62ff81431","title":"Senior Machine Learning Engineer, Simulation Evaluation","slug":"senior-machine-learning-engineer-simulation-evaluation-2b2339ea","description":"Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The World's Most Experienced Driver™—to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo’s fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases. 
The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads and tens of billions in simulation across 15+ U.S. states.\n The Challenge \n Waymo’s simulator is one of the most complex virtual environments ever built. It blends deterministic logic, physical dynamics, and state-of-the-art Generative AI to create a training ground for the Waymo Driver. The Simulator Evaluation team faces the ultimate data challenge: How do you mathematically prove that a virtual world is \"real\"?\n We are seeking visionary machine learning engineers and researchers to architect the scalable deep learning systems, novel data workflows, and eval tools that power our research roadmap. In this role, you will pioneer the machine learning and generative vision paradigms required to define and measure the realism of our multimodal world models. Your work will define the state of the art for autonomous simulation, directly steering our research trajectory and the capabilities of the Waymo Driver.\n You will: \n \n Lead the design, development and deployment of cutting-edge evaluation approaches to assess the realism of state-of-the-art multimodal world models and generative systems for simulation use cases at Waymo.\n Architect and implement robust and scalable machine learning pipelines for tuning, evaluating, and deploying large-scale discriminator models for the purposes of simulator realism evaluation. \n Evaluate open-source and production-ready techniques for measuring the realism of generated video (e.g. 
temporal stability, multi-modal consistency, geometric discrepancy, condition following, etc.)\n Apply vision language models to evaluate semantic understanding and controllability across our world simulation products.\n Collaborate with research teams across Waymo and Alphabet to integrate advancements in 4D world modeling and generative AI into production systems.\n Mentor engineers on the team and provide technical guidance on architecture and execution.\n \n You have: \n \n Bachelor's, Master's, or PhD in computer science, machine learning, robotics, or a related field.\n Five or more years of experience in machine learning engineering or applied deep learning, supported by a portfolio of shipped products or peer-reviewed publications.\n Proficient programming skills in Python and hands-on experience with modern machine learning frameworks such as Jax, Flax, or PyTorch.\n Experience designing and implementing evaluation frameworks for complex systems or machine learning models.\n \n We prefer: \n \n Track record of training large-scale generative models (diffusion models, flow matching, vision language models, etc.)\n A PhD and demonstrated success delivering machine learning products focused on 3D generative models, world models, or video generation.\n Experience simulating sensor data, including camera, lidar, and radar, or modeling semantic scenes.\n Experience developing autonomous systems, robotics software, or autonomous vehicle simulations.\n Experience training and optimizing large-scale models on GPU or TPU clusters for efficient production serving.\n Professional experience writing C++ for high-performance production systems.\n The expected base salary range for this full-time position across US locations is listed below. Actual starting pay will be based on job-related factors, including exact work location, experience, relevant training and education, and skill level. 
Your recruiter can share more about the specific salary range for the role's location or, if the role can be performed remotely, the specific salary range for your preferred location, during the hiring process.  \n Waymo employees are also eligible to participate in Waymo’s discretionary annual bonus program, equity incentive plan, and generous Company benefits program, subject to eligibility requirements.  \n Salary Range\n $213,000 — $263,000 USD","salary_min":213000,"salary_max":263000,"location":"Mountain View, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["diffusion-models","generative-ai","robotics","pytorch","deep-learning","autonomous-vehicles","machine-learning","evaluation"],"apply_url":"https://careers.withwaymo.com/jobs?gh_jid=8001797","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-13T17:49:43Z","expires_at":"2026-08-15T14:05:09.163382Z","created_at":"2026-06-28T14:04:26.926315Z","updated_at":"2026-07-16T14:05:09.29757Z","company_name":"Waymo","company_slug":"waymo","company_logo_url":"https://www.google.com/s2/favicons?domain=waymo.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/edd81527-9709-4b01-8b95-c78ef07b4bc1"},{"id":"4b255973-955d-4d22-ba64-dc9d0c08e001","company_id":"053355fc-0162-4bb9-b414-cbf7679ee9c8","title":"Director, Research - Evaluation \u0026 Training","slug":"director-research-evaluation-training-067f35f5","description":"About Snorkel \n At Snorkel, we believe meaningful AI doesn’t start with the model; it starts with the data.\n We’re on a mission to help enterprises transform expert knowledge into specialized AI at scale. The AI landscape has gone through incredible changes since 2015, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI breakthroughs of today. 
But one thing has remained constant: the data you use to build AI is the key to achieving differentiation, high performance, and production-ready systems. We work with some of the world’s largest organizations to empower scientists, engineers, financial experts, product creators, journalists, and more to build custom AI with their data faster than ever before. Excited to help us redefine how AI is built? Apply to be the newest Snorkeler!\n ABOUT THE ROLE  \n We're looking for a manager to lead a team of researchers to focus on data evaluation, error analysis and data valuation methods to predict model performance. This team is responsible for showcasing the value and quality of Snorkel’s data for model training and evaluation, understanding where today's frontier models fall short, and turning that understanding into a point of view on what benchmarks and datasets these models will benefit from. \n You and your team will be responsible for Snorkel’s data design flywheel by analyzing model failures, finding capability and skill gaps in current models, suggesting the next benchmarks to invest in and then proving the value of this data for our customers. 
\n  \n MAIN RESPONSIBILITIES  \n \n Own a multi-quarter roadmap centered on novel evaluation, error analysis, and data valuation techniques\n Synthesize and share trends from model-failure analysis and benchmarking into recommendations on the datasets the community should focus on and the ones Snorkel should invest in — making this team a primary input to the company's data strategy.\n Focus on data valuation techniques that quantify how Snorkel data meaningfully improves model performance\n Lead and grow a team of researchers, setting a high bar for quality, rigor and speed of execution\n Act as the primary bridge between the team's findings and Product, GTM, and our customers\n \n  \n PREFERRED QUALIFICATIONS  \n \n 7+ years in applied AI, ML, or research roles, with 4+ years managing technical teams.\n A leader who has repeatedly turned research and analysis into business outcomes, and who instinctively connects technical findings to market and customer needs.\n Strong business and market judgment in the AI/ML space — you understand the competitive and frontier-lab landscape and can prioritize accordingly.\n Technically conversant and credible: enough depth in LLM evaluation, benchmarking, and model behavior analysis to set direction, judge experimental quality, and pressure-test results — without needing to be the deepest technical expert in the room.\n A nose for trends: able to look across many evaluation results and failure cases and extract the signal that should drive what gets built next.\n Excellent communication and storytelling skills, with the ability to make technical results legible and persuasive to non-research audiences.\n Familiarity with data valuation or data attribution research is a strong plus.\n Bonus: experience working with frontier labs, public benchmarks, or commercial AI data/eval products.\n Actual compensation will be determined based on factors including skills, qualifications, experience, and geographic location.\n Salary range(s) 
for this role\n $275,000 — $425,000 USD \n Be Your Best at Snorkel \n Joining Snorkel AI means becoming part of a company that has market-proven solutions, robust funding, and is scaling rapidly—offering a unique combination of stability and the excitement of high growth. As a member of our team, you’ll have meaningful opportunities to shape priorities and initiatives, influence key strategic decisions, and directly impact our ongoing success. Whether you’re looking to deepen your technical expertise, explore leadership opportunities, or learn new skills across multiple functions, you’re fully supported in building your career in an environment designed for growth, learning, and shared success.\n Snorkel AI is proud to be an Equal Employment Opportunity employer and is committed to building a team that represents a variety of backgrounds, perspectives, and skills. Snorkel AI embraces diversity and provides equal employment opportunities to all employees and applicants for employment. Snorkel AI prohibits discrimination and harassment of any type on the basis of race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local law. All employment is decided on the basis of qualifications, performance, merit, and business need. 
\n We will ensure that individuals with dis","salary_min":275000,"salary_max":425000,"location":"San Francisco, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["llm","generative-ai","evaluation","research"],"apply_url":"https://job-boards.greenhouse.io/snorkelai/jobs/6020877004","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-11T22:43:31Z","expires_at":"2026-08-15T14:03:52.355771Z","created_at":"2026-06-28T14:03:22.023885Z","updated_at":"2026-07-16T14:03:52.480004Z","company_name":"Snorkel AI","company_slug":"snorkel-ai","company_logo_url":"https://www.google.com/s2/favicons?domain=snorkel.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/4b255973-955d-4d22-ba64-dc9d0c08e001"},{"id":"d8e4dfdd-920a-4d58-9605-44f850da2a35","company_id":"e8c9f3a5-9310-43f5-9341-321fe6d93a92","title":"Director of Platform Management for Simulation, Evaluation \u0026 Validation","slug":"director-of-platform-management-for-simulation-evaluation-validation-74af71dd","description":"About us    \n Founded in 2017, Wayve is the leading developer of Embodied AI technology.  Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems.\n Our vision is to create autonomy that propels the world forward.  Our intelligent, mapless, and hardware-agnostic AI products are designed for automakers, accelerating the transition from assisted to automated driving.  In our fast-paced environment big problems ignite us—we embrace uncertainty, leaning into complex challenges to unlock groundbreaking solutions. We aim high and stay humble in our pursuit of excellence, constantly learning and evolving as we pave the way for a smarter, safer future.\n At Wayve, your contributions matter.  
We value diversity, embrace new perspectives, and foster an inclusive work environment; we back each other to deliver impact.  \n Make Wayve the experience that defines your career!  \n The Role  \n As Director of Platform Management for Simulation, Evaluation \u0026 Validation, you will lead the overall product vision and direction for the platforms that Wayve uses to develop, evaluate, and validate the AI Driver. This includes open- and closed-loop simulation platforms, the evaluation test platform, synthetic data production, critical event pipelines, validation, and the workflows that connect them.\n You will directly lead and indirectly influence sub-team platform leads across Simulation, Evaluation, Data, and Validation to convert a complex set of cross-cutting initiatives into a single, opinionated product strategy. The job is part developer-tools PM (your users are engineers and scientists who will tell you immediately when something is slow, wrong, or in the way), part platform PM (your roadmap has to compose across simulators, datasets, evaluation, and triage), and part safety-systems PM (the outputs gate releases that put cars on real roads).\n You will help shape and set the vision for how our tools and services should work in an AV2.0 context: a Wayve-pioneered approach to both the driving stack and offline development.\n Key Responsibilities \n \n Set product vision and strategy for the platforms. Define a multi-year, multi-quarter product strategy that takes Wayve from a collection of capable tools to a unified, opinionated platform for evaluating and validating the AI Driver. Anchor the strategy in measurable customer outcomes — developer velocity, signal quality, cost per evaluation, time-to-insight, validation credibility.\n Own the product roadmap end-to-end. Translate company-level priorities into a coherent roadmap across simulation, evaluation, and validation initiatives. 
Drive trade-offs between different internal and external customer groups. Ensure resourcing decisions are clearly communicated to other director-level stakeholders.\n Lead and grow the team. Manage and mentor the product managers embedded across. Hire to fill gaps, level up the craft, and build a highly capable lean team.\n Build world-class developer experiences. Run regular user research with Autonomy, Science, Validation, Release, and Product teams. Maintain a clear picture of what each customer segment needs from the platform, where the friction is, and which gaps are quietly bleeding velocity. \n Define and defend quality bars for an internal platform. Set platform-wide standards for reliability, latency, cost per unit (per simulation, per evaluation, per enriched hour), self-service, observability, and API stability. Treat platform regressions the way a consumer team treats a churn spike.\n Establish regular operational cadences. Lead monthly business reviews and close the loop on requests and progress reporting with the leadership team. Lead quarterly planning, KPI reviews, and the trade-off conversations.\n Represent the platform externally. Help Wayve's leadership tell a credible story about how we develop and validate the AI Driver — to OEM partners, regulators, and at relevant industry forums.\n \n About You  \n In order to set you up for success at Wayve, we’re looking for the following skills and experience.  \n Essential  \n \n 0+ years of product management experience, with at least 3–5 years leading PM teams (PMs and/or senior PMs reporting to you).\n Track record of building internal platforms, developer tools, or ML/AI evaluation infrastructure — products whose users are engineers and scientists, where the bar is set by power users with strong opinions and the ability to route around you if the tool isn't good enough.\n Data-driven. 
You always seek to answer questions with data and lean into defining the right customer facing KPIs indicative of success\n Systems thinking across hardware, AI, and product. You understand how decisions in one part of the stack (data, simulator fidelity, metric design) propagate into outcomes (release confidence, on-road performance, regulator trust) els","salary_min":332000,"salary_max":415000,"location":"London, UK","workplace":"hybrid","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["autonomous-vehicles","data-pipeline","robotics","generative-ai","evaluation"],"apply_url":"https://wayve.firststage.co/jobs?gh_jid=8585026002","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-11T19:43:11Z","expires_at":"2026-08-15T14:13:57.567438Z","created_at":"2026-06-28T14:12:39.528441Z","updated_at":"2026-07-16T14:13:57.687438Z","company_name":"Wayve","company_slug":"wayve","company_logo_url":"https://www.google.com/s2/favicons?domain=wayve.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/d8e4dfdd-920a-4d58-9605-44f850da2a35"},{"id":"d0c707f8-46ab-4546-a1fe-71cf6c2db09a","company_id":"6ce2d21e-b00f-4343-9bd0-5ac62ff81431","title":"Senior Software Engineer, ML/Eval Data Platforms \u0026 Infrastructure","slug":"senior-software-engineer-mleval-data-platforms-infrastructure-02234f34","description":"Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The World's Most Experienced Driver™—to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo’s fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases. 
The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads and tens of billions in simulation across 15+ U.S. states.\n The Planner Evaluation team works on one of the key challenges in autonomous driving: measuring and improving the quality of the software that drives the car. We are looking for experienced data-minded software engineers and data scientists to help us improve how we characterize and evaluate changes to the Onboard software stack (Planner, Perception, etc). If you are passionate about autonomous vehicles and how to use rich, complex data to drive decision making, this is the role for you!\n This role follows a hybrid work schedule and reports to an Engineering Manager. \n  \n You will:\n \n Develop and productionize data pipelines to generate high-quality ML and evaluation datasets, streamlining curation, sampling, and slicing for training and testing the Waymo Driver software.\n Design and implement robust tools and infrastructure for data mining, exploration, and analysis to extract insights from large-scale datasets and drive data-driven decisions.\n Architect, build, and maintain large-scale data platforms to process Waymo driving logs and simulation data, ensuring dataset generation is fresh, accurate, and complete.\n Own and execute complex projects, successfully translating ambiguous requirements into high-impact deliverables.\n Collaborate cross-functionally with Data Science and Quantitative Analytics teams to identify their data needs and engineer infrastructure solutions.\n \n You have:\n \n Education: Master’s degree or PhD in Computer Science, Engineering, or a related technical field.\n Experience: 3+ years of professional software engineering experience.\n Distributed Systems: Hands-on experience with systems that ingest, store, transform, and output data at scale.\n Programming Languages: Proficiency in C++ or Python within a production 
environment.\n Communication: Strong technical communication and collaboration skills.\n \n  \n We prefer:\n \n Experience with A/B experiment infrastructure.\n Exposure to ad-hoc data analysis utilizing SQL.\n Prior experience working within the autonomous vehicle (AV) industry.\n \n  \n #Hybrid\n The expected base salary range for this full-time position across US locations is listed below. Actual starting pay will be based on job-related factors, including exact work location, experience, relevant training and education, and skill level. Your recruiter can share more about the specific salary range for the role location or, if the role can be performed remote, the specific salary range for your preferred location, during the hiring process.  \n Waymo employees are also eligible to participate in Waymo’s discretionary annual bonus program, equity incentive plan, and generous Company benefits program, subject to eligibility requirements.  \n Salary Range\n $213,000 — $263,000 USD","salary_min":213000,"salary_max":263000,"location":"Mountain View, CA","workplace":"hybrid","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["data-pipeline","autonomous-vehicles","fine-tuning","distributed-systems","evaluation","infrastructure"],"apply_url":"https://careers.withwaymo.com/jobs?gh_jid=7991303","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-09T17:21:44Z","expires_at":"2026-08-15T14:05:10.204056Z","created_at":"2026-06-28T14:04:28.292816Z","updated_at":"2026-07-16T14:05:10.325102Z","company_name":"Waymo","company_slug":"waymo","company_logo_url":"https://www.google.com/s2/favicons?domain=waymo.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/d0c707f8-46ab-4546-a1fe-71cf6c2db09a"},{"id":"2087e015-ee4e-49c3-b24d-576611c371ec","company_id":"a0000000-0000-0000-0000-000000000001","title":"Staff+ Software Engineer, Safeguards Evals 
","slug":"software-engineer-safeguards-evals-c7ee0bf5","description":"About Anthropic \n Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.\n About the role\n How do we know our safety systems actually catch misuse? Anthropic increasingly uses AI to investigate potential misuse of Claude — analyzing real-world traffic to surface bad actors, policy violations, and emerging threats. Its findings inform enforcement actions and model launch decisions, which means we need rigorous, trustworthy answers to questions like: Does the monitoring agent catch what it should? Where does it fail? Does it stay reliable as adversaries adapt, as models improve, and as the agent itself changes?\n This role builds the evaluation infrastructure that answers those questions. You'll sit at the intersection of applied ML research and engineering — designing experiments to measure how well an investigative agent performs across harm areas, building datasets that represent real abuse rather than synthetic benchmarks, and shipping those methods into pipelines that gate every change to the system. 
Your work directly determines how much trust Anthropic can place in its automated abuse detection, and where we invest to make it better.\n Key responsibilities\n \n \n Build and own the evaluation harness for an agentic investigation system — defining metrics, test cases and grading approaches for a complex long horizon agent\n \n Construct high-quality eval datasets representing real-world misuse across harm areas (e.g., cyber attacks, bio weapons, influence operations), drawing from real traffic patterns and synthetic generation\n \n Measure agent performance end-to-end (detection precision/recall, investigation quality, robustness) and drive hill-climbing on the hardest harm areas\n \n Analyze coverage to identify measurement gaps, and evolve evals so they remain unsaturated and high-signal as agent capabilities advance\n \n Productionize successful research into regression and release pipelines that run on every agent change, prompt update, and underlying model upgrade\n \n Build tooling that enables policy experts to author, run, and iterate on evaluations without engineering support\n \n Construct RL environments to improve Claude’s safety investigation capabilities.\n \n Minimum qualifications\n \n \n Proficiency in Python and comfort working across the stack\n \n Experience building and maintaining data pipelines\n \n Experience working with LLMs and a working understanding of their capabilities and failure modes — especially agentic systems with tool use and multi-step reasoning\n \n Strong data analysis skills — you can draw reliable insights from large datasets\n \n Ability to move fluidly between research prototyping and production-quality code\n \n Ability to translate ambiguous problems into concrete, testable experiments\n \n Preferred qualifications\n \n \n 8+ years of industry software engineering experience\n \n Expertise in building or contributing to agent evaluation frameworks, benchmarks, or automated grading systems\n \n Extensive experience 
in trust and safety, content moderation, or abuse detection systems\n \n Experience in red teaming, adversarial testing, or jailbreak research on AI systems\n \n Experience with synthetic data generation or data augmentation\n \n Experience with distributed systems or large-scale data processing\n \n Experience with prompt engineering or building LLM-powered applications\n The annual compensation range for this role is listed below. \n For sales roles, the range provided is the role’s On Target Earnings (\"OTE\") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\n Annual Salary:\n $320,000 — $485,000 USD \n Logistics \n Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience\n Required field of study:  A field relevant to the role as demonstrated through coursework, training, or professional experience\n Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position\n Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.\n Visa sponsorship:  We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.\n We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed.  
Research shows that people who identify as being from underreprese","salary_min":320000,"salary_max":485000,"location":"San Francisco, CA","workplace":"hybrid","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["alignment","data-pipeline","llm","agents","distributed-systems","evaluation","rust"],"apply_url":"https://job-boards.greenhouse.io/anthropic/jobs/5251671008","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-09T15:02:53Z","expires_at":"2026-08-15T14:00:41.990749Z","created_at":"2026-06-28T14:00:32.865428Z","updated_at":"2026-07-16T14:00:42.162244Z","company_name":"Anthropic","company_slug":"anthropic","company_logo_url":"https://www.google.com/s2/favicons?domain=anthropic.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/2087e015-ee4e-49c3-b24d-576611c371ec"},{"id":"d3c94339-dd13-4c3f-ad05-a18b15c19e75","company_id":"f36ec848-cb19-4b95-a680-6733e58086c0","title":"Machine Learning Engineer II - Autonomous Driving Performance Evaluation","slug":"machine-learning-engineer-ii-autonomous-driving-performance-evaluation-8e6d5a1d","description":"May Mobility is transforming cities through autonomous technology to create a safer, greener, more accessible world. Based in Ann Arbor, Michigan, May develops and deploys autonomous vehicles (AVs) powered by our innovative Multi-Policy Decision Making (MPDM) technology that literally reimagines the way AVs think. Our vehicles do more than just drive themselves - they provide value to communities, bridge public transit gaps and move people where they need to go safely, easily and with a lot more fun. We’re building the world’s best autonomy system to reimagine transit by minimizing congestion, expanding access and encouraging better land use in order to foster more green, vibrant and livable spaces. Since our founding in 2017, we’ve given more than 500,000 autonomous rides to real people around the globe. And we’re just getting started. 
We’re hiring people who share our passion for building the future, today, solving real-world problems and seeing the impact of their work. Join us. \n Job Summary \n May Mobility is entering an exciting phase of growth as we expand our first-of-its-kind autonomous shuttle and mobility services across the nation. Launched in 2017 with a strong team of experienced roboticists and software engineers with decades of experience fielding robotic systems in the wild, May Mobility is looking to expand its team of robotics engineers with a background in robotics or autonomous vehicles.\n We are seeking ML-Oriented Software Engineers with experience in robotics applications. As part of our Autonomous Driving ML team, you will use ML Engineering concepts to measure, analyze and systematically improve the performance of May's Autonomous Driving stack through data, metrics, evaluation and test/hillclimbing suites.\n Essential Responsibilities \n \n Design, implement and own ML metrics and evaluation pipelines spanning offline model evaluation, simulation and on-road performance. \n Build and maintain test, regression and hillclimbing suites that gate model and stack releases, including automated triage of regressions to root cause. \n Drive model improvement through loss analysis, error mining, and data balancing/curation strategies for training and evaluation sets.\n \n Skills and Abilities \n Success in this role typically requires the following competencies: \n \n Designing quantitative metrics and statistical analyses that translate model behavior into actionable, decision-grade signals (significance, slicing, long-tail analysis).\n Building evaluation and analytics frameworks in production, including dataset slicing, result aggregation and dashboarding at scale. 
\n Applying data-centric ML methods such as hard-example mining, resampling/reweighting and curriculum or balance adjustments to lift model performance.\n \n Qualifications and Experience \n Candidates most successful in this role typically hold the following qualifications or comparable knowledge or experience: \n Required \n \n Bachelor's or Master's degree in Robotics, Computer Science, Statistics, or a related field with strong mathematical and engineering foundations. \n A minimum of 2 years building evaluation, metrics, or data analysis systems for ML in production. \n Proficiency in Python (NumPy/Pandas or equivalent dataframe tooling) with experience in Linux environments. \n Familiarity with basic concepts in Machine Learning (losses, train/eval splits, common failure modes) and basic Perception and Planning concepts in Autonomous Driving.\n \n Desirable \n \n Proficiency in Go or C++. \n Familiarity with experiment tracking and evaluation tooling such as MLflow, Weights \u0026 Biases, or in-house equivalents. \n Familiarity with statistical methods for A/B comparison, regression detection and noisy-metric analysis. \n Familiarity with data mining and curation at scale (embedding-based retrieval, active learning, auto-labeling). \n Familiarity with visualization and dashboarding tools (Plotly, Grafana, Streamlit or similar).\n \n Physical Requirements \n \n Standard office working conditions which includes but is not limited to:\n \n Prolonged sitting\n Prolonged standing\n Prolonged computer use\n \n \n Travel required? -  Low 5-10%\n \n \n \n \n \n \n \n Benefits and Perks \n \n Comprehensive healthcare suite including medical, dental, vision, life, and disability plans. Domestic partners who have been residing together at least one year are also eligible to participate. 
\n Health Savings and Flexible Spending Healthcare and Dependent Care Accounts available.\n Rich retirement benefits, including an immediately vested employer safe harbor match.\n Generous paid parental leave as well as a phased return to work. \n Flexible vacation policy in addition to paid company holidays.\n Total Wellness Program providing numerous resources for overall wellbeing   \n \n Don’t meet every single requirement? Studies have shown that women and/or people of color are less likely to apply to a job unless they meet every qualification. At May Mobility, we’re committe","salary_min":172000,"salary_max":210000,"location":"Anywhere, USA","workplace":"remote","remote_scope":"restricted","job_type":"full-time","experience_level":"junior","tags":["healthcare","autonomous-vehicles","robotics","evaluation","machine-learning"],"apply_url":"https://job-boards.greenhouse.io/maymobility/jobs/8187068002","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-08T20:01:04Z","expires_at":"2026-08-15T14:18:34.682378Z","created_at":"2026-06-28T14:16:46.812396Z","updated_at":"2026-07-16T14:18:34.804213Z","company_name":"May Mobility","company_slug":"may-mobility","company_logo_url":"https://www.google.com/s2/favicons?domain=maymobility.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/d3c94339-dd13-4c3f-ad05-a18b15c19e75"},{"id":"0679cae1-6674-4571-a8fb-8e14efcb71c5","company_id":"e8c9f3a5-9310-43f5-9341-321fe6d93a92","title":"Systems Engineer, AI Validation ","slug":"systems-engineer-ai-validation-5229713f","description":"About us    \n Founded in 2017, Wayve is the leading developer of Embodied AI technology.  Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems.\n Our vision is to create autonomy that propels the world forward.  
Our intelligent, mapless, and hardware-agnostic AI products are designed for automakers, accelerating the transition from assisted to automated driving.  In our fast-paced environment big problems ignite us—we embrace uncertainty, leaning into complex challenges to unlock groundbreaking solutions. We aim high and stay humble in our pursuit of excellence, constantly learning and evolving as we pave the way for a smarter, safer future.\n At Wayve, your contributions matter.  We value diversity, embrace new perspectives, and foster an inclusive work environment; we back each other to deliver impact.  \n Make Wayve the experience that defines your career!  \n The role  \n Wayve’s Engineering Validation team builds confidence in the autonomy systems we deliver to product and customer teams. As a Systems Engineer, you will own a core part of the behavior validation for the Wayve AI Driver, from strategy to implementation and execution. Your contributions will enable the successful integration of the AI Driver across a range of autonomy products. This critical work will help the Wayve AI Driver reach millions of customers globally.\n Key Responsibilities \n \n Develop and implement a comprehensive validation strategy across test modalities for multiple autonomy products.\n Define and implement general purpose metrics for validating on-road and simulated driving behavior.\n Define test coverage requirements and build comprehensive test suites.\n Develop acceptance criteria through collaboration with Product, Safety, and AV Engineering teams.\n Analyze test results and report findings to Release and Autonomy Engineering stakeholders.\n Identify data, simulation, and evaluation team dependencies to enable timely, scalable, and automated validation execution.\n \n About You \n In order to set you up for success as a Systems Engineer (Senior / Staff), AI Validation at Wayve, we’re looking for the following skills and experience. 
\n Essential \n \n 5+ years of experience working on automated driving, robotics, or related products.\n BSc, MSc, or PhD in Computer Science, Robotics, Aerospace, or a related field.\n Proficiency in Python to implement metrics and work with the evaluation codebase.\n Deep understanding of driving behavior and how to measure driving performance.\n Experience with both simulated and physical testing environments for autonomous systems.\n Experience analyzing and reporting validation results.\n Demonstrated ownership driving validation concepts from ideation to implementation in ambiguous, fast-paced environments.\n \n Desirable \n \n Experience with modern AI tools and agentic workflows \n Experience analyzing large datasets with SQL\n \n  \n This is a full-time role based in our office in Sunnyvale and the reasonably estimated salary for this role ranges from $209,700 to $266,800 plus a competitive equity package. Actual compensation is based on the candidate's skills, qualifications, and experience.\n At Wayve we want the best of all worlds so we operate a hybrid working policy that combines time together in our offices and workshops to fuel innovation, culture, relationships and learning, and time spent working from home. We operate core working hours so you can determine the schedule that works best for you and your team.  \n Wayve is committed to creating an inclusive interview experience. If you require any accommodations or adjustments to participate fully in our interview process, please let us know. \n We understand that everyone has a unique set of skills and experiences and that not everyone will meet all of the requirements listed above. If you’re passionate about self-driving cars and think you have what it takes to make a positive impact on the world, we encourage you to apply. 
At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, veteran status, pregnancy or related condition  (including breastfeeding) or any other basis as protected by applicable law.  \n For more information visit Careers at Wayve.  \n To learn more about what drives us, visit Values at Wayve  \n For US candidates only, please visit E-Verify Notice and Participation and Right to Work \n \n DISCLAIMER: We will not ask about marriage or pregnancy, care responsibilities or disabilities in any of our job adverts or interviews. However, we do look to capture information about care responsibilities, and ","salary_min":209700,"salary_max":266800,"location":"Sunnyvale, CA","workplace":"hybrid","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["agents","generative-ai","autonomous-vehicles","robotics","evaluation"],"apply_url":"https://wayve.firststage.co/jobs?gh_jid=8542296002","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-03T16:43:13Z","expires_at":"2026-08-15T14:14:02.941898Z","created_at":"2026-06-28T14:12:44.691875Z","updated_at":"2026-07-16T14:14:03.069771Z","company_name":"Wayve","company_slug":"wayve","company_logo_url":"https://www.google.com/s2/favicons?domain=wayve.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/0679cae1-6674-4571-a8fb-8e14efcb71c5"},{"id":"4ab2e3ca-532b-47f4-9957-6020b204eaaf","company_id":"d49c7f16-1314-459a-acab-7b3d38ee01a9","title":"Member of Technical Staff, Evals","slug":"member-of-technical-staff-evals-74e01b55","description":"Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. 
We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal.\n\n\n\n\nABOUT THE ROLE\n\nEvals builds the internal platform that teams across Magic use to evaluate the performance of internal and external models. The team supports pre-training, post-training, data, inference, and product, and sits on the critical path of many of the company's most important decisions.\n\nAs a Member of Technical Staff on Evals, you will build both the platform and the evaluations themselves. You'll develop infrastructure for large-scale evaluations, data ablations, and dataset quality analysis, while designing and validating the methodologies used to measure model performance.\n\nSweating the details matters on this team. Many benchmarks, papers, and open-source evaluation frameworks contain subtle bugs or flawed assumptions that lead to misleading conclusions. We care deeply about correctness, reproducibility, and measurement quality.\n\nEvals are essential to the success of the company. 
By building trustworthy evaluation systems, you will help Magic make better research decisions, build better datasets, and ship better products.\n\n\n\n\nWHAT YOU'LL WORK ON\n\n - Build and maintain the internal evals platform used across Magic\n\n - Design, implement, and validate eval tasks for pre-training, post-training, reinforcement learning, inference, and product systems\n\n - Develop infrastructure for running large-scale evaluations\n\n - Build systems to measure dataset quality and identify opportunities to improve training data\n\n - Improve evaluation correctness, reproducibility, and reliability\n\n - Audit and improve upon public benchmarks, evaluation methodologies, and open-source implementations\n\n - Partner with research, data, inference, and product teams to define metrics that accurately reflect model quality\n\n - Build tooling and frameworks that enable teams across Magic to make decisions based on trustworthy measurements\n\n\nWHAT WE'RE LOOKING FOR\n\n - Experience building production systems, internal platforms, or developer infrastructure\n\n - Experience working with machine learning systems, evaluation frameworks, data infrastructure, or research tooling\n\n - Track record of owning technical projects end-to-end\n\n - Skepticism toward results that cannot be reproduced, validated, or explained\n\n - Ability to reason critically about benchmarks, metrics, and experimental methodology\n\n - Experience designing, implementing, or operating systems that run at scale\n\n - Comfortable navigating ambiguity and determining whether a measurement is actually capturing the behavior it claims to measure\n\n - Excitement about helping researchers and engineers make better decisions through trustworthy measurements\n\n\nCOMPENSATION, BENEFITS, AND PERKS (US)\n\n - Annual salary range between $200K - $550K depending on experience\n\n - Equity is a significant part of total compensation, in addition to salary\n\n - 401(k) plan with 6% salary 
matching\n\n - Generous health, dental, and vision insurance for you and your dependents\n\n - Unlimited paid time off\n\n - Visa sponsorship and relocation support for candidates moving to San Francisco\n\n - A small, fast-moving, highly collaborative team working on frontier AI systems\n\nMagic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.\n\n\n\n\nOUR CULTURE\n\n - Integrity. Words and actions should be aligned\n\n - Hands-on. At Magic, everyone is building\n\n - Teamwork. We move as one team, not N individuals\n\n - Focus. Safely deploy AGI. Everything else is noise\n\n - Quality. Magic should feel like magic","salary_min":200000,"salary_max":550000,"location":"San Francisco, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["code-generation","reinforcement-learning","pre-training","evaluation"],"apply_url":"https://jobs.ashbyhq.com/magic.dev/49e62c0f-ee70-4c6d-95dc-1ac4132ca5cf/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-06-01T21:37:20.708Z","expires_at":"2026-08-15T14:05:46.0157Z","created_at":"2026-06-28T14:05:01.254114Z","updated_at":"2026-07-16T14:05:46.139397Z","company_name":"Magic","company_slug":"magic","company_logo_url":"https://www.google.com/s2/favicons?domain=magic.dev\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/4ab2e3ca-532b-47f4-9957-6020b204eaaf"},{"id":"88b5244c-2383-4f06-b5ef-0ade11296098","company_id":"6ce2d21e-b00f-4343-9bd0-5ac62ff81431","title":"Technical Lead Manager, Prediction, ML Evaluation","slug":"staff-technical-lead-manager-prediction-planning-ml-eval-29b43259","description":"Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. 
Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The World's Most Experienced Driver™—to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo’s fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases. The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads and tens of billions in simulation across 15+ U.S. states.\n The Predictive Planning team (PrePlan) develops and deploys state-of-the-art machine learning solutions that predict the future state of the world and plan the Waymo Driver’s behavior. Our mission is to transform Waymo's unprecedented scale of driving data into robust, generalizable, and performant deep neural networks. These models enable the autonomous vehicle to navigate complex environments safely and efficiently. \n We have an exciting opportunity for a Staff Technical Lead Manager to lead our ML Evaluation team. 
In this role, you will define the strategic vision for our evaluation platforms, scaling the critical infrastructure and metrics required, and partner closely with the modeling teams to rigorously validate our next-generation deep neural networks and accelerate ML developer velocity across PrePlan.\n You will: \n \n Influence the strategic direction of foundational infrastructure and evaluation platforms to robustly support next-generation ML model evaluation use cases\n Collaborate cross-functionally with ML engineers, data scientists, and infrastructure teams to identify, define, and surface critical signals on model, component, and system-level performance\n Leverage and scale evaluation and infrastructure platforms to significantly enhance the ML developer experience, enabling faster iteration through earlier, more reliable, and trusted model evaluation\n Manage and mentor a focused team of engineers, aligning their career growth and aspirations with critical organizational needs\n Drive best practices and leverage deep technical awareness of the Alphabet ML stack (e.g., TensorFlow, JAX, Flax, Apache Beam) to optimize evaluation workflows\n Stay at the forefront of emerging technologies, industry trends, and research in ML evaluation methodologies and advanced metrics design\n \n You have:  \n \n M.S. 
in Computer Science, Mathematics, or equivalent industry experience in Robotics or large-scale ML systems with critical evaluation needs\n 5+ years of experience building and maintaining large-scale distributed infrastructure, ML inference systems, or evaluation platforms, including 3+ years of engineering management experience\n Strong coding and testing proficiency, specifically in Python and C++\n Strong foundational knowledge of model evaluation and core data science principles (e.g., confidence intervals, outlier identification, curve fitting, and causality analysis)\n Familiarity with large-scale ML deployment and orchestration tools (e.g., TF Serving, TorchServe, Kubeflow, SageMaker Pipelines, or Vertex AI Pipelines)\n Understanding of machine learning fundamentals and experience with popular ML frameworks such as JAX, PyTorch, or TensorFlow\n \n We prefer: \n \n Experience developing and maintaining evaluation pipelines for ML models\n Experience deploying and supporting machine learning models for computer vision, natural language processing, robotics/motion planning, or recommendation systems\n Experience supporting a small team of MLEs developing high-capacity, production-grade models and components\n Strong understanding of metrics computation and regression detection at scale\n The expected base salary range for this full-time position across US locations is listed below. Actual starting pay will be based on job-related factors, including exact work location, experience, relevant training and education, and skill level. Your recruiter can share more about the specific salary range for the role location or, if the role can be performed remote, the specific salary range for your preferred location, during the hiring process.  \n Waymo employees are also eligible to participate in Waymo’s discretionary annual bonus program, equity incentive plan, and generous Company benefits program, subject to eligibility requirements.  
\n Salary Range\n $251,000 — $310,000 USD","salary_min":251000,"salary_max":310000,"location":"Mountain View, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["nlp","deep-learning","computer-vision","autonomous-vehicles","tensorflow","pytorch","robotics","evaluation"],"apply_url":"https://careers.withwaymo.com/jobs?gh_jid=7963516","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-05-28T17:26:50Z","expires_at":"2026-08-15T14:05:15.803612Z","created_at":"2026-05-29T14:12:24.077985Z","updated_at":"2026-07-16T14:05:15.922379Z","company_name":"Waymo","company_slug":"waymo","company_logo_url":"https://www.google.com/s2/favicons?domain=waymo.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/88b5244c-2383-4f06-b5ef-0ade11296098"},{"id":"15a3860a-1bf8-414e-9cc0-134f1fd3836b","company_id":"10c1ac82-83d5-423a-b438-4cc7b13d597c","title":"Principal Software Engineer, AI Observability \u0026 Evals Platform","slug":"principle-software-engineer-ai-observability-evals-platform-04a35007","description":"ABOUT US\n\n\n\nAt LangChain, our mission is to make intelligent agents ubiquitous. We build the foundation for agent engineering in the real world, helping developers move from prototypes to production-ready AI agents that teams can rely on. We began as widely adopted open-source tools and have grown to also offer a platform for building, evaluating, deploying, and operating agents at scale.\n\nWith $125M raised at Series B from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we’re at a stage where we’re continuing to develop new products, growth is accelerating, and all team members have meaningful impact on what we build and how we work together. 
LangChain is a place where your contributions can shape how this technology shows up in the real world.\n\nToday, our platform includes LangSmith (Observability, Evaluation, Deployment, Fleet, and Sandboxes), our open source frameworks (LangChain, LangGraph, and Deep Agents), and the newly launched LangSmith Engine for autonomous agent improvement. We have 100M+ monthly open source downloads, 6,000+ active LangSmith customers, and 5 of the Fortune 10 use LangSmith in production (+ 35% of the Fortune 500 overall), including teams at Klarna, Clay, Coinbase, Workday, Lyft, Cloudflare, Harvey, Rippling, Vanta, LinkedIn, Monday.com, Nvidia, and Bridgewater.\n\n\n\n\n\n\nABOUT THE TEAM\n\nThe LangSmith team owns and builds LangChain's core platform for observability, evaluation, and production reliability of AI systems. From tracing and annotation to run rules, evaluations, and beyond, they own this end-to-end. If you want to help define what great AI observability looks like at production scale, this is where that work gets done.\n\n\n\n\nABOUT THE ROLE\n\nWe're looking for a Principal/Lead level Software Engineer to join the LangSmith team and help drive the technical direction of the platform. You'll build across the full stack from backend services and APIs to frontend product surfaces, and you'll play a central role in shaping how we build: setting engineering standards, mentoring engineers across the team, and making architectural decisions that hold up as we scale. 
If you're energized by both hands-on engineering and the multiplier effect of leveling up those around you, this role is built for that.\n\nLocation: This role can be based in our Boston, San Francisco, or NYC office.\n\n\n\n\n\n\nWHAT YOU'LL DO\n\n\n\n\nDRIVE TECHNICAL DIRECTION\n\n - Lead architectural decisions across our Go, Python, and TypeScript stack, ensuring systems are performant, maintainable, and built to scale\n\n - Work across the full stack, owning features end-to-end from backend services and APIs through to frontend product experiences\n\n - Drive tracing, monitoring, and evaluation workflows at scale, with a focus on reliability and query performance across high-volume data\n\n - Help shape the product roadmap by partnering closely with product and design — not just executing on it\n\n\nRAISE THE BAR FOR THE TEAM\n\n - Set engineering standards for the team: define patterns, lead code reviews, and establish the foundations others build on\n\n - Mentor and grow engineers at all levels through code review, design feedback, pairing, and ongoing technical guidance\n\n - Drive projects from ambiguity to delivery while maintaining high engineering standards and aggressive timelines\n\n\nOWN RELIABILITY AND QUALITY\n\n - Troubleshoot and resolve production issues with a root-cause mindset, and implement durable fixes\n\n - Ensure system reliability through strong testing, monitoring, and alerting practices\n\n - Create and maintain technical documentation, including system design docs and API references\n   \n   \n\n\nWHAT YOU'LL BRING\n\n\n\n\n - 10+ years of professional experience in backend or fullstack engineering on highly complex, production systems\n\n - Strong programming skills across multiple parts of the stack: backend (Python and/or Go) and frontend (TypeScript, React, or similar)\n\n - Demonstrated experience making and owning architectural decisions, including tradeoffs around data systems, APIs, and service reliability\n\n - Experience 
with high-throughput or mission-critical systems, and a proven ability to optimize for performance and reliability\n\n - Depth in operationalizing technical work — you've taken systems from prototype to production and kept them running well at scale\n\n - Demonstrated track record of mentoring engineers and raising the technical quality of a team, not just the codebase\n\n - Strong communication skills and comfort operating cross-functionally with product, design, and engineering leadership\n\n - Customer centricity and an ownership mentality — you care how the product lands, not just how the code reads\n\n - You exemplify our operating principles https://www.langchain.com/careers\n\n\n\n\nNICE TO HAVE\n\n - Experience with database systems (Postgres, Redis, ClickHouse) and cloud platforms (AWS, GCP, or Azure)\n\n - Familiarity with observability tooling, evaluation frameworks, or AI/LLM infrastructure\n   \n   \n\nSalary R","salary_min":230000,"salary_max":270000,"location":"Boston, MA","workplace":"hybrid","remote_scope":"not_remote","job_type":"full-time","experience_level":"principal","tags":["agents","llm","platform","evaluation"],"apply_url":"https://jobs.ashbyhq.com/langchain/d3f8de08-2e2b-4c3f-be1f-e63ca51f1d93/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-05-13T22:53:01.826Z","expires_at":"2026-08-15T14:02:16.267573Z","created_at":"2026-05-14T14:02:16.638316Z","updated_at":"2026-07-16T14:02:16.396857Z","company_name":"LangChain","company_slug":"langchain","company_logo_url":"https://www.google.com/s2/favicons?domain=langchain.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/15a3860a-1bf8-414e-9cc0-134f1fd3836b"},{"id":"06156a81-9587-4b11-9e8a-8eb784531cf2","company_id":"66e863fb-9aaf-40df-996c-eb439e6f857e","title":"Machine Learning Engineer, LLM Evals \u0026 Observability","slug":"machine-learning-engineer-llm-evals-observability-2ace05f2","description":"About Glean: \n  \n Glean is the Work 
AI platform that helps everyone work smarter with AI. What began as the industry’s most advanced enterprise search has evolved into a full-scale Work AI ecosystem, powering intelligent Search, an AI Assistant, and scalable AI agents on one secure, open platform. With over 100 enterprise SaaS connectors, flexible LLM choice, and robust APIs, Glean gives organizations the infrastructure to govern, scale, and customize AI across their entire business - without vendor lock-in or costly implementation cycles. \n  \n At its core, Glean is redefining how enterprises find, use, and act on knowledge. Its Enterprise Graph and Personal Knowledge Graph map the relationships between people, content, and activity, delivering deeply personalized, context-aware responses for every employee. This foundation powers Glean’s agentic capabilities - AI agents that automate real work across teams by accessing the industry’s broadest range of data: enterprise and world, structured and unstructured, historical and real-time. The result: measurable business impact through faster onboarding, hours of productivity gained each week, and smarter, safer decisions at every level. \n  \n Recognized by Fast Company as one of the World’s Most Innovative Companies (Top 10, 2025), by CNBC’s Disruptor 50, Bloomberg’s AI Startups to Watch (2026), Forbes AI 50, and Gartner’s Tech Innovators in Agentic AI, Glean continues to accelerate its global impact. With customers across 50+ industries and 1,000+ employees in more than 25 countries, we’re helping the world’s largest organizations make every employee AI-fluent, and turning the superintelligent enterprise from concept into reality. \n  \n If you’re excited to shape how the world works, you’ll help build systems used daily across Microsoft Teams, Zoom, ServiceNow, Zendesk, GitHub, and many more - deeply embedded where people get things done. 
You’ll ship agentic capabilities on an open, extensible stack, with the craft and care required for enterprise trust, as we bring Work AI to every employee, in every company. \n  \n About the Role: \n Building a great AI assistant is only half the battle – knowing whether it's actually great is the other half. Our team owns the measurement and quality layer that makes Glean's Assistant and Agents reliably better over time: evaluation pipelines, quality eval-sets, LLM-powered judges, agent observability, and the tooling engineers use to understand what changed and why. It's a rare combination of infrastructure engineering, applied ML, and direct product impact. If you care deeply about quality and want to build the systems that make it measurable, this role is for you. \n You will:  \n \n Design and curate evaluation datasets – sampling strategies, query diversity, and golden sets that give reliable, representative coverage of real assistant behavior. \n Build and maintain large-scale evaluation pipelines that measure assistant quality across thousands of real user queries. \n Build LLM-powered judges that score metrics like correctness, completeness, and response quality, and align them against human judgment. \n Evaluate new models and product changes before they ship – providing the quality signal that gates launches and prevents regressions. \n Build observability infrastructure for AI agents: trace enrichment, data pipelines, and dashboards that make assistant behavior inspectable. \n Close the loop between quality measurement and improvement using eval results, customer feedback, and techniques like automated prompt iteration to help drive concrete gains in assistant behavior. \n Collaborate with engineers across the company to make evals a first-class part of how we ship. \n \n About you: \n \n 2+ years of software engineering experience with strong coding skills. \n Strong backend fundamentals in Go and Python; comfortable with distributed data pipelines. 
\n Experience working with LLM evaluation, reinforcement learning from human feedback, natural language processing, or other large systems involving machine learning. \n Analytically rigorous – you think carefully about what offline metrics actually predict about real user experience. \n Thrive in a customer-focused, tight-knit and cross-functional environment - being a team player and willing to take on whatever is most impactful for the company \n You care about quality – not just in the systems you build, but in the product you're helping measure and improve. \n \n Location:   \n \n This role is hybrid (3-4 days a week in one of our SF Bay Area offices) \n \n Compensation \u0026 Benefits: \n The standard base salary range for this position is $200,000 - $300,000 annually. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for variable compensation, equity, and benefits. \n We offer a comprehensive benefits package including competitive compensation, Medical, Vis","salary_min":200000,"salary_max":300000,"location":"Mountain View, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"junior","tags":["llm","agents","nlp","reinforcement-learning","cloud","data-pipeline","machine-learning","evaluation"],"apply_url":"https://job-boards.greenhouse.io/gleanwork/jobs/4694716005","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-05-12T19:37:57Z","expires_at":"2026-08-15T14:04:01.720999Z","created_at":"2026-05-14T14:03:50.554084Z","updated_at":"2026-07-16T14:04:01.838771Z","company_name":"Glean","company_slug":"glean","company_logo_url":"https://www.google.com/s2/favicons?domain=glean.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/06156a81-9587-4b11-9e8a-8eb784531cf2"},{"id":"05fe22b7-bf97-45f6-a90b-6be38ed428c6","company_id":"e8c9f3a5-9310-43f5-9341-321fe6d93a92","title":"Triage 
Specialist ","slug":"triage-specialist-308bd6bd","description":"About us    \n Founded in 2017, Wayve is the leading developer of Embodied AI technology.  Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems.\n Our vision is to create autonomy that propels the world forward.  Our intelligent, mapless, and hardware-agnostic AI products are designed for automakers, accelerating the transition from assisted to automated driving.  In our fast-paced environment big problems ignite us—we embrace uncertainty, leaning into complex challenges to unlock groundbreaking solutions. We aim high and stay humble in our pursuit of excellence, constantly learning and evolving as we pave the way for a smarter, safer future.\n At Wayve, your contributions matter.  We value diversity, embrace new perspectives, and foster an inclusive work environment; we back each other to deliver impact.  \n Make Wayve the experience that defines your career!  \n  \n The role  \n As a Triage Specialist at Wayve, you’ll play a critical role in maintaining the safety and reliability of our ADAS+ vehicle systems. You’ll be responsible for Root Cause Analysis and producing Top Issue reports across our test environments and operational domains. Bridging the gap between Detective Engineers, Triage Engineering and Component teams, you’ll help accelerate triage outcomes and improve the efficiency and effectiveness of our development pipeline.\n This is a highly collaborative and detail-oriented role with a direct impact on the performance and safety of Wayve’s models.\n Wayve.ai’s office is located in Sunnyvale, California, at the centre of Silicon Valley. 
The office offers convenient access to public transit and is surrounded by a vibrant community of tech innovators, with a wide selection of nearby dining and recreational options.\n Key Responsibilities\n \n Perform first-level analysis on real-world run data\n Partner closely with Detective Engineers to learn and discover the root cause of an issue.\n Partner closely with Triage Engineers to develop automated solutions.\n Support the full issue lifecycle: identification, investigation, resolution, documentation, and reporting.\n Accurately add metadata to the on-road interventions (e.g., Root Cause Labels, Jira Linking, Comments)\n Effectively use Wayve’s proprietary tooling.\n Investigate complex or ambiguous runs that require deeper analysis or resolution.\n Participate in regular standups to address emerging concerns, assign priorities, and adapt to process changes.\n Deliver actionable feedback on triage tooling and suggest improvements to enhance usability and speed.\n Contribute to process improvement and maintain a high standard of safety, efficiency, and rigor.\n Accurately validate Triage automation bots\n \n About You   \n In order to set you up for success as a Triage Specialist at Wayve, we’re looking for the following skills and experience.  
\n Essential \n \n Previous experience in ADAS, Autonomous Vehicles or Testing\n Passion for Quality and Safety-first mindset.\n Ability to learn and use a variety of internal tools and software platforms.\n Comfort with technical terminology related to ADAS systems and software stacks.\n Strong communication skills across cross-functional and time-zone-distributed teams.\n Analytical thinking, logical reasoning, and bias-free problem-solving.\n Detail-oriented approach with a focus on accuracy, even over extended sessions.\n Strong debugging, documentation, and investigation skills.\n Ability to work independently and as part of a collaborative team.\n \n Desirable \n \n Experience with issue tracking and configuration management (e.g., Jira, Confluence, Bitbucket).\n Familiarity with software development concepts: source control, requirements analysis, build pipelines.\n Exposure to software release workflows and release testing practices.\n Basic experience with scripting (e.g., Bash, Python).\n SQL/data analysis proficiency for deeper log review and trend analysis.\n Experience with AV testing or robotics systems.\n \n This role is a full-time role based in Sunnyvale, CA (hybrid) and the reasonably estimated salary for this role ranges from $115,600 to $137,300, plus a competitive equity package. Actual compensation is based on the candidate's skills, qualifications, and experience.\n Wayve is committed to creating an inclusive interview experience. If you require any accommodations or adjustments to participate fully in our interview process, please let us know. \n We understand that everyone has a unique set of skills and experiences and that not everyone will meet all of the requirements listed above. If you’re passionate about self-driving cars and think you have what it takes to make a positive impact on the world, we encourage you to apply. 
At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique ski","salary_min":115600,"salary_max":137300,"location":"Sunnyvale, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"mid","tags":["generative-ai","autonomous-vehicles","robotics","evaluation"],"apply_url":"https://wayve.firststage.co/jobs?gh_jid=8541943002","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-05-09T00:15:13Z","expires_at":"2026-08-15T14:14:03.954344Z","created_at":"2026-05-10T14:14:31.532422Z","updated_at":"2026-07-16T14:14:04.075789Z","company_name":"Wayve","company_slug":"wayve","company_logo_url":"https://www.google.com/s2/favicons?domain=wayve.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/05fe22b7-bf97-45f6-a90b-6be38ed428c6"},{"id":"f0e2d14a-2403-4412-adf3-0ffa6627de3f","company_id":"a0000000-0000-0000-0000-000000000001","title":"Research Engineer, Model Evaluations","slug":"research-engineer-model-evaluations-aa85e078","description":"About Anthropic \n Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.\n About the role\n We're looking for Research Engineers to build the evaluations that tell us — and the world — what Claude can actually do. Your work will turn ambiguous notions of \"intelligence\" into clear, defensible metrics that researchers, leadership, and the public can rely on.\n You'll design and implement evaluations across the full spectrum of Claude's capabilities and personality, and build the infrastructure that runs them reliably at scale. 
You'll partner closely with researchers throughout the lifecycle of a new capability — from defining what to measure, to running the eval against live training checkpoints, to interpreting the results. The goal is to make Anthropic the leader in extremely well-characterized AI systems, with performance that is exhaustively measured and validated across the tasks that matter.\n Key responsibilities\n \n Design and run new evaluations of Claude's capabilities — reasoning, agentic behavior, knowledge, safety properties — and produce visualizations that make the results legible to researchers and decision-makers\n Build and harden the distributed eval execution platform so hundreds of evals run reliably against checkpoints throughout production RL training runs\n Own the dashboards researchers and leadership use to monitor model health during training, improving signal-to-noise, reducing latency, and making regressions impossible to miss\n Debug anomalous eval results mid-training-run, determine whether the cause is a model change or an infrastructure issue, and communicate the answer clearly under time pressure\n Improve the tooling, libraries, and workflows researchers use to implement and iterate on evaluations\n Partner with research teams across the full lifecycle of a new capability — from defining what to measure to interpreting results as training progresses\n Run experiments to characterize how prompting, sampling, and scaffolding choices affect results on internal and industry benchmarks\n Communicate evaluations and their results to internal stakeholders and, where appropriate, external audiences\n \n Minimum qualifications\n \n Strong Python programming skills, including production or research infrastructure\n Experience building or operating distributed systems, data pipelines, or other infrastructure that needs to be reliable at scale\n Clear written and verbal communication, especially when explaining technical results to non-specialists\n Comfort 
operating in an on-call or production-support capacity when training runs are live\n Care about the societal impacts of your work and an interest in steering powerful AI to be safe and beneficial\n \n Preferred qualifications\n \n Hands-on experience using large language models such as Claude, including prompting, sampling, and scaffolding\n Background in data visualization and a track record of building dashboards people actually trust and use\n Experience developing robust evaluation metrics for language models\n Experience with observability, monitoring, or experiment-tracking systems\n Background in statistics and experimental design\n Experience with large-scale dataset sourcing, curation, and processing\n Experience running or supporting ML training infrastructure\n A bias toward picking up slack and operating flexibly across team boundaries\n Enjoy pair programming — we love to pair\n \n Representative projects\n \n Stand up a new eval that tests a specific reasoning capability from scratch — define the task, build the dataset, implement the scoring, validate against known signals, and ship a dashboard that makes the result legible\n Diagnose a mid-training regression: an eval suite returns anomalous numbers, and you need to determine within hours whether it's the model, the harness, the data, or the infrastructure\n Take a flaky distributed eval pipeline and make it boring — better retries, better observability, faster feedback to researchers\n Partner with a research team on a new capability area, helping them articulate what \"good\" looks like and translating that into measurable artifacts\n The annual compensation range for this role is listed below. 
\n For sales roles, the range provided is the role’s On Target Earnings (\"OTE\") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\n Annual Salary:\n $500,000 — $850,000 USD \n Logistics \n Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience\n Required field of study:  A field relevant to the role as demonstrated through coursework, training, or professional experience\n Minimum years of experience: Years of","salary_min":500000,"salary_max":850000,"location":"San Francisco, CA","workplace":"hybrid","remote_scope":"not_remote","job_type":"full-time","experience_level":"principal","tags":["llm","data-pipeline","agents","search","distributed-systems","alignment","evaluation","research"],"apply_url":"https://job-boards.greenhouse.io/anthropic/jobs/5198255008","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-04-28T07:53:55Z","expires_at":"2026-08-15T14:00:29.586404Z","created_at":"2026-04-30T05:46:34.076137Z","updated_at":"2026-07-16T14:00:29.705233Z","company_name":"Anthropic","company_slug":"anthropic","company_logo_url":"https://www.google.com/s2/favicons?domain=anthropic.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/f0e2d14a-2403-4412-adf3-0ffa6627de3f"},{"id":"ae037fa3-ed42-4673-8c5c-286e63428ef6","company_id":"a761e420-c3e8-47ae-984d-1061786e8a13","title":"Member of Technical Staff, Evaluation Execution","slug":"member-of-technical-staff-evaluation-execution-aa3ef44e","description":"About METR\nWe are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks, and mitigations, with a specific focus on threats related to AI R\u0026D automation and misalignment.\nMETR has consistently set precedents for catastrophic AI risk evaluations, including the first independent safety evaluations (working informally with Anthropic and OpenAI in 2022), the 
first loss-of-control evaluations and first agentic dangerous capability evaluations, the first evaluations using finetuning (mentioned briefly here), the first independent evaluations using internal information about training, the first review partnership for company risk analysis, the first embedded redteaming, and the first evaluations of internal deployments.\nWe’ve been consulted and/or favorably referenced by groups on opposite ends of various spectra, including a16z, Khosla, Gary Marcus, Obama, and Dean Ball, and are known for producing one of the most positive results on AI capabilities (the time horizon trend) and the most negative (our downlift study). We’re generally referenced as the canonical third-party assessor, e.g. as the obvious candidate to verify conditional pause agreements. \nWe believe it is robustly good for policymakers and civil society to have a clear understanding of risks from AI systems, and we are extremely excited to build a team of ambitious, excellent people to tackle one of the most important challenges of our time. \n \nWhat this role looks like\nRunning models on tasks. Often this means integrating models into our agent scaffolds, running them on our infrastructure and checking the results carefully. (METR both develops our own tasks internally and runs external evaluations.)\nCommunicating results and takeaways. This includes designing useful graphs, writing up conclusions for different audiences (system cards, risk reports, regulators, X, etc.), and having great takes on what matters for risk.\nBuilding software to improve our evaluations. We don't just try and run the same evaluation over and over again. We also run faster, more informative evaluations over time; this means making the right investments (with the support of our platform team).\nProject management. Live evaluations require keeping track of a bunch of threads and staying organized. 
With our recent risk report process, we were running many evaluations at once.\nStrong and professional communication. We run important and sensitive evaluations, and so the team needs to coordinate with METR leadership, lab contacts, regulators, and others.\n \nWhy this role matters\nAs part of informing the world about risk from frontier AI systems, METR often runs and publishes evaluations of frontier models.\nOur evaluations are a central tool the world uses to understand AI progress. Our Time Horizon methodology has been included in system cards, called an \"obsession\" by the NYT, has wide reach online, and is used by governments to inform national policy.\nWe’re expanding the ambition and scale of our evaluations. We have recently begun to measure model propensities and monitorability, and we are increasing the speed, reliability, and quantity of evaluations we aim to do so that we can keep the world informed.\n \nHow METR’s evaluations are changing over 2026\nTime Horizon is close to saturation, so we’re currently working on Time Horizon 2.0, which we expect to be running on models over the next 6 to 18 months. \nWe’re gearing up for our first large-scale publication on monitorability, which we believe will be similar to TH in helping folks understand trends over time.\nWe spent the past three months working on a large, industry-wide third-party risk assessment program, which includes us collecting information (and running evaluations!) for both monitorability and propensities/alignment. We expect to do much more work as part of our own risk assessment programs in the future.\nIn general, many ambitious impact stories for METR require us to have the capacity to run many more evaluations than we have run historically. For example, while our evaluations currently inform many key decision-makers about AI capabilities, they are not yet consistently run with the scale, reliability, and speed necessary to play concrete, codified roles in regulatory frameworks. 
Unlocking this capacity is part of the near-future vision for evaluation execution.\n","salary_min":285548,"salary_max":503116,"location":"Berkeley","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["agents","fine-tuning","research","evaluation"],"apply_url":"https://jobs.lever.co/metr/93baec4d-1e47-40f7-9990-6d7fef12da00/apply","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-04-27T23:11:49.026Z","expires_at":"2026-08-15T14:09:08.446937Z","created_at":"2026-06-30T14:08:02.835104Z","updated_at":"2026-07-16T14:09:08.565524Z","company_name":"METR","company_slug":"metr","company_logo_url":"https://www.google.com/s2/favicons?domain=metr.org\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/ae037fa3-ed42-4673-8c5c-286e63428ef6"},{"id":"89ddd205-641f-4ac6-a511-85bcd15bc1aa","company_id":"72014eb6-e84d-48c2-af5c-5424ebec0b3c","title":"Senior Staff Machine Learning Systems Engineer, Indexing \u0026 Retrieval Search","slug":"senior-staff-software-engineer-indexing-retrieval-platform-78a19276","description":"Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com .\n Team The ML Indexing \u0026 Retrieval Platform team at Reddit is responsible for building and scaling the core infrastructure that powers machine learning driven recommendations. We design and maintain systems for ML data ingestion, low-latency retrieval services, and end-to-end lifecycle management of data. 
With a focus on performance, reliability, and scalability, we enable real-time access to high-quality data that supports a wide range of applications, including Content Understanding, semantic and lexical retrieval, and GenAI applications.  \n How You'll Have Impact \n You’ll lead the development of next-generation ML Indexing \u0026 Retrieval systems, owning the full lifecycle from ideation to production and going beyond incremental improvements to reimagine core platform capabilities. As part of a high-impact, cross-functional team, you’ll solve complex technical challenges to build scalable, reliable platforms that empower developers to efficiently ship critical ML features. \n Languages: Go, Java, Python, or any object-oriented programming language \n Frameworks: Flink, Airflow, Spark for large-scale batch \u0026 stream processing  \n Databases: Familiarity with Vector, Lexical \u0026 Key-Value Databases  \n Tools: Kubernetes, Docker, AWS, GCP \n What You’ll Do \n \n Lead the technical strategy, architecture, and implementation of Reddit’s next-generation ML Indexing \u0026 Retrieval engine, integrating capabilities across lexical and vector indexing, low-latency retrieval, and emerging GenAI applications. \n Partner closely with product engineers across Content Understanding, Search, Feeds, Ads, Growth, and Safety to deliver high-quality experiences. \n Define best practices for observability, reliability, and operational excellence in large-scale distributed systems. \n Mentor and guide engineers in designing scalable infrastructure and adopting robust DevOps and SRE principles. \n Collaborate with infrastructure and ML teams to ensure the platform evolves to meet the needs of Reddit’s growing user base and diverse content ecosystem. \n \n Who You Might Be: \n \n 10+ years of experience in software engineering, specializing in Indexing and Retrieval systems. 
\n 3+ years in technical leadership, architecting and scaling distributed systems in production environments. \n Deep expertise in large-scale data platforms, including batch indexing and stream processing. \n Proven experience designing and operating large-scale, low-latency retrieval services. \n Expertise in lexical and vector search retrieval technologies, such as Milvus, Vespa, or Elasticsearch.  \n Skilled in designing cloud-native architectures and managing containerized workloads using Kubernetes and AWS/GCP. \n Adept at translating complex technical challenges into clear, actionable strategies. \n Strong communicator and mentor who leads through collaboration, influence, and technical excellence. \n \n Benefits: \n \n Comprehensive Healthcare Benefits and Income Replacement Programs \n 401k with Employer Match \n Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support \n Family Planning Support \n Gender-Affirming Care \n Mental Health \u0026 Coaching Benefits \n Flexible Vacation \u0026 Paid Volunteer Time Off \n Generous Paid Parental Leave  \n \n Pay Transparency: \n This job posting may span more than one career level.\n In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/ .\n To provide greater transparency to candidates, we share base salary ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar-stage growth companies. 
Final offer amounts are determined by multiple factors, including skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below.\n The base salary range for this position is:\n $279,200 — $390,900 USD \n In select roles and locations, the interviews will be recorded, transcribed and summarized by artificial intelligence (AI). You will have the opportunity to opt out of recor","salary_min":279200,"salary_max":390900,"location":"Remote (US)","workplace":"remote","remote_scope":"restricted","job_type":"full-time","experience_level":"lead","tags":["distributed-systems","search","healthcare","generative-ai","cloud","machine-learning","evaluation"],"apply_url":"https://job-boards.greenhouse.io/reddit/jobs/7844238","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-04-22T23:19:47Z","expires_at":"2026-08-15T14:09:30.037638Z","created_at":"2026-04-30T05:51:48.863215Z","updated_at":"2026-07-16T14:09:30.272723Z","company_name":"Reddit","company_slug":"reddit","company_logo_url":"https://www.google.com/s2/favicons?domain=www.reddit.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/89ddd205-641f-4ac6-a511-85bcd15bc1aa"},{"id":"d4fbb761-0c23-476b-9417-44d0477804b4","company_id":"01048ffd-9864-41e0-a719-14b849fbcbcd","title":"Sr. Software Engineer, Computer Vision","slug":"sr-software-engineer-computer-vision-dfedf95b","description":"SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars.\n SR. SOFTWARE ENGINEER, COMPUTER VISION \n We are looking for exceptional, driven, adaptable, and resilient senior software engineers who are technical leaders and experts in artificial intelligence (AI), machine learning (ML), and computer vision. 
As a software engineer on the NDE \u0026 Materials Software team, you will accelerate the quality, speed, and efficiency of manufacturing processes at SpaceX by owning end-to-end solutions - from model development to production deployment and monitoring.\n Our team is building the next generation of in-process monitoring tools for SpaceX as we develop the Starship, Raptor, and Starlink programs. We deliver high-impact solutions through advanced AI models and scalable simulation with a focus on non-destructive evaluation (NDE) and materials engineering applications. As a technical leader, you will guide the team in applying cutting-edge AI to these manufacturing challenges.\n RESPONSIBILITIES: \n \n Lead computer vision model development and deployment for real-time inspection and automated defect recognition in production-scale manufacturing\n Architect and implement pipelines for scalable model training, evaluation, deployment, monitoring, and retraining using industry-standard tools like Ray Train/Serve, Kubeflow, Airflow, or equivalent\n Develop software to integrate hardware, sensors, and tooling (including sensor fusion) with AI-driven process monitoring systems\n Build and optimize data pipelines from production lines, incorporating real-time stream processing and inference\n Collaborate with part/process engineers, NDE engineers, and materials scientists to link advanced models and outputs to manufacturing\n Lead design and code reviews, technology evaluations, and enforce best practices (e.g., style, CI/CD, accuracy, testability, efficiency, and standards)\n \n BASIC QUALIFICATIONS: \n \n Bachelor's degree in computer science, engineering, math, physics, or related STEM discipline; OR 8+ years of professional experience building and deploying AI software/Machine Learning in lieu of a degree\n 5+ years of software development experience\n 3+ years deploying AI models to production environments\n 3+ years of software engineering experience\n 3+ years of 
experience leveraging Python for data analysis\n \n PREFERRED SKILLS AND EXPERIENCE: \n \n Proven track record as a technical leader in software projects involving AI/ML\n Hands-on experience deploying computer vision models for real-time manufacturing inspection and defect detection\n Strong development experience in Python and C++ (or similar), with expertise in ML frameworks like PyTorch or JAX\n Hands-on experience with computer vision libraries (e.g., OpenCV)\n Experience fine-tuning and adapting state-of-the-art models, such as transformers and vision transformers, to production environments\n Experience fine-tuning LLMs or vision-language models and building agentic tools\n Experience deploying applications at scale with Docker, Kubernetes, and cloud/edge inference for factory automation\n Expertise in machine learning/LLM operations, including model versioning, A/B testing, drift detection, and orchestration with Kubeflow, Ray, MLflow, or similar\n Stream processing with Apache Kafka, RabbitMQ, or equivalent\n Database expertise in PostgreSQL or similar and data tools (Prometheus, Grafana, Jupyter)\n Strong Linux experience\n Experience applying AI to physics or simulation domains, using physics-informed neural networks (PINNs) or surrogate modeling\n \n ADDITIONAL REQUIREMENTS: \n \n Ability to work extended hours and weekends as necessary\n Ability to travel to other SpaceX sites as needed (up to 20%)\n \n COMPENSATION AND BENEFITS: \n Pay range: Sr. Software Engineer: $160,000.00 - $225,000.00 per year\n Your actual level and base salary will be determined on a case-by-case basis and may vary based on the following considerations: job-related knowledge and skills, education, and experience.\n Base salary is just one part of your total rewards package at SpaceX. 
You may also be eligible for long-term incentives, in the form of company stock or long-term cash awards, as well as potential discretionary bonuses and the ability to purchase additional stock at a discount through an Employee Stock Purchase Plan. You will also receive access to comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short and long-term disability insurance, life insurance, paid parental leave, and various other discounts and perks. You may also accrue 3 weeks of paid vacation and will be eligible for 10 or more paid holidays per year. Employees accrue paid sick leave pursuant to Company policy which satisfies or exceeds the accrual, carryo","salary_min":160000,"salary_max":225000,"location":"Hawthorne, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["deep-learning","fine-tuning","llm","pytorch","computer-vision","agents","data-pipeline","evaluation"],"apply_url":"https://boards.greenhouse.io/spacex/jobs/8517346002?gh_jid=8517346002","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-04-21T21:04:15Z","expires_at":"2026-08-15T14:18:27.044019Z","created_at":"2026-04-22T15:57:46.18082Z","updated_at":"2026-07-16T14:18:27.165771Z","company_name":"SpaceX","company_slug":"spacex","company_logo_url":"https://www.google.com/s2/favicons?domain=spacex.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/d4fbb761-0c23-476b-9417-44d0477804b4"},{"id":"200e5f23-8158-4eb8-ab5c-d40e414f8efb","company_id":"6ce2d21e-b00f-4343-9bd0-5ac62ff81431","title":"Senior Machine Learning Engineer (Infra), Driver Understanding and Evaluation","slug":"senior-machine-learning-engineer-infra-driver-understanding-and-evaluation-da1f274f","description":"Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. 
Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The World's Most Experienced Driver™—to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo’s fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases. The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads and tens of billions of miles in simulation across 15+ U.S. states.\n The DUE Machine Learning team will build and operate scalable machine learning and data systems and simulation workflow and insight tools, and will improve and speed up the evaluation and onboard developer journeys. It will combine expert human judgements and advanced machine learning models to deliver training and evaluation data for hundreds of metrics and components that make up the Waymo Driver. We are looking for researchers and software engineers who are passionate about developing machine learning techniques for the evaluation systems on our autonomous vehicles, and who have an incessant drive to improve the performance of our technology stack.\n You will: \n \n Build scalable systems for training and fine-tuning large-scale models to evaluate interesting driving behaviors.\n Work at the intersection of data engineering, model development, and simulation.\n Provide guidance on architectural decisions and technical directions.\n 
Own large, complex systems, driving architectures that meet technical and business objectives.\n Contribute to the production and optimization of machine learning models aiming to assess Waymo’s expansive fleet of vehicles that cumulatively travel millions of miles.\n Design and scale large distributed systems covering the ML lifecycle, supporting planet-scale dataset generation, model training, and evaluation.\n Collaborate cross-functionally to derive performance and system-level requirements for large ML systems. Translate product/business goals into measurable technical deliverables, ensuring system component alignment.\n \n You have: \n \n M.S. or Ph.D. degree in Computer Science, Machine Learning, Artificial Intelligence, or a related technical field, or equivalent practical experience.\n 5+ years in machine learning infrastructure such as developing, designing, scaling, training, deploying, and optimizing large-scale machine learning systems from data to model.\n A history of contributions to machine learning tooling and frameworks, e.g., PyTorch, JAX, TensorFlow, Ray, or similar. The candidate should understand both the user-facing API and the internal workings. \n Strong expertise in distributed training techniques, including gradient sharding and optimization strategies for scaling large models across ML accelerators, and experience using profiling tools to uncover performance bottlenecks.\n \n We prefer: \n \n 7+ years in machine learning infrastructure such as developing, designing, scaling, training, deploying, and optimizing large-scale machine learning systems from data to model.\n Experience in the autonomous vehicles domain, robotics, or complex simulation environments.\n Familiarity with large-scale simulation platforms and their integration with ML training workflows.\n The expected base salary range for this full-time position across US locations is listed below. 
Actual starting pay will be based on job-related factors, including exact work location, experience, relevant training and education, and skill level. Your recruiter can share more about the specific salary range for the role location or, if the role can be performed remotely, the specific salary range for your preferred location, during the hiring process.  \n Waymo employees are also eligible to participate in Waymo’s discretionary annual bonus program, equity incentive plan, and generous Company benefits program, subject to eligibility requirements.  \n Salary Range\n $213,000 — $263,000 USD","salary_min":213000,"salary_max":263000,"location":"Mountain View, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["pytorch","autonomous-vehicles","fine-tuning","tensorflow","distributed-systems","robotics","infrastructure","evaluation"],"apply_url":"https://careers.withwaymo.com/jobs?gh_jid=7819951","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-04-17T19:03:21Z","expires_at":"2026-08-15T14:05:08.349765Z","created_at":"2026-04-17T19:31:58.91374Z","updated_at":"2026-07-16T14:05:08.469848Z","company_name":"Waymo","company_slug":"waymo","company_logo_url":"https://www.google.com/s2/favicons?domain=waymo.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/200e5f23-8158-4eb8-ab5c-d40e414f8efb"}],"market_demand_pack":{"amount_cents":2900,"api_checkout_url":"https://aidevboard.com/api/v1/checkout?product_id=aidevboard_ai_skills_demand_pack","checkout_url":"https://aidevboard.com/market-demand-pack?qc=api-jobs-market-demand-pack\u0026utm_campaign=skills_demand_pack\u0026utm_medium=jobs_api\u0026utm_source=api","currency":"USD","description":"Full ranked public AI/ML demand CSV, source job URLs, and decision brief with market and offer 
angles.","fulfillment":"automatic_email_after_paid_checkout","human_checkout_url":"https://aidevboard.com/market-demand-pack?qc=api-jobs-market-demand-pack\u0026utm_campaign=skills_demand_pack\u0026utm_medium=jobs_api\u0026utm_source=api","name":"AI Market Demand Pack","next_step":"Open checkout_url for Stripe Checkout, or call api_checkout_url to get the non-charging checkout handoff payload.","price_usd":29,"product_id":"aidevboard_ai_skills_demand_pack","quote_url":"https://aidevboard.com/api/v1/quote?product_id=aidevboard_ai_skills_demand_pack"},"page":1,"per_page":20,"total":84,"total_pages":5}