{"has_next":true,"jobs":[{"id":"a62cc613-162e-4779-81ca-502537d39185","company_id":"a0000000-0000-0000-0000-000000000001","title":"Performance Engineer, Inference Systems","slug":"performance-engineer-inference-systems-d02d5600","description":"About Anthropic \n Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.\n About the Role \n Anthropic's inference fleet serves Claude to millions of users across our own products and the world's largest cloud platforms. The stack that makes this possible is deep and tightly coupled: accelerator kernels, model servers, distributed routing, autoscaling, capacity management. Every layer affects the others, often in ways that are hard to see in isolation.\n The Inference System Dynamics team is responsible for understanding that whole system and holding it to a high bar across four dimensions: throughput, latency, reliability, and correctness . We measure how the fleet performs against its theoretical performance frontier, run cross-layer investigations to explain the gaps, and own the correctness checks that make sure Claude's outputs are right, not just fast, across hardware platforms and serving configurations. We don't own the individual components. We instrument and model them, find the highest-leverage opportunities across them, and partner with the owning teams to land the wins.\n You'll work across all four areas. One week that might mean tracing a tail-latency regression from request timing down through routing and batching into a kernel overhead; the next it might mean tightening a correctness eval so it catches an output regression introduced by a quantization change. 
We're looking for performance engineers who treat correctness as part of performance.\n Key Responsibilities \n \n Run cross-layer performance investigations across throughput, latency, and reliability, sizing the gap between actual fleet performance and theoretical rooflines, identifying root causes, and quantifying the value of closing them\n Own and improve the correctness evaluation pipeline that validates model output quality across hardware platforms, numerics, and serving configurations, and lead the investigation when it catches a regression\n Build the observability, dashboards, and modeling tools that make throughput, latency, cost, reliability, correctness, and their interactions legible across the stack\n Partner with kernel, serving, routing, autoscaling, and capacity teams to prioritize and land the highest-impact optimizations your analysis surfaces\n Ruthlessly stack-rank a large surface area of opportunities by impact and effort, and say no to the ones that don't make the cut\n \n Minimum Qualifications \n \n Hands-on performance engineering experience: profiling, roofline analysis, latency/throughput optimization, and root-cause investigation in complex production systems\n Proficiency in Python, with the ability to read, instrument, and contribute to large production codebases you didn’t write\n Solid data analysis skills (e.g. SQL, pandas, or similar) sufficient to turn raw telemetry into clear findings\n Ability to communicate quantitative results clearly in writing to influence priorities on teams you don't manage\n Genuine interest in correctness as an engineering discipline: numerics, evaluation design, regression detection\n \n Preferred Qualifications \n \n Experience with ML systems, especially training or inference infrastructure or general LLM serving stacks. 
Direct large-scale inference experience is a strong plus\n Familiarity with GPU/TPU/accelerator performance concepts (memory bandwidth, kernel overheads, quantization, collective communication). Reasoning about these matters more than having written kernels yourself\n Experience with reliability engineering for high-throughput services: autoscaling, load balancing, request routing, tail latency\n Experience with model evaluation or numerical regression-detection pipelines\n Experience building observability or telemetry for distributed systems\n Comfortable having impact through influence and evidence rather than direct ownership\n \n Representative Projects \n \n Trace a 350ms latency gap on a new accelerator platform from end-to-end request timing down to a server scheduling overhead, quantify the win, and land the fix directly or with the owning team\n Redesign the correctness eval gate: determine which signals reliably catch real model-output regressions versus noise, and make it the trusted release criterion across hardware backends\n Build a FLOPs funnel that breaks down where compute actually goes across the fleet, exposing the gap between achieved throughput and kernel rooflines\n Root-cause a numerical divergence between two hardware platforms to a specific kernel change, and define the acceptance threshold going forward\n Model the latency–cost impact of changing batch-sizing and utilization targets, and turn the result into the signal the autoscaler uses in production\n \n Deadline to apply: None. 
Applications ","salary_min":350000,"salary_max":850000,"location":"San Francisco, CA","workplace":"hybrid","job_type":"full-time","experience_level":"principal","tags":["distributed-systems","alignment","llm","research","inference"],"apply_url":"https://job-boards.greenhouse.io/anthropic/jobs/5224564008","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-05-20T22:53:11Z","expires_at":"2026-06-29T14:00:19.065435Z","created_at":"2026-05-27T14:00:24.711949Z","updated_at":"2026-05-30T14:00:19.17401Z","company_name":"Anthropic","company_slug":"anthropic","company_logo_url":"https://www.google.com/s2/favicons?domain=anthropic.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/a62cc613-162e-4779-81ca-502537d39185"},{"id":"9dd96c9b-0c11-4da4-bd2a-3dc613470428","company_id":"c587b06c-b6f0-4d1d-b694-6fb6abc2a6bb","title":"Senior Product Manager, Inference","slug":"senior-product-manager-inference-d8adb377","description":"Who We Are \n Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems—designed to take ideas from research to production with less friction.\n Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.\n We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.\n  \n What We're Looking For \n We're looking for a Founding Product Manager for Inference who will own this product end-to-end—roadmap, pricing, and GTM—from the ground up. 
This is a zero-to-one role at the intersection of deep technical fluency and commercial instinct. You'll define what we build and why, design the developer journey from first API call to production workload, and be the product voice in sales cycles and the market.\n The right candidate has lived inside the machine; they've operated model serving infrastructure or shipped products on top of it, and can move fluidly between a latency/throughput tradeoff conversation with an infra engineer and a positioning conversation with a sales lead. You will be joining the Product Team and report to our VP of Product working directly with our executive team as we grow this business. \n This is a hybrid role based in our New York City or San Francisco office with in-office requirements of 2 days per week. \n What You’ll Do \n \n Define Lightning AI's inference product vision and roadmap — what we build, what we don't, and in what order — translating the competitive landscape (vLLM, Together, Fireworks, Modal, hyperscaler inference APIs) into a differentiated strategy grounded in Lightning's compute and software advantage\n Own inference pricing and packaging end-to-end: design the model (per-token, per-second, reserved capacity), run pricing experiments with Growth and Finance, and define the tiers that convert self-serve developers into enterprise contracts\n Be the product voice in GTM: develop sales positioning, answer technical objections in the field, and partner with Marketing on the benchmarks, reference architectures, and developer content that builds credibility with ML engineers and platform teams\n Own the developer journey from API key to production-scale deployment — identify and remove friction across onboarding, documentation, SDK ergonomics, and dashboard observability\n Lead experiments across activation flows, pricing pages, and upgrade prompts; track and move DAU/MAU, Time to Value, Activation %, PQLs, and expansion revenue\n Partner with engineering to write 
tight specs and make fast build/buy/partner decisions; collaborate across Product to ensure inference coheres with training, fine-tuning, and storage surfaces\n Establish inference-specific metrics — throughput, latency SLAs, cold-start behavior, cost per token — and build the instrumentation to track them\n \n What You’ll Need \n \n 7+ years of product management experience, with at least 3 years in infrastructure, platform, or developer tooling products\n Direct, hands-on experience with model serving or inference infrastructure — you've shipped in this space; you understand quantization, batching strategies, KV cache, and speculative decoding at a level that lets you go deep with ML engineers\n Proven track record owning product pricing and packaging decisions, not just feature decisions — you've modeled unit economics and made calls that affected margin\n Experience with a PLG or trial-to-paid motion in a developer product; you know how to build self-serve growth loops and run rigorous A/B experiments\n Strong analytical skills — comfortable with product instrumentation, metrics, and dashboards; you pull your own data\n Excellent written and verbal communication; you can write a crisp one-pager, a technical spec, and a customer-facing benchmark brief with equal fluency\n Bias for action and comfort operating with high ambiguity in a fast-moving environment\n Bachelor's degree in Computer Science, Engineering, or related technical field (or equivalent practical experience)\n Bonus: Prior experience at a neocloud, hyperscaler inference team, or AI infrastructure startup; familiarity with the PyTorch/Lightning ecosystem; background in GPU cluster products or consumption-based infrastructure pricing\n \n  \n Compensation \n We are committed to offering competitive compensation that reflects the value each team member brings to our mission. Final offers are based on factors such as experience, skills, geographic location, and role expectations. 
In addition to base salary, our total rewards package for eligible roles incl","salary_min":160000,"salary_max":275000,"location":"New York, NY","workplace":"hybrid","job_type":"full-time","experience_level":"senior","tags":["pytorch","mlops","fine-tuning","gpu","llm","inference"],"apply_url":"https://job-boards.greenhouse.io/lightningai/jobs/7702094003","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-05-12T18:21:09Z","expires_at":"2026-06-29T14:03:03.748651Z","created_at":"2026-05-14T14:03:38.883824Z","updated_at":"2026-05-30T14:03:03.857752Z","company_name":"Lightning AI","company_slug":"lightning-ai","company_logo_url":"https://www.google.com/s2/favicons?domain=lightning.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/9dd96c9b-0c11-4da4-bd2a-3dc613470428"},{"id":"c92a34c5-38f3-4162-b7f3-a5e5c8ab11f7","company_id":"a0000000-0000-0000-0000-000000000001","title":"Staff + Sr. Software Engineer, Cloud Inference Launch Engineering","slug":"staff-sr-software-engineer-cloud-inference-launch-engineering-e1f8fd98","description":"About Anthropic \n Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.\n About the Role\n The Cloud Inference team scales and optimizes Claude to serve the massive audiences of developers and enterprise companies across AWS, GCP, Azure, and future cloud service providers (CSPs). We own the end-to-end product of Claude on each cloud platform, from API integration and intelligent request routing to inference execution, capacity management, and day-to-day operations.\n Within Cloud Inference, the model \u0026 inference launch team owns the validation pipeline for our inference server and load balancer on these platforms. 
We're responsible for every inference change — model launches, performance improvements, safeguard integrations — landing on cloud platforms with correctness, performance, and reliability intact.\n This is high-leverage infrastructure work: validation has to be fast and cheap enough to run on the same accelerators that serve customers, trustworthy enough to replace manual checks, and consistent enough that a change working on Anthropic first-party means it works everywhere. This directly determines how fast frontier models and features ship to every cloud platform, and how quickly performance wins reach production — reclaiming capacity at a time when compute is our scarcest resource.\n What You'll Do\n \n Be on the critical path for frontier model launches, bringing up inference for new model architectures and shipping them to cloud platforms in lockstep with our first-party platform\n Work with the core inference team to bring new inference features (e.g. structured sampling, prompt caching, and more) to cloud platforms, owning the platform-specific integration that gets them to production\n Identify and dive deep on the gaps that make inference behave differently across first-party and CSPs — config drift, observability, deployment patterns, hard cross-platform bugs — and fix them at the source rather than building platform-specific workarounds\n Design, build, and own the CI/CD infrastructure for the inference server and load balancer across cloud platforms, with shadow traffic, performance baselines (throughput and latency), and correctness checks that catch regressions before production\n Drive down merge-to-production cycle time by making validation faster, more parallel, and cost-effective enough to run on the same constrained accelerator pool that serves customers, without trading away reliability \n Analyze observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on real-world 
production workloads\n \n You May Be a Good Fit If You:\n \n Have a strong interest in LLM serving; prior inference or ML experience is not required \n Have significant software engineering experience, with a strong background in high-performance, large-scale distributed systems serving millions of users\n Have a track record of building automation or test infrastructure that measurably improved release velocity or reliability\n Have experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure to Kubernetes, Infrastructure as Code, or container orchestration\n Thrive in cross-functional collaboration with both internal teams and external partners\n Are a fast learner who can quickly ramp up on new technologies, hardware platforms, and provider ecosystems\n Are highly autonomous and take ownership of problems end-to-end, including work that falls outside your job description\n \n Strong Candidates May Also Have Experience With:\n \n LLM inference optimization, batching, and caching strategies\n Capacity-constrained scheduling or shared-resource test infrastructure\n Solid understanding of multi-region deployments, request routing, load balancing, global traffic management\n Working with CSP partner teams to scale infrastructure across multiple platforms, navigating differences in networking, security, privacy, and managed service\n Proficiency in Python or Rust\n The annual compensation range for this role is listed below. 
\n For sales roles, the range provided is the role’s On Target Earnings (\"OTE\") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\n Annual Salary:\n $320,000 — $485,000 USD \n Logistics \n Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience\n Required field of study:  A field relevant to the role as demonstrated through coursework, training, or professional experience\n Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the posit","salary_min":320000,"salary_max":485000,"location":"San Francisco, CA","workplace":"hybrid","job_type":"full-time","experience_level":"lead","tags":["alignment","distributed-systems","llm","inference","infrastructure"],"apply_url":"https://job-boards.greenhouse.io/anthropic/jobs/5215028008","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-05-08T21:07:33Z","expires_at":"2026-06-29T14:00:28.188979Z","created_at":"2026-05-10T14:00:33.138268Z","updated_at":"2026-05-30T14:00:28.301764Z","company_name":"Anthropic","company_slug":"anthropic","company_logo_url":"https://www.google.com/s2/favicons?domain=anthropic.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/c92a34c5-38f3-4162-b7f3-a5e5c8ab11f7"},{"id":"083d93cf-06d3-457c-a124-170650bc5995","company_id":"31ae48bc-c938-4c26-a348-0bf3c089a446","title":"Staff Software Engineer, Inference","slug":"staff-software-engineer-inference-18d02a95","description":"CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. 
Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com. \n What You’ll Do: \n Inference Platform Team The Inference team builds and operates CoreWeave’s Kubernetes-native inference platform, powering low-latency, high-throughput AI workloads at massive scale. The team is responsible for request routing, scheduling, GPU resource management, and system-wide optimizations that drive performance, efficiency, and reliability across real-time inference systems.\n About the role: As a Staff Software Engineer (IC5) on the Inference team, you will act as a technical leader driving architecture, performance, and reliability across multiple services and teams. Your day-to-day will involve leading cross-team design initiatives, optimizing inference performance (latency, throughput, and GPU utilization), and improving system reliability at scale. You will work deeply in distributed systems and Kubernetes-based infrastructure, focusing on areas like scheduling, batching, and memory optimization. 
This role requires hands-on technical leadership and the ability to influence engineering direction across the organization.\n Who You Are: \n \n 8–12+ years of experience building and operating large-scale distributed systems or cloud platforms\n Proven experience leading cross-team technical initiatives impacting multiple services or organizations\n Strong programming skills in Go, Python, or C++\n Deep expertise in Kubernetes at production scale, including orchestration, scheduling, and service design\n Strong understanding of distributed systems, networking, and performance optimization\n Experience designing and operating low-latency, high-throughput systems with strict P95/P99 latency requirements\n Hands-on experience with inference systems, including batching or micro-batching strategies, caching, and memory optimization\n Experience improving system performance using metrics-driven approaches (e.g., latency, throughput, utilization)\n Familiarity with mixed precision (BF16, FP8) and streaming inference workloads\n \n Preferred: \n \n Experience with inference frameworks such as vLLM, Triton, TensorRT-LLM, Ray Serve, or TorchServe\n Experience with GPU systems and performance optimization (CUDA, NCCL, RDMA, NUMA, GPU interconnects)\n Experience leading multi-team or org-level technical initiatives\n Exposure to large-scale AI/ML infrastructure or hyperscale cloud environments\n \n Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match. Here are a few qualities we’ve found compatible with our team. 
If some of this describes you, we’d love to talk.\n \n You love to design and optimize high-performance distributed systems at scale\n You’re curious about AI inference, GPU systems, and emerging performance techniques\n You’re an expert in building reliable, low-latency infrastructure and driving system-wide improvements\n \n Why CoreWeave? At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:\n \n Be Curious at Your Core\n Act Like an Owner\n Empower Employees\n Deliver Best-in-Class Client Experiences\n Achieve More Together\n \n We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization's growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!\n The base salary range for this role is $188,000 to $275,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).\n What We Offer \n The range we’ve posted represents the typical compensation range for this role. 
To determine actual compensation, we review the market rate for each candidate which can include ","salary_min":188000,"salary_max":275000,"location":"Sunnyvale, CA","workplace":"hybrid","job_type":"full-time","experience_level":"lead","tags":["gpu","distributed-systems","llm","inference"],"apply_url":"https://coreweave.com/careers/job?4670593006\u0026board=coreweave\u0026gh_jid=4670593006","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-05-07T17:57:21Z","expires_at":"2026-06-29T14:04:53.533418Z","created_at":"2026-05-08T14:04:51.66823Z","updated_at":"2026-05-30T14:04:53.642377Z","company_name":"CoreWeave","company_slug":"coreweave","company_logo_url":"https://www.google.com/s2/favicons?domain=coreweave.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/083d93cf-06d3-457c-a124-170650bc5995"},{"id":"bd79bb04-66ea-4f13-bf77-4eace0ab81cb","company_id":"9bad7e3a-74e6-4dae-87c5-f3e9f0e72bd0","title":"Senior AI Inference Engineer - Model Optimization \u0026 Deployment","slug":"ai-inference-engineer-model-optimization-deployment-555a84d0","description":"The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.\n\nAs a Model Optimization \u0026 Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) for power- and thermal-constrained vehicle SOCs. 
You will optimize the ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.\n","salary_min":242000,"salary_max":290000,"location":"Foster City, CA","workplace":"onsite","job_type":"full-time","experience_level":"senior","tags":["gpu","generative-ai","llm","inference"],"apply_url":"https://jobs.lever.co/zoox/c88c8b02-71b6-492c-a666-584458ac8c6e/apply","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-04-11T00:04:50.918Z","expires_at":"2026-06-29T14:05:48.64447Z","created_at":"2026-04-13T09:41:58.281088Z","updated_at":"2026-05-30T14:05:48.753885Z","company_name":"Zoox","company_slug":"zoox","company_logo_url":"https://www.google.com/s2/favicons?domain=zoox.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/bd79bb04-66ea-4f13-bf77-4eace0ab81cb"},{"id":"193acf09-f52a-4b58-bf59-96e6c3ba7d59","company_id":"332b7698-676b-4a3e-8b02-81b1195c5af6","title":"Sr. Manager, Engineering - AI Gateway (LLM Inference)","slug":"sr-manager-engineering-ai-gateway-llm-inference-25d27b2e","description":"RDQ127R255 \n At Databricks, we are passionate about enabling data teams to solve the world’s toughest problems — from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world’s best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers — and customer obsessed — we leap at every opportunity to tackle technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started. \n The impact you will have  \n As the Sr. 
Manager of Engineering in our newly opened NYC engineering hub, you will be leading teams building multiple products that enable our customers to use innovative ML/AI techniques to add value to their business. You will work closely with product and research on cutting-edge technologies and be responsible for launching and growing new products, playing a key role in defining the future of how AI and data applications are built. \n The Databricks AI Gateway is an enterprise control plane for governing, routing, and monitoring LLM endpoints, coding agents, and model serving endpoints on Databricks. AI Gateway lets you standardize, secure, and observe all LLM inference traffic on Databricks, while capturing detailed production data to manage cost, performance, and quality over time. \n What we look for \n \n 8+ years of industry experience building and supporting large-scale data or production applications \n Experience building classic ML or GenAI systems. \n A passion for building and scaling early-stage products \n Ability to adapt in a fast paced environment.  \n Motivated by delivering customer value and impact. \n Experience driving company initiatives towards customer satisfaction. \n BS/MS/PhD in Computer Science or related majors, or equivalent experience. \n  \n Pay Range Transparency \n Databricks is committed to fair and equitable compensation practices. The pay range(s) for this role is listed below and represents the expected salary range for non-commissionable roles or on-target earnings for commissionable roles.  Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location. Based on the factors above, Databricks anticipates utilizing the full width of the range. 
The total compensation package for this position may also include eligibility for annual performance bonus, equity, and the benefits listed above. For more information regarding which range your location is in visit our page here. \n  \n Local Pay Range\n $228,600 — $314,250 USD \n About Databricks \n Databricks is the data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region click here. \n Our Commitment to Diversity and Inclusion \n At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.\n Compliance \n If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. 
government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.","salary_min":228600,"salary_max":314250,"location":"New York, NY","workplace":"onsite","job_type":"full-time","experience_level":"senior","tags":["mlops","data-pipeline","llm","generative-ai","inference"],"apply_url":"https://databricks.com/company/careers/open-positions/job?gh_jid=8491947002","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-04-02T17:25:50Z","expires_at":"2026-06-29T14:01:59.702397Z","created_at":"2026-04-13T09:37:48.099826Z","updated_at":"2026-05-30T14:01:59.81009Z","company_name":"Databricks","company_slug":"databricks","company_logo_url":"https://www.google.com/s2/favicons?domain=databricks.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/193acf09-f52a-4b58-bf59-96e6c3ba7d59"},{"id":"9dab3718-a3e9-422b-92ff-c266ccd0ebd9","company_id":"a0000000-0000-0000-0000-000000000001","title":"Staff Software Engineer, Cloud Inference Safeguards","slug":"software-engineer-cloud-inference-safeguards-666b9557","description":"About Anthropic \n Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.\n About the role \n We are seeking a Staff Software Engineer to build and operate the safety, oversight, and intervention mechanisms that protect Claude on third-party cloud service provider (CSP) platforms. 
As the engineer responsible for Safeguards on those surfaces, you will ensure that every request served through our CSP partners is monitored for misuse, enforced against policy, and compliant with the data residency and privacy commitments that enterprise CSP customers expect.\n You will sit at the seam between the Safeguards organization and the Cloud Inference team: taking classifiers, detection signals, and enforcement policies developed by Safeguards and making them run reliably inside a CSP partner’s infrastructure at serving-path latency and scale. You will own the architecture that lets our safeguards operate within those constraints without gaps. You will build, deploy and operate the multi-layered defenses that catch unwanted model behavior in real time, the telemetry pipelines that give us situational awareness over CSP traffic, and the enforcement hooks that let us act quickly when something goes wrong. Your work will directly determine whether Anthropic can ship frontier models on CSP platforms at the same safety bar we hold ourselves to on our first-party API.\n Responsibilities: \n \n Build, deploy and operate real-time safeguards infrastructure—classifiers, rate limits, enforcement actions, and intervention hooks—embedded directly in the third-party CSP inference serving path\n Design and maintain the data residency and privacy architecture for safeguards signals on CSP platforms, ensuring we can detect abuse and monitor model behavior while honoring regionalization boundaries and enterprise contractual commitments\n Develop telemetry, logging, and evaluation pipelines that give Safeguards, Policy, and T\u0026S operational teams situational awareness over CSP traffic and close the visibility gap between third-party and first-party serving\n Dive into the CSP serving stack to identify the lowest-impact points to gather signals or introduce interventions without degrading latency, stability, or overall architecture\n Hold a high operational bar: own 
on-call, drive root-cause analyses and postmortems for safeguards incidents on CSP platforms, and build systems that reduce the human intervention required to keep Claude safe\n Work closely with Safeguards research, Policy \u0026 Enforcement, the Cloud Inference team, and CSP partner contacts to turn detection research and policy decisions into production enforcement that works inside a partner’s cloud.\n \n You may be a good fit if you: \n \n Have a Bachelor’s degree in Computer Science, Software Engineering, or comparable experience\n Have 7+ years of experience in high-scale, high-reliability software development, ideally with exposure to trust \u0026 safety, anti-abuse, fraud, or integrity systems\n Are proficient in Python and comfortable working across the stack—from request-path services to data pipelines to internal tooling\n Think adversarially: you can see a system from a bad actor’s perspective, anticipate how they will respond to countermeasures, and design defenses in depth rather than single points of enforcement\n Have experience scaling infrastructure to accommodate rapid traffic growth while keeping latency and reliability within tight budgets\n Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development\n Have strong communication skills and can explain complex technical and risk tradeoffs to non-technical stakeholders across Policy, Legal, and partner organizations\n Enjoy working in a fast-paced, early environment; comfortable with adapting priorities as driven by the rapidly evolving AI space\n \n Strong candidates may also have experience with: \n \n Building trust and safety, anti-spam, fraud, or abuse detection and mitigation mechanisms for AI/ML systems, or the infrastructure to support these systems at scale\n Machine learning serving infrastructure (GPUs/TPUs, inference servers, load balancing) and the operational realities of running models in production\n 
Major cloud platform internals—IAM, Network/service perimeter controls, regional resource constraints, cloud-native logging/monitoring—or experience shipping software that runs inside a partner’s cloud rather than your own\n Data residency, privacy engineering, or compliance-constrained architectures, particularly where telemetry has to stay within regional or contractual boundaries\n Working closely with operational and human-review teams to build custom internal tooling, adm","salary_min":405000,"salary_max":485000,"location":"San Francisco, CA","workplace":"hybrid","job_type":"full-time","experience_level":"lead","tags":["agents","alignment","data-pipeline","inference","rust"],"apply_url":"https://job-boards.greenhouse.io/anthropic/jobs/5168829008","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-03-27T21:15:24Z","expires_at":"2026-06-29T14:00:26.904903Z","created_at":"2026-04-13T09:36:04.690379Z","updated_at":"2026-05-30T14:00:27.014438Z","company_name":"Anthropic","company_slug":"anthropic","company_logo_url":"https://www.google.com/s2/favicons?domain=anthropic.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/9dab3718-a3e9-422b-92ff-c266ccd0ebd9"},{"id":"dab304a0-a220-4948-9fbe-56706ce95749","company_id":"a0000000-0000-0000-0000-000000000001","title":"Engineering Manager, Cloud Inference AWS","slug":"engineering-manager-cloud-inference-aws-6b33b1e6","description":"About Anthropic \n Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.\n About the role\n We are seeking an experienced Engineering Manager to lead the Cloud Inference team for AWS. 
You will lead your team to scale and optimize Claude to serve the massive audiences of developers and enterprise companies using AWS. You will own the end-to-end product of Claude on AWS, including API, load balancing, inference, capacity and operations. Your team will ensure our LLMs meet rigorous performance, safety and security standards and enhance our core infrastructure for packaging, testing, and deploying inference technology across the globe. Your work will increase the scale at which Anthropic operates and accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms.\n Responsibilities:\n \n Set technical strategy and oversee development of Claude on AWS across all layers of the technical stack.\n Collaborate across teams and companies to deeply understand product, infrastructure, operations and capacity needs, identifying potential solutions to support frontier LLM serving\n Work closely with cross-functional stakeholders across companies to align on goals and drive outcomes\n Create clarity for the team and stakeholders in an ambiguous and evolving environment\n Take an inclusive approach to hiring and coaching top technical talent, and support a high performing team\n Design and run processes (e.g. 
postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice\n \n You may be a good fit if you:\n \n Have 10+ years of experience in high-scale, high-reliability software development, particularly infrastructure or capacity management\n Have 5+ years of engineering management experience\n Experience recruiting, scaling, and retaining engineering talent in a high growth environment\n Have experience scaling products, resources and operations to accommodate rapid growth\n Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development\n Excel at building strong relationships and strategy with stakeholders across engineering, product, finance, and sales\n Have experience working with external partners to align goals and deliver impact\n Enjoy working in a fast-paced, early environment; comfortable with adapting priorities as driven by the rapidly evolving AI space\n Have excellent written and verbal communication skills\n Demonstrated success building a culture of belonging and engineering excellence\n Are motivated by developing AI responsibly and safely\n Are willing and able to travel frequently between Seattle and the SF Bay Area\n \n Strong candidates may also have experience with:\n \n Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL\n Experience as a Product Manager\n Experience with deployment and capacity management automation\n Security and privacy best practice expertise\n \n  \n The annual compensation range for this role is listed below. 
\n For sales roles, the range provided is the role’s On Target Earnings (\"OTE\") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\n Annual Salary:\n $405,000 — $485,000 USD \n Logistics \n Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience\n Required field of study:  A field relevant to the role as demonstrated through coursework, training, or professional experience\n Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position\n Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.\n Visa sponsorship:  We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.\n We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed.  Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. 
We think AI systems like the ones we're building have","salary_min":405000,"salary_max":485000,"location":"San Francisco, CA","workplace":"hybrid","job_type":"full-time","experience_level":"lead","tags":["alignment","cloud","llm","infrastructure","inference"],"apply_url":"https://job-boards.greenhouse.io/anthropic/jobs/5141377008","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-03-05T22:38:16Z","expires_at":"2026-06-29T14:00:14.227361Z","created_at":"2026-04-13T09:35:52.609644Z","updated_at":"2026-05-30T14:00:14.334149Z","company_name":"Anthropic","company_slug":"anthropic","company_logo_url":"https://www.google.com/s2/favicons?domain=anthropic.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/dab304a0-a220-4948-9fbe-56706ce95749"},{"id":"456c9786-1c96-4d96-b7f9-155b3e94cc1d","company_id":"d49c7f16-1314-459a-acab-7b3d38ee01a9","title":"Member of Technical Staff, Inference \u0026 RL Systems","slug":"member-of-technical-staff-inference-rl-systems-f9ee45d5","description":"Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal.\n\n\n\n\nABOUT THE ROLE\n\nAs a Software Engineer on the Inference \u0026 RL Systems team, you will design and operate the distributed systems that serve our models in production and power large-scale post-training workflows.\n\nThis role sits at the boundary between model execution and distributed infrastructure. 
You will work on systems that determine inference latency, throughput, stability, and the reliability of RL and post-training training loops.\n\nMagic’s long-context models introduce demanding execution constraints: KV-cache scaling, memory pressure under long sequences, batching trade-offs, long-horizon trajectory rollouts, and sustained throughput under real-world workloads. You will own the infrastructure that makes both production inference and large-scale RL iteration fast and reliable.\n\n\n\n\nWHAT YOU’LL WORK ON\n\n - Design and scale high-performance inference serving systems\n\n - Optimize KV-cache management, batching strategies, and scheduling\n\n - Improve throughput and latency for long-context workloads\n\n - Build and maintain distributed RL and post-training infrastructure\n\n - Improve reliability of rollout, evaluation, and reward pipelines\n\n - Automate fault detection and recovery for serving and RL systems\n\n - Profile and eliminate performance bottlenecks across GPU, networking, and storage layers\n\n - Collaborate with Kernels and Research to align execution systems with model architecture\n   \n   \n\n\nWHAT WE’RE LOOKING FOR\n\n - Strong software engineering and distributed systems fundamentals\n\n - Experience building or operating large-scale inference or training systems\n\n - Deep understanding of GPU execution constraints and memory trade-offs\n\n - Experience debugging performance issues in production ML systems\n\n - Ability to reason about system-level trade-offs between latency, throughput, and cost\n\n - Track record of owning critical production infrastructure\n\n\n\n\nCOMPENSATION, BENEFITS, AND PERKS (US)\n\n - Annual salary range: $225K - $550K\n\n - Equity is a significant part of total compensation, in addition to salary\n\n - 401(k) plan with 6% salary matching\n\n - Generous health, dental and vision insurance for you and your dependents\n\n - Unlimited paid time off\n\n - Visa sponsorship and relocation stipend to 
bring you to SF, if possible\n\n - A small, fast-paced, highly focused team\n\nMagic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.\n\n\n\n\nOUR CULTURE\n\n - Integrity. Words and actions should be aligned\n\n - Hands-on. At Magic, everyone is building\n\n - Teamwork. We move as one team, not N individuals\n\n - Focus. Safely deploy AGI. Everything else is noise\n\n - Quality. Magic should feel like magic","salary_min":225000,"salary_max":550000,"location":"San Francisco, CA","workplace":"onsite","job_type":"full-time","experience_level":"lead","tags":["distributed-systems","pre-training","code-generation","inference"],"apply_url":"https://jobs.ashbyhq.com/magic.dev/427ffdee-d4d1-4a39-a730-4a96435daa67/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-02-28T00:34:41.815Z","expires_at":"2026-06-29T14:05:05.551796Z","created_at":"2026-04-13T09:41:02.40373Z","updated_at":"2026-05-30T14:05:05.666034Z","company_name":"Magic","company_slug":"magic","company_logo_url":"https://www.google.com/s2/favicons?domain=magic.dev\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/456c9786-1c96-4d96-b7f9-155b3e94cc1d"},{"id":"063453aa-ddae-4637-abd4-2df24521b725","company_id":"fe4d898e-86f1-400e-95db-988d7f632620","title":"Senior Applied Economist, Causal Inference \u0026 Forecasting","slug":"senior-applied-economist-causal-inference-forecasting-9fb326a7","description":"Navan is seeking a Senior Applied Economist to join the Data Science \u0026 Machine Learning team. 
This is a foundational, \"first-of-its-kind\" role at Navan, designed for a technical leader who can bridge the gaps between hands-on machine learning, rigorous economic theory, and driving business outcomes.\n In this role, you will be the primary architect of our internal economic \"brain.\" You will move beyond point-estimate forecasting to build sophisticated models that account for market nuances, uncertainty, and causal drivers. You will partner closely with Finance, Treasury, and FP\u0026A to steer the company’s financial trajectory, while providing the strategic frameworks that Sales and Pricing teams use to maximize customer adoption and revenue.\n What You’ll Do: \n \n Next-Generation Forecasting: Uplevel our existing forecasting pipelines (currently built on Prophet). You will integrate econometric rigor to improve accuracy and, crucially, provide a range of likely outcomes (probabilistic forecasting) that Finance and Treasury can rely on for risk management.\n Causal Inference \u0026 Strategy: Design and execute experimental and quasi-experimental frameworks to identify the \"levers\" of the business. You will answer critical questions regarding price elasticity, product feature attribution, and the ROI of sales incentives. \n Strategic Blueprinting: Partner with Sales and Account Management to create data-driven frameworks for pricing and customer retention. You will translate complex causal models into actionable blueprints for go-to-market teams. \n Production-Level Data Science: Work hands-on within our ML infrastructure. You will write production-quality Python code to deploy models into our AWS and Snowflake-based ecosystem, ensuring your insights are automated and scalable. \n Internal Advisory: Act as the subject matter expert on economic literature and methodology, translating technical findings into strategic recommendations for executive leadership. 
\n \n What We’re Looking For: \n \n Education: An advanced degree (PhD preferred, Masters required) in Economics, Statistics, or a related quantitative field with a heavy emphasis on econometrics or causal inference.\n Experience: 4+ years of post-academic experience in an applied research, finance, or data science role, ideally within a high-growth tech environment or fintech.\n \n Technical Proficiency: \n \n Deep expertise in Python and its data science ecosystem (pandas, statsmodels, scikit-learn, etc.).\n Advanced SQL skills, with experience querying large-scale data warehouses like Snowflake.\n Experience working in production environments and a strong understanding of the ML lifecycle is nice to have.\n \n \n Econometric Mastery: Proven ability to apply advanced methods (e.g., Synthetic Control, IV, Diff-in-Diff, Structural Modeling) to messy, real-world datasets.\n Self-Starter Mentality: Experience functioning in \"underdefined\" spaces. As our first economist, you must be comfortable setting the roadmap.\n Communication: The ability to explain not just the \"what,\" but the \"why\" and the \"what if.\" You can communicate uncertainty and risk to a CFO just as clearly as you can discuss model architecture with an ML Engineer.\n Preferred Qualifications: \n \n Prior experience in Fintech, Payments, or Travel industries.\n Experience building and scaling \"first-of-their-kind\" functions within a data organization.\n \n The posted pay range represents the anticipated low and high end of the compensation for this position and is subject to change based on business need. To determine a successful candidate’s starting pay, we carefully consider a variety of factors, including primary work location, an evaluation of the candidate’s skills and experience, market demands, and internal parity. For roles with on-target-earnings (OTE), the pay range includes both base salary and target incentive compensation. 
Target incentive compensation for some roles may include a ramping draw period. Compensation is higher for those who exceed targets. Candidates may receive more information from the recruiter.\n Pay Range\n $121,500 — $270,000 USD","salary_min":121500,"salary_max":270000,"location":"San Francisco, CA","workplace":"onsite","job_type":"full-time","experience_level":"senior","tags":["payments","cloud","data-pipeline","inference"],"apply_url":"https://navan.com/careers/openings?gh_jid=6748963","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-02-19T19:49:33Z","expires_at":"2026-06-29T14:17:55.986875Z","created_at":"2026-04-17T02:26:52.365041Z","updated_at":"2026-05-30T14:17:56.103019Z","company_name":"Navan","company_slug":"navan","company_logo_url":"https://www.google.com/s2/favicons?domain=navan.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/063453aa-ddae-4637-abd4-2df24521b725"},{"id":"dee62f29-1040-4a4e-80b0-8e3bfa659387","company_id":"fe4d898e-86f1-400e-95db-988d7f632620","title":"Senior Applied Economist, Causal Inference \u0026 Forecasting","slug":"senior-applied-economist-causal-inference-forecasting-e287a80a","description":"Navan is seeking a Senior Applied Economist to join the Data Science \u0026 Machine Learning team. This is a foundational, \"first-of-its-kind\" role at Navan, designed for a technical leader who can bridge the gaps between hands-on machine learning, rigorous economic theory, and driving business outcomes.\n In this role, you will be the primary architect of our internal economic \"brain.\" You will move beyond point-estimate forecasting to build sophisticated models that account for market nuances, uncertainty, and causal drivers. 
You will partner closely with Finance, Treasury, and FP\u0026A to steer the company’s financial trajectory, while providing the strategic frameworks that Sales and Pricing teams use to maximize customer adoption and revenue.\n What You’ll Do: \n \n Next-Generation Forecasting: Uplevel our existing forecasting pipelines (currently built on Prophet). You will integrate econometric rigor to improve accuracy and, crucially, provide a range of likely outcomes (probabilistic forecasting) that Finance and Treasury can rely on for risk management.\n Causal Inference \u0026 Strategy: Design and execute experimental and quasi-experimental frameworks to identify the \"levers\" of the business. You will answer critical questions regarding price elasticity, product feature attribution, and the ROI of sales incentives. \n Strategic Blueprinting: Partner with Sales and Account Management to create data-driven frameworks for pricing and customer retention. You will translate complex causal models into actionable blueprints for go-to-market teams. \n Production-Level Data Science: Work hands-on within our ML infrastructure. You will write production-quality Python code to deploy models into our AWS and Snowflake-based ecosystem, ensuring your insights are automated and scalable. \n Internal Advisory: Act as the subject matter expert on economic literature and methodology, translating technical findings into strategic recommendations for executive leadership. 
\n \n What We’re Looking For: \n \n Education: An advanced degree (PhD preferred, Masters required) in Economics, Statistics, or a related quantitative field with a heavy emphasis on econometrics or causal inference.\n Experience: 4+ years of post-academic experience in an applied research, finance, or data science role, ideally within a high-growth tech environment or fintech.\n \n Technical Proficiency: \n \n Deep expertise in Python and its data science ecosystem (pandas, statsmodels, scikit-learn, etc.).\n Advanced SQL skills, with experience querying large-scale data warehouses like Snowflake.\n Experience working in production environments and a strong understanding of the ML lifecycle is nice to have.\n \n \n Econometric Mastery: Proven ability to apply advanced methods (e.g., Synthetic Control, IV, Diff-in-Diff, Structural Modeling) to messy, real-world datasets.\n Self-Starter Mentality: Experience functioning in \"underdefined\" spaces. As our first economist, you must be comfortable setting the roadmap.\n Communication: The ability to explain not just the \"what,\" but the \"why\" and the \"what if.\" You can communicate uncertainty and risk to a CFO just as clearly as you can discuss model architecture with an ML Engineer.\n Preferred Qualifications: \n \n Prior experience in Fintech, Payments, or Travel industries.\n Experience building and scaling \"first-of-their-kind\" functions within a data organization.\n \n The posted pay range represents the anticipated low and high end of the compensation for this position and is subject to change based on business need. To determine a successful candidate’s starting pay, we carefully consider a variety of factors, including primary work location, an evaluation of the candidate’s skills and experience, market demands, and internal parity. For roles with on-target-earnings (OTE), the pay range includes both base salary and target incentive compensation. 
Target incentive compensation for some roles may include a ramping draw period. Compensation is higher for those who exceed targets. Candidates may receive more information from the recruiter.\n Pay Range\n $121,500 — $270,000 USD","salary_min":121500,"salary_max":270000,"location":"New York, NY","workplace":"onsite","job_type":"full-time","experience_level":"senior","tags":["data-pipeline","payments","cloud","inference"],"apply_url":"https://navan.com/careers/openings?gh_jid=7637605","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-02-19T19:49:32Z","expires_at":"2026-06-29T14:17:56.06822Z","created_at":"2026-04-17T02:26:52.37136Z","updated_at":"2026-05-30T14:17:56.185919Z","company_name":"Navan","company_slug":"navan","company_logo_url":"https://www.google.com/s2/favicons?domain=navan.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/dee62f29-1040-4a4e-80b0-8e3bfa659387"},{"id":"dd9f2414-9dde-4323-b5bd-735f50c690e5","company_id":"a0000000-0000-0000-0000-000000000001","title":"Staff + Sr. Software Engineer, Cloud Inference","slug":"staff-senior-software-engineer-cloud-inference-b1214925","description":"About Anthropic \n Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.\n About the Role \n The Cloud Inference team scales and optimizes Claude to serve the massive audiences of developers and enterprise companies across AWS, GCP, Azure, and future cloud service providers (CSPs). 
We own the end-to-end product of Claude on each cloud platform, from API integration and intelligent request routing to inference execution, capacity management, and day-to-day operations.\n Our engineers are extremely high leverage: we simultaneously drive multiple major revenue streams while optimizing one of Anthropic's most precious resources: compute. As we expand to more cloud platforms, the complexity of managing inference efficiently across providers with different hardware, networking stacks, and operational models grows significantly. We need product-minded backend engineers who can navigate these platform differences, design the services and abstractions that work across providers, and make architectural decisions that keep us reliable and cost-effective at massive scale.\n Your work will increase the scale at which our services operate, accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms, and ensure our LLMs meet rigorous safety, performance, and security standards.\n  \n What You'll Do \n \n Design, build, and own backend services and infrastructure that serve Claude across multiple CSPs, accounting for differences in compute hardware, networking, APIs, and operational models\n Work cross-functionally with internal inference, product API, systems, and security teams, among others, and with CSP partners to stand up the full serving stack on new cloud platforms, resolve operational issues, and influence provider roadmaps\n Build and evolve CI/CD automation systems, including validation and deployment pipelines, that reliably ship new model versions to millions of users across cloud platforms without regressions\n Design interfaces and tooling abstractions across CSPs that enable cost-effective inference management, scale across providers, and reduce per-platform complexity\n Contribute to capacity planning, autoscaling, and workload routing strategies that match supply with demand and 
direct requests to the most cost-effective accelerator and region\n Analyze observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on real-world production workloads\n \n You May Be a Good Fit If You: \n \n Have significant software engineering experience, with a strong background in high-performance, large-scale distributed systems serving millions of users\n Have experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure to Kubernetes, Infrastructure as Code, or container orchestration\n Are curious about LLM serving; prior inference or ML experience is not required\n Thrive in cross-functional collaboration with both internal teams and external partners\n Are a fast learner who can quickly ramp up on new technologies, hardware platforms, and provider ecosystems\n Are highly autonomous and take ownership of problems end-to-end, including work that falls outside your job description\n \n Strong Candidates May Also Have Experience With \n \n Direct experience working with CSPs to scale infrastructure or products across multiple platforms, navigating differences in networking, security, privacy, billing, and managed service offerings\n Have experience working with external partners to align goals and deliver impact\n Hands-on experience with capacity management, cost optimization, or resource planning at scale across heterogeneous environments\n Solid understanding of multi-region deployments, geographic routing, and global traffic management\n Proficiency in Python or Rust\n The annual compensation range for this role is listed below. 
\n For sales roles, the range provided is the role’s On Target Earnings (\"OTE\") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\n Annual Salary:\n $320,000 — $485,000 USD \n Logistics \n Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience\n Required field of study:  A field relevant to the role as demonstrated through coursework, training, or professional experience\n Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position\n Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require mo","salary_min":320000,"salary_max":485000,"location":"San Francisco, CA","workplace":"hybrid","job_type":"full-time","experience_level":"lead","tags":["distributed-systems","alignment","llm","payments","infrastructure","inference"],"apply_url":"https://job-boards.greenhouse.io/anthropic/jobs/5107466008","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-02-03T18:44:30Z","expires_at":"2026-06-29T14:00:28.108275Z","created_at":"2026-04-13T09:36:06.011877Z","updated_at":"2026-05-30T14:00:28.219347Z","company_name":"Anthropic","company_slug":"anthropic","company_logo_url":"https://www.google.com/s2/favicons?domain=anthropic.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/dd9f2414-9dde-4323-b5bd-735f50c690e5"},{"id":"14275ac7-838a-45c5-85ae-7c96258ff159","company_id":"31ae48bc-c938-4c26-a348-0bf3c089a446","title":"Senior Software Engineer I, Inference","slug":"senior-software-engineer-i-inference-19e7e2ea","description":"CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. 
Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at  www.coreweave.com . \n What You’ll Do: \n Senior engineers are area owners who lead designs, raise engineering standards, and deliver measurable improvements to latency, throughput, and reliability across multiple services. You’ll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native inference platform and meet strict P99 SLAs at scale.\n About the role: \n \n Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.\n Define and own SLIs/SLOs; ensure post-incident actions land and reliability improves release-over-release.\n Implement advanced optimizations (e.g., micro-batch schedulers, speculative decoding, KV-cache reuse) and quantify impact.\n Strengthen incident posture: capacity planning, autoscaling policy, graceful degradation, rollback/traffic-shift strategies.\n Mentor IC1/IC2 engineers; review cross-team designs and elevate coding/testing standards.\n For IC4: own an area spanning multiple services and teams (e.g., request routing \u0026 adaptive scheduling, cost-per-token analytics, GPU resource isolation).\n \n Who You Are: \n \n IC3: ~3–5 years; IC4: ~5–8 years industry experience building distributed systems or cloud services.\n Computer Science or \n Strong coding in Python or Go (C++ a plus) and deep familiarity with networked systems and performance.\n Hands-on experience with Kubernetes at production scale, CI/CD, and observability stacks (Prometheus, Grafana, OpenTelemetry).\n Practical knowledge of inference internals: batching, caching, mixed precision (BF16/FP8), streaming token delivery.\n Proven track record improving tail latency (P95/P99) and 
service reliability through metrics-driven work.\n Bachelor’s or Master’s in CS, EE, or related field (or equivalent practical experience).\n \n Preferred: \n \n Contributions to inference frameworks (vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe).\n Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies.\n Leading multi-team initiatives or partnering with customers on mission-critical launches.\n \n Wondering if you’re a good fit? We believe in investing in our people and value candidates who can bring their diverse experiences to our teams – even if you aren't a 100% skill or experience match. \n Why CoreWeave? \n At CoreWeave, we work hard, have fun, and move fast!  We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values: \n \n Be Curious at Your Core\n Act Like an Owner\n Empower Employees\n Deliver Best-in-Class Client Experiences\n Achieve More Together\n \n We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. As we get set for takeoff, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!  \n The base salary range for this role is $139,000 to $204,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).  
\n What We Offer \n The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.\n In addition to a competitive salary, we offer a variety of benefits to support your needs, including:\n \n Medical, dental, and vision insurance - 100% paid for by CoreWeave\n Company-paid Life Insurance \n Voluntary supplemental life insurance \n Short and long-term disability insurance \n Flexible Spending Account\n Health Savings Account\n Tuition Reimbursement \n Ability to Participate in Employee Stock Purchase Program (ESPP)\n Mental Wellness Benefits through Spring Health \n Family-Forming support provided by Carrot","salary_min":139000,"salary_max":204000,"location":"Sunnyvale, CA","workplace":"hybrid","job_type":"full-time","experience_level":"senior","tags":["gpu","distributed-systems","llm","inference"],"apply_url":"https://coreweave.com/careers/job?4647603006\u0026board=coreweave\u0026gh_jid=4647603006","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-01-23T13:04:06Z","expires_at":"2026-06-29T14:04:52.917345Z","created_at":"2026-04-13T09:40:47.769209Z","updated_at":"2026-05-30T14:04:53.031088Z","company_name":"CoreWeave","company_slug":"coreweave","company_logo_url":"https://www.google.com/s2/favicons?domain=coreweave.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/14275ac7-838a-45c5-85ae-7c96258ff159"},{"id":"75b0cba8-fa54-4f3d-b490-a1f477112aee","company_id":"2114efab-ea67-411b-bfb8-7899153105f3","title":"Member of Technical Staff, Inference","slug":"member-of-technical-staff-inference-6510aaab","description":"Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. 
Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware—a position that took years to build.\n\n\n\n\nABOUT THE ROLE\n\nWe're looking for an inference runtime engineer to push the boundaries of what's possible in LLM and diffusion model serving. Models grow larger. Architectures shift: mixture-of-experts, multimodal, agentic. Every breakthrough demands innovations on the inference engine itself. You'll work at the core of vLLM, optimizing how models execute across diverse hardware and architectures. Your work will directly impact how the world runs AI inference.\n\n\n\n\nSKILLS AND QUALIFICATIONS\n\nMinimum qualifications:\n\n - Bachelor's degree or equivalent experience in computer science, engineering, or similar.\n\n - Deep understanding of transformer architectures and their variants.\n\n - Strong programming skills in Python with experience in PyTorch internals.\n\n - Experience with LLM inference systems (vLLM, TensorRT-LLM, SGLang, TGI).\n\n - Ability to read and implement model architectures and inference techniques from research papers.\n\n - Demonstrate the ability to contribute performant and maintainable code and debug in complex ML codebases.\n\nPreferred qualifications:\n\n - Deep understanding of KV-cache memory management, prefix caching, and hybrid model serving.\n\n - Familiarity with RL frameworks and algorithms for LLMs.\n\n - Experience with multimodal inference (audio/image/video/text).\n\n - Contributions to open-source ML or system infrastructure projects.\n\nBonus points if you have:\n\n - Implemented core features in vLLM or other inference engine projects.\n\n - Contributed to vLLM integrations (verl, OpenRLHF, Unsloth, LlamaFactory, etc).\n\n - Written widely-shared technical blogs or side projects on vLLM or LLM inference.\n   \n   \n\n\nLOGISTICS\n\n - Location: This role is based in San Francisco, California. 
Will consider remote in the US for exceptional candidates.\n\n - Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $200,000 - $400,000 USD + equity.\n\n - Visa sponsorship: We sponsor visas on a case-by-case basis.\n\n - Benefits: Inferact offers generous health, dental, and vision benefits as well as 401(k) company match.","salary_min":200000,"salary_max":400000,"location":"San Francisco, CA","workplace":"hybrid","job_type":"full-time","experience_level":"lead","tags":["reinforcement-learning","agents","diffusion-models","pytorch","mlops","llm","research","inference"],"apply_url":"https://jobs.ashbyhq.com/inferact/9470565b-c62d-4de9-8b87-26d525ecec49/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-01-22T01:55:42.607Z","expires_at":"2026-06-29T14:10:49.864397Z","created_at":"2026-04-14T03:21:40.751222Z","updated_at":"2026-05-30T14:10:49.973493Z","company_name":"Inferact","company_slug":"inferact","company_logo_url":"https://www.google.com/s2/favicons?domain=inferact.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/75b0cba8-fa54-4f3d-b490-a1f477112aee"},{"id":"6dff59d0-56c9-4472-bb7f-c584bcff4049","company_id":"1a3abe34-d1c1-45b9-9259-3e2e007a961c","title":"Senior Software Engineer, Inference Platform","slug":"senior-software-engineer-inference-platform-9a726f59","description":"About the Role\n We’re looking for a Senior Engineer to help build the next-generation inference platform that supports embedding models used for semantic search, retrieval, and AI-native experiences in MongoDB Atlas.\n You’ll join the broader Search and AI Platform organization and collaborate with ML researchers and engineers from our Voyage.ai acquisition. 
Together, we’re building infrastructure for real-time, low-latency, and high-scale inference — fully integrated with Atlas and designed for developer-first experiences.\n As a Senior Engineer, you'll focus on building core systems and services that power model inference at scale. You'll own key components of the infrastructure, work across teams to ensure tight integration with Atlas, and contribute to a platform designed for reliability, performance, and ease of use.\n We're looking to speak with candidates in Palo Alto for our hybrid working model.\n What You’ll Do\n \n Design and build components of a multi-tenant inference platform integrated directly with MongoDB Atlas, supporting semantic search and hybrid retrieval\n Collaborate with AI engineers and researchers to productionize inference for embedding models and rerankers — enabling both batch and real-time use cases\n Contribute to platform capabilities such as latency-aware routing, model versioning, health monitoring, and observability\n Improve performance, autoscaling, GPU utilization, and resource efficiency in a cloud-native environment\n Work across product, infrastructure, and ML teams to ensure the inference platform meets the scale, reliability, and latency demands of Atlas users\n Gain hands-on experience with tools like vLLM and container orchestration with Kubernetes\n \n Who You Are\n \n 5+ years of experience building backend or infrastructure systems at scale\n Strong software engineering skills in languages such as Go, Rust, Python, or C++, with an emphasis on performance and reliability\n Experienced in cloud-native architectures, distributed systems, and multi-tenant service design\n Familiar with concepts in ML model serving and inference runtimes, even if not directly deploying models\n Knowledge of vector search systems (e.g., Faiss, HNSW, ScaNN) is a plus\n Comfortable working across functional teams, including ML researchers, backend engineers, and platform teams\n Motivated to work 
on systems integrated into MongoDB Atlas and used by thousands of developers\n \n Nice to Have\n \n Experience integrating infrastructure with production ML workloads\n Understanding of hybrid retrieval, prompt-driven systems, or retrieval-augmented generation (RAG)\n Contributions to open-source infrastructure for ML serving or search\n \n Why Join Us\n \n Be part of building the AI foundation of the world’s most popular developer data platform\n Collaborate with ML researchers from Voyage.ai to bring novel ideas into scalable systems\n Tackle challenging problems in inference, observability, and distributed infrastructure\n Work in a culture that emphasizes mentorship, ownership, and technical excellence\n \n About MongoDB\n MongoDB is built for change, empowering our customers and our people to innovate at the speed of the market. We have redefined the database for the AI era, enabling innovators to create, transform, and disrupt industries with software. MongoDB’s unified database platform—the most widely available, globally distributed database on the market—helps organizations modernize legacy workloads, embrace innovation, and unleash AI. Our cloud-native platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available across AWS, Google Cloud, and Microsoft Azure.\n With offices worldwide and nearly 60,000 customers—including 75% of the Fortune 100 and AI-native startups—relying on MongoDB for their most important applications, we’re powering the next era of software.\n Our compass at MongoDB is our Leadership Commitment, guiding how and why we make decisions, show up for each other, and win. It’s what makes us MongoDB. \n To drive the personal growth and business impact of our employees, we’re committed to developing a supportive and enriching culture for everyone. 
From employee affinity groups, to fertility assistance and a generous parental leave policy , we value our employees’ wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it’s like to work at MongoDB , and help us make an impact on the world!\n MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.\n MongoDB, Inc. provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type and makes all hiring decisions without regard to race, color, religion, age, sex, national origin, disability status,","salary_min":126000,"salary_max":248000,"location":"Palo Alto, CA","workplace":"hybrid","job_type":"full-time","experience_level":"senior","tags":["llm","embeddings","mlops","distributed-systems","rag","search","inference"],"apply_url":"https://www.mongodb.com/careers/job/?gh_jid=7467701","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-01-07T16:39:23Z","expires_at":"2026-06-29T14:08:48.453122Z","created_at":"2026-04-13T11:48:36.803516Z","updated_at":"2026-05-30T14:08:48.564853Z","company_name":"MongoDB","company_slug":"mongodb","company_logo_url":"https://www.google.com/s2/favicons?domain=www.mongodb.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/6dff59d0-56c9-4472-bb7f-c584bcff4049"},{"id":"7a53590f-003d-47e6-b352-b90d5c7d10ce","company_id":"6ce2d21e-b00f-4343-9bd0-5ac62ff81431","title":"Software Engineer, Bulk/Interactive Inference","slug":"software-engineer-bulkinteractive-inference-a684dbfb","description":"Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. 
Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The World's Most Experienced Driver™—to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo’s fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases. The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads and tens of billions in simulation across 15+ U.S. states.\n The ML Ops team, part of the Waymo ML Platform team, builds tools and infrastructure to realize the ML flywheel at Waymo. This includes building automation and orchestration solutions to make complex ML workflows manageable and reliable. This team also partners closely with the modeling team to realize solutions to speed up developer velocity.\n We’re looking for a software engineer to join the team to build and maintain the critical data and ML pipelines that power ML development at Waymo.\n In this hybrid role, you will report to the Head of ML Platform- Senior Staff Software Engineer. 
\n  \n You will: \n \n Develop Waymo's inference platform to make it scalable, high throughput, and low latency\n Work closely with other teams across Waymo in hosting both internal and external ML models, including LLMs\n Improve the efficiency of running inference on these large models to increase throughput and save cost\n Deploy and integrate model inference solutions across a variety of use cases, such as distillation, eval, dataset generation, active learning, and auto-labeling\n \n  \n You have: \n \n 2+ years of professional experience in the field of software engineering\n Experience programming in C++\n Experience with building highly scalable distributed systems\n \n  \n We prefer: \n \n Passion for building internal infra and tools\n Experience with building model hosting and inference solutions\n Experience with handling datasets on the order of exabytes\n \n #LI-Hybrid\n The expected base salary range for this full-time position across US locations is listed below. Actual starting pay will be based on job-related factors, including exact work location, experience, relevant training and education, and skill level. Your recruiter can share more about the specific salary range for the role location or, if the role can be performed remotely, the specific salary range for your preferred location, during the hiring process.  \n Waymo employees are also eligible to participate in Waymo’s discretionary annual bonus program, equity incentive plan, and generous Company benefits program, subject to eligibility requirements.  
\n Salary Range\n $170,000 — $216,000 USD","salary_min":170000,"salary_max":216000,"location":"Mountain View, CA","workplace":"onsite","job_type":"full-time","experience_level":"junior","tags":["llm","autonomous-vehicles","mlops","distributed-systems","inference"],"apply_url":"https://careers.withwaymo.com/jobs?gh_jid=7466529","is_featured":false,"is_sticky":false,"status":"active","published_at":"2025-12-16T20:47:12Z","expires_at":"2026-06-29T14:04:29.648753Z","created_at":"2026-04-13T09:40:19.856895Z","updated_at":"2026-05-30T14:04:29.769287Z","company_name":"Waymo","company_slug":"waymo","company_logo_url":"https://www.google.com/s2/favicons?domain=waymo.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/7a53590f-003d-47e6-b352-b90d5c7d10ce"},{"id":"de989320-cf2e-4e5f-842c-b3984fe6a551","company_id":"1f4520df-9fc1-4ace-a80b-6c3266f03e8a","title":"Research Engineer, Infrastructure, Inference","slug":"research-engineer-infrastructure-inference-aa43fe56","description":"Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals. \n We are scientists, engineers, and builders who’ve created some of the most widely used AI products, including ChatGPT and Character.ai, open-weights models like Mistral, as well as popular open source projects like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.\n About the Role\n We’re looking for an infrastructure research engineer to design, optimize, and scale the systems that power large AI models. Your work will make inference faster, more cost-effective, more reliable, and more reproducible to enable our teams to focus on advancing model capabilities rather than managing bottlenecks.\n Our focus is on performant and efficient model inference both to power real-world applications and to accelerate research. 
This role is responsible for the infrastructure that ensures every experiment, evaluation, and deployment runs smoothly at scale.\n Note: This is an \"evergreen role\" that we keep open on an on-going basis to express interest. We receive many applications, and there may not always be an immediate role that aligns perfectly with your experience and skills. Still, we encourage you to apply. We continuously review applications and reach out to applicants as new opportunities open. You are welcome to reapply if you get more experience, but please avoid applying more than once every 6 months. You may also find that we put up postings for singular roles for separate, project or team specific needs. In those cases, you're welcome to apply directly in addition to an evergreen role. \n What You’ll Do\n \n Work alongside researchers and engineers to bring cutting-edge AI models into production.\n Collaborate with research teams to enable high-performance inference for novel architectures.\n Design and implement new techniques, tools, and architectures that improve performance, latency, throughput, and efficiency.\n Optimize our codebase and compute fleet (e.g., GPUs) to fully utilize hardware FLOPs, bandwidth, and memory.\n Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.\n Establish standards for reliability, observability, and reproducibility across the inference stack.\n Publish and share learnings through internal documentation, open-source libraries, or technical reports that advance the field of scalable AI infrastructure. 
\n \n Skills and Qualifications\n Minimum qualifications:\n \n Bachelor’s degree or equivalent experience in computer science, engineering, or similar.\n Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.\n Experience with inference serving systems optimized for throughput and latency (e.g., SGLang, vLLM).\n Thrive in a highly collaborative environment involving many, different cross-functional partners and subject matter experts.\n A bias for action with a mindset to take initiative to work across different stacks and different teams where you spot the opportunity to make sure something ships.\n Strong engineering skills, ability to contribute performant, maintainable code and debug in complex codebases\n \n Preferred qualifications — we encourage you to apply if you meet some but not all of these:\n \n Experience training or supporting large-scale language models with hundreds of billions of parameters or more.\n Understanding of distributed compute systems, GPU parallelism, and hardware-aware optimizations.\n Contributions to open-source ML or systems infrastructure projects (e.g., SGLang, vLLM, PyTorch, Triton, DeepSpeed, XLA).\n Track record of improving research productivity through infrastructure design or process improvements.\n \n Logistics\n \n Location: This role is based in San Francisco, California. \n Compensation: Depending on background, skills and experience, the expected annual salary range for this position is $350,000 - $475,000 USD.\n Visa sponsorship: We sponsor visas. 
While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the visa process together.\n Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.\n As set forth in Thinking Machines' Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law. \n Thinking Machines Lab will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the California Fair Chance Act, the San Francisco Fair Chance Ordinance, and any other applicable state or local fair chance ordinance or law.","salary_min":350000,"salary_max":475000,"location":"San Francisco, CA","workplace":"onsite","job_type":"full-time","experience_level":"principal","tags":["deep-learning","search","pytorch","llm","gpu","research","infrastructure","inference"],"apply_url":"https://job-boards.greenhouse.io/thinkingmachines/jobs/5013924008","is_featured":false,"is_sticky":false,"status":"active","published_at":"2025-11-27T18:55:52Z","expires_at":"2026-06-29T14:17:15.723973Z","created_at":"2026-04-17T00:25:56.678391Z","updated_at":"2026-05-30T14:17:15.835042Z","company_name":"Thinking Machines","company_slug":"thinking-machines","company_logo_url":"https://www.google.com/s2/favicons?domain=thinkingmachin.es\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/de989320-cf2e-4e5f-842c-b3984fe6a551"},{"id":"2e26d7cc-cdbf-4c16-8062-ad43837688d9","company_id":"6ce2d21e-b00f-4343-9bd0-5ac62ff81431","title":"Software Engineer, ML Inference, Simulation Infrastructure","slug":"software-engineer-ml-inference-simulation-infrastructure-e57c9c8d","description":"Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. 
Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The World's Most Experienced Driver™—to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo’s fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases. The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads and tens of billions in simulation across 15+ U.S. states.\n The Simulation Infrastructure team creates reliable, scalable, and cost-effective Simulation-based products that evaluate the Waymo Driver's software stack at a massive scale. We solve complex technical challenges to build services and tools for a broad range of customers: Software Engineers, Product, Data Science, System Engineering, and more. So if you want to build the next generation of Simulation products and infrastructure, we'd love to hear from you!\n In this hybrid role you will report to the Software Engineering Manager. \n  \n You will: \n \n Build and evolve ML inference infrastructure for simulations.\n Be responsible for the reliability, latency, and user experience of ML model deployment and serving.\n \n  \n You have: \n \n B.Sc. 
in Computer Science, or a related field, or equivalent years of experience\n 3+ years of C++ and/or Golang programming experience\n Experience in developing and maintaining distributed systems.\n \n  \n We prefer: \n \n Experience working with large-scale distributed inference services.\n Experience working with popular ML frameworks, TPUs and optimizing models for serving.\n Experience with distributed systems principles, including scheduling, load balancing, and fault tolerance.\n Experience working with large, FAANG-scale distributed systems.\n \n  \n #LI-Hybrid\n The expected base salary range for this full-time position across US locations is listed below. Actual starting pay will be based on job-related factors, including exact work location, experience, relevant training and education, and skill level. Your recruiter can share more about the specific salary range for the role location or, if the role can be performed remotely, the specific salary range for your preferred location, during the hiring process.  \n Waymo employees are also eligible to participate in Waymo’s discretionary annual bonus program, equity incentive plan, and generous Company benefits program, subject to eligibility requirements.  
\n Salary Range\n $170,000 — $216,000 USD","salary_min":170000,"salary_max":216000,"location":"Mountain View, CA","workplace":"onsite","job_type":"full-time","experience_level":"mid","tags":["distributed-systems","autonomous-vehicles","mlops","inference","infrastructure"],"apply_url":"https://careers.withwaymo.com/jobs?gh_jid=7353876","is_featured":false,"is_sticky":false,"status":"active","published_at":"2025-11-03T18:08:23Z","expires_at":"2026-06-29T14:04:30.150756Z","created_at":"2026-05-27T14:04:39.525133Z","updated_at":"2026-05-30T14:04:30.25868Z","company_name":"Waymo","company_slug":"waymo","company_logo_url":"https://www.google.com/s2/favicons?domain=waymo.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/2e26d7cc-cdbf-4c16-8062-ad43837688d9"},{"id":"e6182bf4-3b0b-4c06-8a62-fac6a5e30529","company_id":"31ae48bc-c938-4c26-a348-0bf3c089a446","title":"Software Engineer, Inference AI/ML","slug":"software-engineer-inference-aiml-1e117aa6","description":"CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at  www.coreweave.com . \n What You’ll Do: \n Join the Inference team to ship production features that improve latency, reliability, and cost for model serving on our GPU platform. 
As an IC1, you’ll implement well-scoped changes, learn our operational practices, and grow quickly with mentorship from experienced engineers.\n About the role: \n \n Implement well-scoped features and fixes in Python/Go/C++ for model-serving services (e.g., Triton, vLLM, TensorRT-LLM, Ray Serve).\n Write tests, code comments, and short design docs; participate in code reviews.\n Add basic metrics and dashboards; assist with alarms and runbooks.\n Follow on-call runbooks and learn incident response in a guided rotation.\n Contribute to performance experiments (e.g., request batching, concurrency, caching) with guidance.\n \n Who You Are: \n \n BS/MS in CS, EE, or related field, or equivalent practical experience.\n Foundations in data structures, algorithms, and networked services. Experience with Python or Go (C++ a plus) and Linux fundamentals; Git/CI basics. Exposure to containers and Kubernetes (coursework or projects welcome). Curiosity about GPU inference concepts (micro-batching, KV cache, streaming). \n \n Preferred: \n \n Internship or project that deployed a microservice or ML inference demo.\n Coursework/research with PyTorch or TensorFlow; simple CUDA projects a plus.\n Familiarity with Grafana/Prometheus/OpenTelemetry or similar tooling.\n \n Why CoreWeave? \n At CoreWeave, we work hard, have fun, and move fast!  We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values: \n \n Be Curious at Your Core\n Act Like an Owner\n Empower Employees\n Deliver Best-in-Class Client Experiences\n Achieve More Together\n \n We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. 
As we get set for take off, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!  \n  \n The base salary range for this role is $92,000 to $135,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility). \n What We Offer \n The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.\n In addition to a competitive salary, we offer a variety of benefits to support your needs, including:\n \n Medical, dental, and vision insurance - 100% paid for by CoreWeave\n Company-paid Life Insurance \n Voluntary supplemental life insurance \n Short and long-term disability insurance \n Flexible Spending Account\n Health Savings Account\n Tuition Reimbursement \n Ability to Participate in Employee Stock Purchase Program (ESPP)\n Mental Wellness Benefits through Spring Health \n Family-Forming support provided by Carrot\n Paid Parental Leave \n Flexible, full-service childcare support with Kinside\n 401(k) with a generous employer match\n Flexible PTO\n Catered lunch each day in our office and data center locations\n A casual work environment\n A work culture focused on innovative disruption\n \n Our Workplace \n While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. 
New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.\n California Consumer Privacy Act - California applicants only\n CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for em","salary_min":92000,"salary_max":135000,"location":"Sunnyvale, CA","workplace":"hybrid","job_type":"full-time","experience_level":"mid","tags":["mlops","microservices","pytorch","llm","tensorflow","gpu","inference"],"apply_url":"https://coreweave.com/careers/job?4609928006\u0026board=coreweave\u0026gh_jid=4609928006","is_featured":false,"is_sticky":false,"status":"active","published_at":"2025-10-24T18:18:12Z","expires_at":"2026-06-29T14:04:53.076863Z","created_at":"2026-04-13T09:40:47.850529Z","updated_at":"2026-05-30T14:04:53.187129Z","company_name":"CoreWeave","company_slug":"coreweave","company_logo_url":"https://www.google.com/s2/favicons?domain=coreweave.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/e6182bf4-3b0b-4c06-8a62-fac6a5e30529"},{"id":"b6fd1e09-2de3-47af-ae22-2960a75b6466","company_id":"a0000000-0000-0000-0000-000000000001","title":"Staff + Sr. Software Engineer, Inference","slug":"staff-senior-software-engineer-inference-ddffe87d","description":"About Anthropic \n Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.\n \n \n About the role\n \n Our Inference team is responsible for building and maintaining the critical systems that serve Claude to millions of users worldwide. 
We bring Claude to life by serving our models via the industry's largest compute-agnostic inference deployments.  We are responsible for the entire stack from intelligent request routing to fleet-wide orchestration across diverse AI accelerators.\nThe team has a dual mandate: maximizing compute efficiency to serve our explosive customer growth, while enabling breakthrough research by giving our scientists the high-performance inference infrastructure they need to develop next-generation models. We tackle complex, distributed systems challenges across multiple accelerator families and emerging AI hardware running in multiple cloud platforms.\n You may be a good fit if you:\n \n Have significant software engineering experience, particularly with distributed systems\n Are results-oriented, with a bias towards flexibility and impact\n Pick up slack, even if it goes outside your job description\n Enjoy pair programming (we love to pair!)\n Want to learn more about machine learning systems and infrastructure\n Thrive in environments where technical excellence directly drives both business results and research breakthroughs\n Care about the societal impacts of your work\n \n Strong candidates may also have experience with:\n \n High-performance, large-scale distributed systems\n Implementing and deploying machine learning systems at scale\n Load balancing, request routing, or traffic management systems\n LLM inference optimization, batching, and caching strategies\n Kubernetes and cloud infrastructure (AWS, GCP, Azure)\n Python or Rust\n \n Representative projects: \n \n Designing intelligent routing algorithms that optimize request distribution across thousands of accelerators\n Autoscaling our compute fleet to dynamically match supply with demand across production, research, and experimental workloads\n Building production-grade deployment pipelines for releasing new models to millions of users\n Integrating new AI accelerator platforms to maintain our hardware-agnostic 
competitive advantage\n Contributing to new inference features (e.g., structured sampling, prompt caching)\n Supporting inference for new model architectures\n Analyzing observability data to tune performance based on real-world production workloads\n Managing multi-region deployments and geographic routing for global customers\n \n Deadline to apply:  None. Applications will be reviewed on a rolling basis. \n The annual compensation range for this role is listed below. \n For sales roles, the range provided is the role’s On Target Earnings (\"OTE\") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\n Annual Salary:\n $300,000 — $485,000 USD \n Logistics \n Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience\n Required field of study:  A field relevant to the role as demonstrated through coursework, training, or professional experience\n Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position\n Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.\n Visa sponsorship:  We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.\n We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed.  
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team. Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious ","salary_min":300000,"salary_max":485000,"location":"San Francisco, CA","workplace":"hybrid","job_type":"full-time","experience_level":"lead","tags":["distributed-systems","cloud","alignment","llm","inference","infrastructure"],"apply_url":"https://job-boards.greenhouse.io/anthropic/jobs/4951696008","is_featured":false,"is_sticky":false,"status":"active","published_at":"2025-10-10T15:42:47Z","expires_at":"2026-06-29T14:00:28.270535Z","created_at":"2026-04-13T09:36:06.173236Z","updated_at":"2026-05-30T14:00:28.382614Z","company_name":"Anthropic","company_slug":"anthropic","company_logo_url":"https://www.google.com/s2/favicons?domain=anthropic.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/b6fd1e09-2de3-47af-ae22-2960a75b6466"}],"page":1,"per_page":20,"total":82,"total_pages":5}
