Solution Specialist, AI Runtime Services

CoreWeave · San Francisco, CA · $207k - $275k

Full-time · Lead · Posted 1 day ago

llm distributed-systems gpu generative-ai mlops

About this role

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

What You'll Do:

As CoreWeave turns raw GPU capacity into production-grade AI services, it is launching new ways for customers to run, scale, and serve AI workloads, and those new services need a market-maker. As a Solution Specialist for AI Runtime Services, you open new market opportunities across the execution layer (high-throughput, low-latency model serving through our Inference platform, and secure, isolated execution through Sandboxes) and drive the initial adoption of these offerings with the earliest customers and industries to need them. You channel what you learn into the product roadmap and make the broader sales and solution architecture organization fluent in the value runtime services create for new customers.

About the role:

As a Solution Specialist for AI Runtime Services, you work at the leading edge of how CoreWeave brings new runtime offerings to market. Rather than running an existing motion, you create one: you take newly launched services like Inference and Sandboxes into new accounts and new industries, prove their value with the first wave of customers operationalizing AI at scale, and establish the playbooks sales and solution architects use to repeat those wins. You are the field's authoritative voice back to engineering, translating what early adopters need around serving frameworks, batching, and execution isolation into the priorities that shape the AI Runtime Services roadmap.
In this role, you will:

Own the commercial and technical strategy for net new customer wins in AI runtime infrastructure, where execution performance, deployment flexibility, and operational reliability are the primary buying triggers.

Drive new business opportunities where inference latency, throughput bottlenecks, workload isolation requirements, or operational complexity are barriers to scaling AI on CoreWeave.

Develop deep expertise across the AI runtime landscape (model serving architectures, execution scheduling, containerized AI workloads, and secure multi-tenant compute), using CoreWeave's Inference and Sandboxes products as flagship examples of what best-in-class runtime looks like.

Translate customer requirements around serving frameworks (e.g., vLLM, TensorRT-LLM, TGI), batching strategies, and execution isolation into specific product feedback that shapes the AI Runtime Services roadmap.

Develop deal structures, technical playbooks, and benchmark narratives that help sales and SA teams accelerate runtime-sensitive opportunities across the full spectrum of AI deployment patterns.

Engage directly with enterprise and research buyers as the authoritative voice on runtime performance tradeoffs, cost-per-token economics, and the architectural decisions that separate prototype deployments from production-scale AI systems.

Design the commercial framework for large-scale runtime deployment deals, including throughput modeling, GPU utilization commitments, and SLA structures that support enterprise closings.

Partner with product and infrastructure teams to maintain a competitive edge on serving efficiency, execution isolation, and operational reliability across active and prospective customer deployments.

Who You Are:

10+ years of experience in distributed systems, ML infrastructure, or production AI engineering, with a track record of applying that expertise to drive customer outcomes and revenue.
5+ years working with AI runtime systems (model serving, inference optimization, containerized workload execution, or real-time ML pipelines) in a customer-facing or deal-shaping capacity.

Deep working knowledge of how AI workloads execute at runtime: serving frameworks, batching strategies, GPU memory management, and the performance levers that determine throughput and latency at scale (with specific familiarity with products like vLLM, TensorRT-LLM, or Triton).

Experience with sandboxed and isolated execution environments (microVM architectures, container runtimes, secure multi-tenant scheduling) and how execution isolation requirements shape platform selection decisions.

Strong understanding of GPU memory hierarchies, model parallelism strategies, and how runtime architecture decisions translate into cost, latency, and scalability outcomes for enterprise customers.

Familiarity with Kubernetes-native runtime orchestration (autoscaling, scheduling policies, GPU operators) and how it impacts workload portability, operational complexity, and platform st