Staff Software Engineer, Inference
full-time
lead
Posted 5 days ago
About this role
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.
What You’ll Do:
Inference Platform Team: The Inference team builds and operates CoreWeave’s Kubernetes-native inference platform, powering low-latency, high-throughput AI workloads at massive scale. The team is responsible for request routing, scheduling, GPU resource management, and system-wide optimizations that drive performance, efficiency, and reliability across real-time inference systems.
About the role: As a Staff Software Engineer (IC5) on the Inference team, you will act as a technical leader driving architecture, performance, and reliability across multiple services and teams. Your day-to-day will involve leading cross-team design initiatives, optimizing inference performance (latency, throughput, and GPU utilization), and improving system reliability at scale. You will work deeply in distributed systems and Kubernetes-based infrastructure, focusing on areas like scheduling, batching, and memory optimization. This role requires hands-on technical leadership and the ability to influence engineering direction across the organization.
Who You Are:
8–12+ years of experience building and operating large-scale distributed systems or cloud platforms
Proven experience leading cross-team technical initiatives impacting multiple services or organizations
Strong programming skills in Go, Python, or C++
Deep expertise in Kubernetes at production scale, including orchestration, scheduling, and service design
Strong understanding of distributed systems, networking, and performance optimization
Experience designing and operating low-latency, high-throughput systems with strict P95/P99 latency requirements
Hands-on experience with inference systems, including batching or micro-batching strategies, caching, and memory optimization
Experience improving system performance using metrics-driven approaches (e.g., latency, throughput, utilization)
Familiarity with mixed precision (BF16, FP8) and streaming inference workloads
Preferred:
Experience with inference frameworks such as vLLM, Triton, TensorRT-LLM, Ray Serve, or TorchServe
Experience with GPU systems and performance optimization (CUDA, NCCL, RDMA, NUMA, GPU interconnects)
Experience leading multi-team or org-level technical initiatives
Exposure to large-scale AI/ML infrastructure or hyperscale cloud environments
Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match. Here are a few qualities we’ve found compatible with our team. If some of this describes you, we’d love to talk.
You love to design and optimize high-performance distributed systems at scale
You’re curious about AI inference, GPU systems, and emerging performance techniques
You’re an expert in building reliable, low-latency infrastructure and driving system-wide improvements
Why CoreWeave? At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:
Be Curious at Your Core
Act Like an Owner
Empower Employees
Deliver Best-in-Class Client Experiences
Achieve More Together
We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization's growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!
The base salary range for this role is $188,000 to $275,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).
What We Offer
The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include