Application Software Engineer, Inference

SpaceX · Palo Alto, CA · $155k - $185k
Full-time · Junior · Posted 1 week ago
About this role

SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars.

APPLICATION SOFTWARE ENGINEER, INFERENCE

The application software team is the central nervous system of SpaceX. We create mission-critical applications that are used throughout SpaceX to accelerate launch vehicle production and flight, as well as systems that allow Starlink to grow into a fast, reliable worldwide Internet service. We are looking for engineers who treat fellow teammates with fairness, respect, and support.

Our team maintains a high-performance AI inference platform that serves the best models internally at SpaceX to accelerate our most ambitious engineering goals. As part of this effort in Palo Alto, you will design and optimize large-scale model serving systems end-to-end, owning everything from distributed infrastructure to deep low-level optimizations. You will work on systems that deliver reliable, high-throughput inference to power SpaceX’s mission-critical applications while maintaining the highest standards of performance and availability.

Aerospace experience is not required to be successful here; rather, we look for smart, motivated, respectful, collaborative engineers who love solving problems and want to make an impact on a super inspiring mission. You will have full ownership of challenging problems, working with a team of enthusiastic engineers with diverse perspectives to design and produce solutions that enable SpaceX to achieve its loftiest engineering goals at a rapid pace. The success of the missions at SpaceX depends on the software that you and your team produce. This role will report through SpaceX Application Software while also working closely with xAI engineering teams.

RESPONSIBILITIES:
- Develop highly reliable, high-throughput inference systems that serve the best AI models internally across SpaceX
- Architect and implement scalable distributed infrastructure for model serving, including load balancing, auto-scaling, batch scheduling, global KV cache, and continuous batching
- Optimize latency and throughput of model inference under real production workloads, including low-level GPU kernel work, quantization, speculative decoding, and other acceleration techniques
- Build reliable, high-concurrency serving systems with 100% uptime, low tail latency, and excellent observability
- Own end-to-end components such as request routing, SDK development, rate limiting, and efficient scaling for internal SpaceX AI inference platforms
- Benchmark, fine-tune, and accelerate inference engines (e.g., SGLang, vLLM, TensorRT-LLM)
- Develop custom tools for tracing, replaying, and resolving issues across the full stack, from orchestration down to GPU kernels
- Create robust CI/CD infrastructure for seamless endpoint deployment, image publishing, and inference engine updates
- Collaborate across SpaceXAI teams to integrate inference capabilities into broader systems and workflows

BASIC QUALIFICATIONS:
- Bachelor's degree in computer science, engineering, math, or scientific discipline; OR 2+ years of professional experience building software in lieu of a degree
- Experience in designing, implementing, and maintaining reliable and horizontally scalable distributed systems
- 1+ years of experience in full stack development or backend development with production systems
- 1+ years of experience with Rust or C++

PREFERRED SKILLS AND EXPERIENCE:
- Experience with LLM inference engines and serving frameworks (e.g., SGLang, vLLM, Triton, TensorRT-LLM)
- Deep low-level systems programming and optimizations: GPU kernels, code generation, batching, caching, parallelism, quantization, and speculative decoding
- Experience with large-scale, high-concurrency production serving systems
- Knowledge of service observability and reliability best practices
- Experience operating commonly used databases such as PostgreSQL, ClickHouse, or MongoDB
- Experience designing or building with agent SDKs and agent orchestration frameworks
- Experience with Docker, Kubernetes, and containerized applications
- Expert knowledge of gRPC (unary, response streaming, bi-directional streaming, REST mapping)
- Programming experience in Python, Go, or similar languages
- Experience with version control, continuous integration, continuous delivery, build systems, and monitoring
- Expertise in profiling and improving application performance

ADDITIONAL REQUIREMENTS:
- You may be asked to work extended hours/weekends dependent on launch cadence and platform demands
- This role requires you to be onsite in Palo Alto. Remote and/or hybrid work will not be considered

COMPENSATION AND BENEFITS:
Pay Range: Software Engineer/Level I: $135,000.00 - $16
