Staff Software Engineer, Machine Learning Platform

Stripe · San Francisco, CA

Full-time · Lead · Posted 3 weeks ago

agents rag payments data-pipeline fine-tuning mlops llm distributed-systems

About this role

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies, from the world's largest enterprises to the most ambitious startups, use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone's reach while doing the most important work of your career.

About the team

Stripe processes over $1.9T in payments volume per year, roughly 1.6% of the world's GDP, for millions of customers from startups to enterprises. This volume of data makes Stripe one of the best places to do machine learning. ML is an integral part of almost every product line at Stripe (e.g., Payments, Radar, Capital, Billing), and there are many opportunities to innovate in ML Platform.

The ML Platform team builds the platforms and services that enable ML engineers and data scientists across Stripe to take data and build features and models from prototype to production reliably, at low latency, and at scale. Our scope spans ML training infrastructure, model serving and deployment, feature computation and online serving, observability and monitoring, and agentic AI capabilities. We work closely with product teams, data scientists, and platform infrastructure teams to build powerful, flexible, and user-friendly systems that substantially increase ML velocity across the company.

What you'll do

You'll serve as a technical lead across the ML Platform space and a key contributor to the evolution of the platforms that power Stripe's ML-driven products. As a Staff Engineer, you'll make decisions with a large impact on Stripe. You'll influence our investments and strategy while making our systems more reliable, secure, and a delight to use.
You'll work cross-functionally with other technical staff, data science, product, and senior leadership to increase the impact of ML at Stripe. You'll help define the long-term strategy and lead the technical direction for the next generation of ML infrastructure that powers Stripe's ML-driven products.

Responsibilities

Take ownership of end-to-end architecture and system design for large, complex projects across ML Platform.
Define technical direction for highly ambiguous projects, transforming complex user needs into long-lasting platform strategy.
Design system architectures for the most challenging ML Platform problems in one or more areas, including AI and ML workflow orchestration, scalable CPU and GPU compute infrastructure, model training, LLM fine-tuning, low-latency model inference, large-scale feature stores, real-time monitoring, and LLM and agent orchestration.
Turn high-leverage ideas into tangible, robust solutions that shape platform and product roadmap, combining technical excellence with creative problem-solving.
Scope and lead large projects with significant business impact, driving them from requirements through design, implementation, and production operation.
Work with ML engineers, data scientists, and product teams directly to translate their needs into functional requirements and scalable technical solutions.
Arbitrate critical decisions that balance competing priorities while meeting latency, reliability, cost, and security constraints.
Serve as a key engineering representative, engaging senior leaders across Stripe and advising the leadership team on key technical considerations related to the end-to-end ML lifecycle.
Drive cross-team technical initiatives that improve ML development velocity and MLOps maturity across the company.
Mentor and grow other engineers.
Serve as a role model for designing, implementing, and operating great software systems.
Who you are

We're looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements

10+ years of professional software development experience, or equivalent domain expertise, with a solid background in service-oriented architecture and large-scale distributed systems.
Track record of serving as a technical lead, with the ability to provide technical direction, lead multi-team initiatives, and mentor team members.
Experience building and operating a production ML platform in one or more areas such as model training, model serving, orchestration, or ML data systems, with requirements for performance, reliability, scalability, and cost efficiency.
Strong product instincts and a deep understanding of the business context in which you operate.
Strong communication skills with the ability to explain complex technical concepts to both technical and non-technical stakeholders.
Demonstrated ability to work cross-functionally, collaborating effect