Inference Engineer
full-time
mid
Posted 1 year ago
About this role
ABOUT CARTESIA
Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.
We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team pairs deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.
We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.
ABOUT THE ROLE
We're hiring an Inference Engineer to advance our mission of building real-time multimodal intelligence.
YOUR IMPACT
- Design and build a low-latency, scalable, and reliable model inference and serving stack for our cutting-edge foundation models, including Transformers, SSMs, and hybrid models.
- Work closely with our research team and product engineers to serve our suite of products in a fast, cost-effective, and reliable manner.
- Design and build robust inference infrastructure and monitoring for our products.
- Have significant autonomy to shape our products and directly impact how cutting-edge AI is applied across various devices and applications.
WHAT YOU BRING
Given the scale and difficulty of the problems we work on, we value strong engineering skills at Cartesia.
- Strong engineering skills, including comfort navigating complex codebases and an eye for writing clean, maintainable code.
- Experience building large-scale distributed systems with high demands on performance, reliability, and observability.
- Technical leadership with the ability to execute and deliver zero-to-one results amidst ambiguity.
- Background in or experience working on inference pipelines with machine learning and generative models.
- Experience implementing state-of-the-art machine learning models and applying research to real-world problems.
- Preferred: experience with inference frameworks such as vLLM or SGLang, and with techniques such as continuous batching.
- Preferred: experience working in CUDA, Triton, or similar.
WHAT WE OFFER
🍽 Lunch, dinner and snacks at the office.
🏥 Fully covered medical, dental, and vision insurance for employees.
🏦 401(k).
✈️ Relocation and immigration support.
🦖 Your own personal Yoshi.
OUR CULTURE
🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together, and learning from each other every day.
🚢 We ship fast. All of our work is novel and cutting-edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.
🤝 We support each other. We have an open & inclusive culture that’s focused on giving everyone the resources they need to succeed.