AI Infrastructure Engineer

Stack AI · SF Office - 171 2nd, 4th floor
Full-time · Mid-level · Posted 7 months ago

About this role

ABOUT THE ROLE

We’re hiring an AI Infrastructure Engineer to shape and scale the backend systems that power our AI platform. We are a Series A company, so your work will be foundational, enabling safe, efficient, and reliable AI workflows from end to end.

WHAT YOU’LL DO

- Design and implement scalable backend architectures for AI workloads (inference, orchestration, monitoring).
- Own distributed job orchestration with Temporal and related systems.
- Improve data pipeline performance by designing smarter caching strategies (e.g., file deduplication, hot/cold storage, Redis caching layers) to reduce redundant compute and API calls.
- Build observability, monitoring, retries, and fault tolerance into all workflows.
- Manage infrastructure reliability, incident response, and performance.
- Develop tooling and platform infrastructure to support rapid growth.
- Partner with ML engineers to bring models to production at scale.

WHAT WE’RE LOOKING FOR

- 4+ years of backend engineering experience (Python is a must).
- Strong background in distributed systems, job orchestration, and task queues.
- Deep knowledge of concurrency, parallelism, and multithreading (including async/await, event loops, thread pools, synchronization primitives, deadlocks, and race conditions) is a must. You should know how to design systems that maximize throughput without sacrificing correctness or safety.
- Hands-on experience with Temporal, Redis, Airflow, Celery, RabbitMQ (or similar).
- Experience with LLM serving and routing fundamentals (rate limiting, streaming, load balancing, budgets).
- Comfortable with containers and orchestration: Docker, Kubernetes.
- Familiarity with cloud platforms (AWS/GCP) and IaC (Terraform).
- Experience with multiple storage systems: S3, Postgres, MongoDB, Redis, and Elasticsearch.
- Track record of scaling systems in startups or fast-paced environments.
- Understanding of deploying, monitoring, and optimizing AI/ML systems in production with strong CI/CD practices.
WHY YOU’LL LOVE WORKING HERE

- Play a foundational role at a fast-growing Series A startup that is shaping the future of AI in enterprise workflows.
- Collaborate across Product, ML, and Platform teams, being the bridge between AI logic and scalable execution.
- Build infrastructure that enables real value for large enterprises: low-code, secure, and scalable AI workflows.
- Join a company that’s scaling thoughtfully and values developer experience.
