Manager, Data & AI Platform Engineering

Stitch Fix · Remote (US) · $146k - $195k

full-time senior Posted 2 months ago

healthcare llm search pytorch fine-tuning payments mlops generative-ai

About this role

About Stitch Fix, Inc.

Stitch Fix (NASDAQ: SFIX) is the leading online personal styling service that helps people discover the styles they will love that fit perfectly so they always look - and feel - their best. Few things are more personal than getting dressed, but finding clothing that fits and looks great can be a challenge. Stitch Fix solves that problem. By pairing expert stylists with best-in-class AI and recommendation algorithms, the company leverages its assortment of exclusive and national brands to meet each client's individual tastes and needs, making it convenient for clients to express their personal style without having to spend hours in stores or sifting through endless choices online. Stitch Fix, which was founded in 2011, is headquartered in San Francisco.

About the Role

We’re hiring a Manager of Data & AI Platform Engineering to lead the organization that manages Stitch Fix’s engineers on our core data, machine learning, and generative AI platforms. You will contribute towards delivering on the vision and drive the technical execution for the systems that make AI-powered, data-driven experiences possible across the company - enabling richer personalization, better decision-making, intelligent automation, and enterprise-wide innovation. You will help evolve our technical foundation to support next-gen AI use cases - unified signals, dynamic and context-aware models, semantic understanding, retrieval-based intelligence, and evolved ML workflows.

You're excited about this opportunity because you will…

- Lead, in a player-coach capacity, execution for Stitch Fix’s next-gen Data, ML, and GenAI platforms - building a unified, secure, and scalable architecture for semantic search, retrieval-based intelligence, multi-model orchestration, and agent automation, while operationalizing GenAI through safe, performant, and production-ready systems that power real-world client and employee experiences.
- Contribute towards modernization of data and ML foundations to support unified signals, adaptive models, experimentation velocity, and scalable AI/ML workloads.
- Provide foundational APIs, SDKs, frameworks, and self-service tools that make it easy for data scientists, ML engineers, analysts, and application teams to build and deploy AI solutions quickly, safely, and at scale.
- Partner with Data Science, Engineering, and Product teams to translate Data/ML/GenAI platform capabilities into production-grade features and intelligent experiences that deliver measurable business value.
- Drive responsible AI and data adoption by creating reusable templates, documentation, and enablement programs, and by partnering closely with technology and business teams to identify and prioritize high-impact opportunities for personalization, automation, and intelligence.
- Contribute towards improving governance practices, including data contracts, lineage, metric definitions, access policies, and responsible AI guardrails, for trust, safety, and compliance.
- Ensure operational excellence through platform reliability, performance, observability, cost efficiency, and simplification of legacy systems.
- Lead and develop high-performing engineering teams, fostering a culture of clarity, excellence, and trust.
- Balance speed of innovation with platform stability, ensuring engineering efforts are tightly aligned to business priorities and long-term client value.

We’re excited about you because you have…

Experience:
- 5+ years in software, data, ML, or platform engineering; 1+ years leading engineering individual contributors is a plus.
- Demonstrated success contributing towards large-scale data platforms, ML platforms, or AI/GenAI platforms in cloud environments.
- Experience delivering platform modernization, unification, and multi-year architectural transformation.
Technical Expertise:
- Strong software engineering foundation, with experience designing and building large-scale distributed systems and resilient, high-quality APIs and services using modern programming languages and cloud-native architectures.
- Track record operating and evolving modern data infrastructure, including some of the following: distributed compute and storage technologies (Spark, Trino, Iceberg), real-time processing frameworks (Kafka/Flink), metadata / catalog systems, and Kubernetes-based orchestration.
- Expertise across the ML lifecycle - feature engineering, training pipelines, model deployment and serving, monitoring, validation, fine-tuning, and MLOps best practices.
- Proven capability in building self-service platform abstractions and tooling that enable teams to develop, experiment, and deploy data and ML products efficiently.
- Experience with modern GenAI architectures - semantic retrieval, knowledge-grounded indexing, LLM orchestration, agent workflows, and evaluation frameworks.
- Familiarity with modern ML frameworks like PyTorch and Ray is a plus.

Leadership Skills:
- Strategic thinker able