Associate Director, MLOps Engineering

PathAI · New York, NY · $181k - $278k

Full-time · Lead · Posted 7 months ago

distributed-systems pytorch healthcare code-generation generative-ai mlops cloud machine-learning

About this role

Who We Are

PathAI's mission is to improve patient outcomes with AI-powered pathology. Our platform promises substantial improvements to the accuracy of diagnosis and the efficacy of treatment of diseases like cancer, leveraging modern approaches in machine learning and artificial intelligence. We have a track record of success in deploying AI algorithms for histopathology in translational research, pathology labs and clinical trials. Rigorous science and careful analysis are critical to the success of everything we do. Our team, composed of diverse employees with a wide range of backgrounds and experiences, is passionate about solving challenging problems and making a huge impact on patient outcomes.

Where You Fit

As the Associate Director, MLOps Lead, you will lead the team responsible for the backbone of our AI/ML stack: the infrastructure that bridges ML research and massive-scale production. Your primary directive is to evolve our stack to meet the next scale of needs in large-scale ML training and inference workloads. You're someone who enjoys designing and building for reliability, relishes collaboration and technical challenges, and takes pride in making things better without taking yourself too seriously. Our technical space is broad: high-scale AI training and inference workloads, cloud infrastructure, Kubernetes, observability, distributed systems, and a bit of everything in between.

What You'll Do

This role is critical for driving the scalability and efficiency of our Machine Learning Operations platform through high-impact, high-growth strategic initiatives.

Vision and Roadmap: Develop and execute the long-term vision and roadmap for the MLOps team to support ML development and deployment needs across the business units. Successfully manage the tension between short-term tactical deliveries and long-term architectural transformation for future growth.
Team Management: Lead and mentor a team of 6-7+ high-performing engineers. Strategically allocate resources to manage support for existing services while executing key strategic initiatives.
Cross-Functional Collaboration: Partner with leaders across machine learning, data science, product engineering, and infrastructure to proactively identify pain points, address bottlenecks, and facilitate the deployment of new solutions.
Foundation Model Readiness: Architect the compute and storage pipelines required for ML Engineers to manage millions of slides and complex derived artifacts without data fragmentation or synchronization latency.
Inference Modernization: Modernize the AI Product inference stack to support 5-10x growth of AI runs across global deployments.
System Observability: Collaborate with Site Reliability Engineering (SRE) to establish comprehensive metrics covering compute under-utilization, network bottlenecks, and granular attribution of cost and turnaround time.
Technology Refresh: Conduct "Build vs. Buy" assessments and lead "Stack Refresh" audits that benchmark our proprietary tools against best-in-class commercial and open-source alternatives to meet our future needs.

What You Bring

To be successful in this role with us, you'll at least need:

Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience).
2-3+ years of experience managing engineering team(s), with a focus on building production-grade frameworks for MLOps or ML Infrastructure.
Deep technical expertise with ML workloads on Kubernetes, cloud computing platforms (AWS/GCP/Azure), workflow orchestration (Airflow, Kubeflow, or proprietary equivalents), DevOps principles, and infrastructure-as-code (Helm, Terraform).
Proven experience managing petabyte-scale datasets and high-throughput production inference pipelines.
Strong software engineering skills in complex, multi-language systems and experience with scalable service architecture.
Use of AI assistants (e.g. Copilot, Cursor, Claude) across the platform development lifecycle.

It Would Be Great If You Also Have

Exposure to ML frameworks like PyTorch or Scikit-learn.
Experience with large-scale data processing frameworks (e.g. Spark, Hive, Databricks, Amazon EMR).
Expertise in MLOps principles, including model lifecycle management, feature stores, model monitoring, and CI/CD for ML.
Familiarity with security and compliance best practices in ML systems.

We Want To Hear From You

At PathAI, we are looking for individuals who are team players, are willing to do the work no matter how big or small it may be, and who are passionate about everything they do. If this sounds like you, even if you may not match the job description to a tee, we encourage you to apply. You could be exactly what we're looking for.

PathAI is an equal opportunity employer, dedicated to creating a workplace that is free of harassment and discrimination. We base our employment decisions on business needs, job requirements, and qualifications, and nothing else. We do