AI/ML Scientist – Protein Foundation Models
full-time
junior
Posted 2 days ago
About this role
Manifold Bio builds AI models for protein therapeutic design, trained on proprietary experimental data generated at unprecedented scale. Our in vivo-centric discovery platform produces millions of experimentally validated protein designs per campaign, creating the datasets that make our models possible and our approach uniquely powerful. We combine high-throughput protein engineering with computational design to create antibody-like drugs and other biologics. Our world-class team of protein engineers, biologists, and computational scientists is working together to aim the platform at therapeutic opportunities where precise targeting is the key to overcoming clinical challenges.
Position
Manifold's AI team is actively training protein foundation models on our proprietary experimental datasets. Our generative antibody design model, mBER, has already demonstrated controllable de novo binder design across multiple million-scale screening campaigns, and the team is now scaling foundation model capabilities to push well beyond current performance. We are looking for an AI/ML Scientist to join this effort. You will work alongside our existing model training team to accelerate the development of foundation models fine-tuned on Manifold's data, bringing additional depth in pre-training methodology, architecture development, and large-scale training. Your work will directly improve mBER's design capabilities and unlock new modeling paradigms for the broader team. You'll own foundation model projects end-to-end, from architecture selection and training infrastructure to evaluation against real experimental outcomes, while contributing to the team's shared research agenda.
This is an on-site role and can be based in either Boston, Massachusetts or San Francisco, California. Please apply only if you reside in one of these cities or are open to relocating.
Responsibilities
Advance the team's ongoing foundation model training efforts: pretraining, fine-tuning, and evaluating folding, docking, language, and generative design models on Manifold's proprietary experimental data
Bring depth in training methodology, architecture selection, and optimization to complement the existing team's expertise
Develop and scale training pipelines for distributed, multi-GPU and multi-node training runs
Integrate foundation model outputs into mBER to improve binder design success rates and enable new design capabilities
Design and execute ML experiments with clear hypotheses, rigorous evaluation frameworks, and systematic analysis
Establish best practices for mixed-precision training, gradient checkpointing, and computational efficiency at scale
Produce clear documentation and analysis supporting architecture and training decisions
Required Qualifications
Demonstrated experience pretraining and/or fine-tuning protein foundation models (folding, docking, language models, or generative design) with published or otherwise demonstrable results
Strong familiarity with the AlphaFold architecture and training methodology
2+ years of hands-on experience with PyTorch and/or JAX for deep learning
Experience with large-scale model training: distributed training, multi-GPU/multi-node setups, mixed precision, gradient checkpointing
Solid understanding of deep learning architectures (transformers, attention mechanisms, diffusion/flow matching) and optimization techniques
Experience working with protein structure data (PDB, mmCIF) and/or protein sequence datasets
Strong statistical analysis and experimental design skills
Proficiency in the Python scientific computing stack (NumPy, Pandas, scikit-learn)
Self-directed researcher who can balance guidance with independence
Excellent written and verbal communication skills for cross-functional collaboration
Preferred Qualifications
Experience with protein generative design methods (e.g., RFdiffusion, ProteinMPNN, flow matching approaches)
Experience with protein language models (e.g., ESM family)
Published research in computational biology, protein design, or structural biology
Experience training on proprietary or domain-specific biological datasets
Familiarity with Ray for distributed computing
Experience with Kubernetes (EKS) and cloud computing platforms (AWS)
Knowledge of protein engineering, directed evolution, or structural biology wet lab techniques
Experience working with agentic AI coding tools for fast, parallelized execution of modeling experiments
Previous biotech/pharma industry experience
This Role Might Be Perfect For You If:
You have deep experience training protein foundation models and want to apply that expertise to some of the richest proprietary experimental datasets in the field
You're excited about pushing beyond public model performance by leveraging unique, large-scale in vivo screening data
You thrive in high-ownership roles where you can drive research direction while collaborating with a tight-knit, world-class team
You want your models