Machine Learning Engineer — AI Architecture Research

Featherless AI · Remote

Full-time · Mid-level · Posted 4 months ago

Tags: deep-learning, fine-tuning, pytorch, machine-learning, research

ABOUT THE ROLE

We’re looking for a Machine Learning Engineer focused on AI architecture research to help design, prototype, and validate next-generation model architectures. You’ll work at the intersection of research and production, turning new ideas into scalable, real-world systems.

This role is ideal for someone who enjoys questioning architectural assumptions, experimenting with novel model designs, and pushing beyond standard Transformer-style approaches.

WHAT YOU’LL WORK ON

- Research and develop new neural network architectures (e.g. alternatives or extensions to Transformers, recurrent / hybrid models, long-context systems)
- Design and run architecture-level experiments (scaling laws, memory mechanisms, compute trade-offs)
- Prototype models end-to-end, from research code to training-ready implementations
- Collaborate with inference and systems engineers to ensure architectures are deployable and efficient
- Analyze model behavior, failure modes, and inductive biases
- Read, reproduce, and extend cutting-edge research papers
- Contribute to internal research notes, benchmarks, and open-source efforts (where applicable)

WHAT WE’RE LOOKING FOR

- Strong background in machine learning fundamentals and deep learning
- Hands-on experience implementing model architectures from scratch
- Solid understanding of:
  - Attention mechanisms, RNNs, state-space models, or hybrid architectures
  - Training dynamics, scaling behavior, and optimization
  - Memory, latency, and compute constraints at the model level
- Comfortable working in PyTorch or JAX
- Ability to move fluidly between theory, experimentation, and engineering
- Clear communicator who can explain architectural trade-offs

NICE TO HAVE

- Experience with non-Transformer architectures (RNN variants, SSMs, long-context models)
- Background in research-driven startups or open-source ML projects
- Experience with large-scale training or custom training loops
- Publications, preprints, or notable research contributions
- Familiarity with inference optimization and deployment constraints

WHY JOIN

- Work on core model architecture, not just fine-tuning
- Direct influence on the technical direction of a Series A company
- Small, high-caliber team with fast feedback loops
- Opportunity to ship research into production
- Competitive compensation + meaningful equity