Research Scientist, Frontier Capabilities
full-time
senior
Posted 7 months ago
About this role
Your impact at LILA
We’re building a talent-dense, high-agency research team to develop the next generation of learning systems and reasoning algorithms for agentic LLMs. Our work sits at the intersection of large language models, post-training, and scientific reasoning, with the goal of enabling systems that learn from experience, reason effectively, and improve through interaction.
Scientific domains present a distinct set of challenges that make this problem uniquely hard. Feedback is sparse and delayed: experiments take days or weeks, not milliseconds. Ground truth is expensive or contested. Distribution shift is structural, as instruments, techniques, and knowledge bases evolve continuously. The hypothesis space is vast and the reward signal is thin. Existing benchmarks do not capture these nuances. The goal is to build systems that can operate effectively in this scientific regime.
This role spans a few complementary directions. Candidates are expected to bring deep expertise in one (or more) of the following areas. If your expertise spans multiple tracks, please select the one you align with most. Our interview process will be tailored to verifying the chosen expertise area.
Expertise Area 1: Agentic system building
Focus: Build systems that autonomously propose, execute, and verify scientific hypotheses over long time horizons.
Create and analyze long-running auto-research systems that propose and verify hypotheses
Design planning frameworks for agentic systems operating over long, sparse feedback loops
Design memory architectures that allow agents to build and retrieve structured knowledge over time
Explore algorithms in recursive self-improvement, multi-agent coordination, and continual learning
Expertise Area 2: Distillation
Focus: Translate strong inference-time behaviors and reasoning traces into efficient, trainable models.
Develop distillation strategies from large or ensemble models into deployable systems
Research methods for self-improvement, including iterative self-distillation and critique loops
Investigate how to preserve generalization and reduce catastrophic forgetting through the distillation process
Expertise Area 3: Scalable experience generation
Focus: Develop inference-time algorithms and synthetic data pipelines that generate high-quality training signal for scientific reasoning.
Design and benchmark inference-time search, sampling, and verification strategies
Propose new techniques in synthetic environment creation and curriculum learning
Develop synthetic data generation strategies that capture high-quality scientific reasoning for agentic model training
Measure the end-to-end impact of inference-time improvements on real scientific tasks
What you’ll need to succeed:
An advanced degree in computer science, machine learning, or a related field, or comparable experience
Strong foundation in LLMs and empirical research
Experience designing and executing rigorous ML experiments, including benchmarking and ablations
Experience working with large-scale training or evaluation pipelines
Ability to define and pursue research directions in open-ended, rapidly evolving spaces
Strong collaboration and communication skills across research and engineering teams
Bonus points for:
Experience with synthetic data generation, distillation, or self-improvement loops
Familiarity with reinforcement learning (e.g., RLHF, on-policy methods)
Experience with planning, search, or decision-making systems at scale
Experience building agentic systems with tool use or multi-agent workflows
Background in program synthesis, coding benchmarks, or long-horizon tasks
Experience building evaluation frameworks or large-scale benchmarks
Scientific rigor & persistence:
You take a principled approach to experimentation, with careful baselines, ablations, and evaluation design
You are motivated by understanding why systems work, not just improving metrics
You prioritize clarity, reproducibility, and intellectual honesty in research
You are comfortable working through long, nonlinear iteration cycles
You operate effectively in ambiguous, fast-evolving research environments
Compensation
We offer competitive base compensation with bonus potential and generous early-stage equity. Your final offer will reflect your background, expertise, and expected impact.
U.S. Benefits. Full-time U.S. employees receive a comprehensive benefits program including medical, dental, and vision coverage; employer-paid life and disability insurance; flexible time off with generous company-wide holidays; paid parental leave; an educational assistance program; commuter benefits, including bike share memberships for office-based employees; and a company-subsidized lunch program.
International Benefits. Full-time employees outside the U.S. receive a comprehensive benefits program tailored to their region. USD salary ranges apply only to U.S.-based