Staff ML Research Scientist, Pegasus

Twelve Labs · Seoul, South Korea
Full-time · Lead · Posted 4 days ago

About this role

WHO WE ARE

At TwelveLabs, we are pioneering cutting-edge multimodal foundation models that comprehend videos the way humans do. Our models have redefined the standards in video-language modeling, unlocking more intuitive and far-reaching capabilities and fundamentally transforming the way we interact with and analyze various forms of media. With $110+ million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA's NVentures, NEA, Radical Ventures, and Index Ventures, and by prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, Alexandr Wang, and more. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation. Our partnerships with NVIDIA and AWS give us access to the most advanced chips, including B300s, enabling us to push the boundaries of what's possible in video AI.

We are a global company that values the uniqueness of each person's journey. The differences in our cultural, educational, and life experiences allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

ABOUT THE TEAM

The Pegasus team sits at the core of TwelveLabs' video understanding capabilities and is responsible for driving Pegasus, our Video Analysis product. Our focus is on developing multimodal video analysis systems designed for strong instruction following and for producing highly complex, hierarchically structured outputs. We ship products with real-world value rather than doing research in isolation, working in a goal-oriented, cross-functional team that encompasses both ML researchers and engineers.
Our work covers a broad range of challenges: large-scale distributed training of multimodal LLMs spanning pre-training to RL, accurate temporal segmentation and structured metadata extraction for real-world use cases, extending temporal context length to multiple hours, and data curation processes that enable well-aligned evaluation and performance improvements through training data enhancements. Our team has access to the most advanced chips in the world, including NVIDIA B300s, to push the boundaries of video analysis systems, accelerating our research-to-production cycle as fast as possible.

IN THIS ROLE, YOU WILL

- Identify and frame the highest-impact research problems for Pegasus across multi-hour temporal understanding, hierarchical output generation, and novel training paradigms, and shape the team's research direction accordingly.
- Raise the team's research bar by improving how the team designs experiments, chooses research directions, and decides what to pursue or abandon.
- Design evaluation strategies and data curation methods for problems where existing benchmarks are insufficient.
- Drive research into product, ensuring that advances in temporal understanding, structured output quality, and instruction following translate into measurable gains.
- Communicate research direction and findings to align the broader team and inform strategic technical decisions.
- Explore and adopt AI-assisted development tools such as Claude, Gemini, and GPT to improve productivity across coding, experimentation, debugging, and documentation.

Even if you don't check every box, we encourage you to apply. If you're a zero-to-one achiever, a ferocious learner, and a kind team player who motivates others, you'll find a home at TwelveLabs.
YOU MAY BE A GOOD FIT IF YOU HAVE

- Deep research experience with a demonstrated track record of impact in one or more areas relevant to video understanding, such as multimodal LLMs, large-scale distributed training, temporal modeling, data-centric model development, computer vision, or vision-language systems.
- A track record of identifying and framing high-value research problems: not just executing on well-defined ones, but recognizing where the most impactful work lies.
- Strong proficiency in Python and PyTorch.
- Exceptional experimental judgment and the ability to design evaluation strategies for frontier problems where existing approaches are insufficient.
- A track record of raising the research bar for a team, improving how others design experiments, evaluate results, and assess research directions.
- Strong communication skills and the ability to align technical direction through clear articulation of research strategy and findings.

PREFERRED QUALIFICATIONS

- Experience working on multimodal systems involving video, vision, language, or structured output generation.
- Experience improving model quality through data curation, evaluation design, or training data enhancements.
- Ex
