Machine Learning Eval Engineer

Reducto · San Francisco, CA
Full-time · Mid-level · Posted 2 months ago

About this role

About Reducto

Reducto is the agentic document platform for leading AI teams who demand enterprise performance at scale. We provide a comprehensive toolkit for working with documents the way a human would, combining custom in-house and leading frontier models to power efficient and accurate document workflows. We’ve grown rapidly, increasing revenue 8x year over year and partnering with hundreds of companies, from leading AI teams like Harvey, Vanta, and Scale, to enterprise customers across FAANG and top trading firms. Reducto has raised over $100M from world-class investors including a16z, Benchmark, and First Round Capital.

The Opportunity

As an ML Eval Engineer, you’ll play a key role in building the evaluation systems and benchmarks that make Reducto’s models better over time. You’ll collaborate closely with our ML, platform, and GTM teams to identify model weaknesses, design strong benchmarks, and create metrics and tooling that surface new failure modes as we scale. This is a high-impact role where you’ll help define how model quality is measured at Reducto and shape the systems we use to improve it.

WHAT YOU’LL DO

- Design, build, and maintain evaluation benchmarks that reveal where our models perform well and where they fail.
- Develop metrics, heuristics, and workflows to automatically identify new failure modes across large and messy real-world datasets.
- Partner closely with other ML engineers to turn evaluation insights into model improvements and better training priorities.
- Work hands-on with unstructured enterprise data, including PDFs, spreadsheets, and other difficult document formats, to uncover edge cases and hard examples.
- Build lightweight internal and user-facing tools, including simple interfaces in Python frameworks like Flask, to help teams inspect results, analyze model behavior, and communicate evaluation outcomes.
- Collaborate with customers and internal teams to understand real-world data needs and create bespoke benchmarks that highlight Reducto’s strengths.

YOU’LL THRIVE HERE IF YOU:

- Hold yourself to a high bar for quality and precision.
- Enjoy solving complex problems and building from first principles.
- Have strong Python skills and can independently build clean, reliable technical solutions. Bonus points for product and frontend experience!
- Are comfortable working with data infrastructure such as AWS S3 and OLAP or analytics systems like Tinybird.
- Love getting your hands dirty with unstructured data and chasing down difficult failure cases.
- Operate well in fast-changing, high-growth environments.
- Collaborate effectively across technical and non-technical teams.
- Take full ownership from strategy through execution.

BONUS POINTS IF YOU:

- Have experience at an early-stage or high-growth startup.
- Have some background in product thinking and can build simple, polished user-facing interfaces.
- Are comfortable working directly with customers to understand their workflows and data needs.
- Have experience in AI/ML, data infrastructure, enterprise software, or document understanding systems.
- Care deeply about combining technical excellence with business impact.

THIS IS AN IN PERSON ROLE AT OUR OFFICE IN SF. WE’RE AN EARLY STAGE COMPANY WHICH MEANS THAT THE ROLE REQUIRES WORKING HARD AND MOVING QUICKLY. PLEASE ONLY APPLY IF THAT EXCITES YOU.

ABOUT REDUCTO

Nearly 80% of enterprise data is in unstructured formats like PDFs

PDFs are the status quo for enterprise knowledge in nearly every industry. Insurance claims, financial statements, invoices, and health records are all stored in a structure that’s simply impractical for use in digital workflows. This isn’t an inconvenience; it’s a critical bottleneck that leads to dozens of wasted hours every week (https://www.reducto.ai/blog/the-real-cost-of-manual-document-processing).

Traditional approaches fail at reliably extracting information in complex PDFs

OCR and even more sophisticated ML approaches work for simple text documents but are unreliable for anything more complex. Text from different columns is jumbled together, figures are ignored, and tables are a nightmare to get right. Overcoming this usually requires a large engineering effort dedicated to building specialized pipelines for every document type you work with.

Reducto (https://www.reducto.ai/) breaks document layouts into subsections and then contextually parses each depending on the type of content. This is made possible by a combination of vision models, LLMs, and a suite of heuristics we built over time.

Put simply, we can help you:

- Accurately extract text and tables even with nonstandard layouts
- Automatically convert graphs to tabular data and summarize images in documents
- Extract important fields from complex forms with simple, natural language instructions
- Build powerful retrieval pipelines using Reducto’s document metadata
- Intelli
