Data Scientist — LLM Evaluation

Axion AI · Remote (US) · $140k - $220k
Full-time · Mid-level · Posted 1 day ago

About this role

Design and implement evaluation frameworks for large language models. Build benchmarks, run experiments, and measure model quality across multiple dimensions. Your work determines which models ship and which don't.

Requirements

Strong statistics background, including experience with statistical testing. Experience with LLM evaluation or NLP benchmarking. Python required.
