Data Scientist — LLM Evaluation

Axion AI · Remote (US) · $140k - $220k

Full-time · Mid-level · Posted 3 months ago

python · llm · evaluation · statistics · nlp · benchmarking

About this role

Design and implement evaluation frameworks for large language models. Build benchmarks, run experiments, and measure model quality across multiple dimensions. Your work determines which models ship and which don't.

Requirements

Strong statistics background, including experience with statistical testing. Experience with LLM evaluation or NLP benchmarking. Python required.

Job Details

Company: Axion AI
Location: Remote (US)
Workplace: Remote
Type: full-time
Level: mid
Salary: $140k - $220k
