Contract — AI Agent Evaluation
Contract · Mid-level
Posted 1 month ago
About this role
Evaluate and benchmark Aaru's synthetic research agents. Design evaluation protocols, run A/B tests, and measure agent accuracy against human researchers.
3-month contract with the possibility of extension. Remote-friendly.
Requirements
Experience with LLM evaluation or user research. Statistical analysis skills. Python. Available to start immediately.