Data Scientist

Arena · San Francisco, CA

Full-time · Mid-level · Posted 21 hours ago

data-pipeline mlops cloud distributed-systems reinforcement-learning agents data-science

About this role

ABOUT ARENA INTELLIGENCE

Arena is the platform for evaluating how AI models perform in the real world. Founded by researchers from UC Berkeley's SkyLab, we're on a mission to measure and advance the frontier of AI for real-world use, and to build the foundation for everyone to understand, shape, and benefit from it.

Tens of millions of people use Arena each month to evaluate how frontier systems handle the work they actually do. The preferences they share power the most transparent, rigorous, and human-centered evaluations in AI. Leading AI labs, enterprises, and independent researchers rely on our work and open datasets to understand how models behave in real workflows: agentic coding, creative generation, professional productivity, and beyond. We go beyond leaderboards and decompose what human experience reveals about AI, so models advance toward the work people actually do.

We're a team of researchers, academics, builders, and creatives from UC Berkeley, Google, Stanford, and DeepMind. We seek truth, move fast, and value craftsmanship, curiosity, and impact over hierarchy. We're building a company where thoughtful, curious people from all backgrounds can do their best work together, in an office culture that radiates excellence, energy, and focus.

ABOUT THE ROLE

We are seeking a Data Scientist with expertise in experimentation, causal inference, and retention analytics to drive data-informed decision-making and optimize user engagement. In this role, you will design and analyze experiments (A/B tests, quasi-experiments), develop measurement frameworks for key metrics (DAU, WAU, MAU, retention), and provide actionable insights to improve product growth and user retention. Proficiency in PySpark is highly desirable to handle large-scale datasets efficiently.

- Experimentation & Causal Inference
  - Design, implement, and analyze A/B tests, multi-armed bandits, and quasi-experimental methods to measure the impact of product changes.
  - Apply causal inference techniques (e.g., difference-in-differences, propensity score matching, synthetic control, regression discontinuity) to estimate treatment effects in non-randomized settings.
  - Collaborate with product, engineering, and marketing teams to define hypotheses, success metrics, and statistical power requirements.
  - Ensure rigorous statistical validity (e.g., controlling for biases, multiple testing corrections, confidence intervals).
- Retention & Engagement Analytics
  - Develop and refine retention measurement frameworks (e.g., cohort analysis, survival analysis, churn prediction).
  - Define and track core engagement metrics (DAU, WAU, MAU, rolling retention, N-day retention) and diagnose trends.
  - Identify key drivers of retention through segmentation, funnel analysis, and predictive modeling.
  - Work with growth teams to optimize onboarding, engagement loops, and monetization strategies.
- Data Infrastructure & Scalable Analytics
  - Build and maintain scalable data pipelines (using PySpark, SQL, or big data tools) to process and analyze large datasets.
  - Develop automated dashboards and reports (e.g., Tableau, Looker, Metabase) to monitor experiment performance and retention trends.
  - Ensure data quality and consistency in metric definitions across teams.
  - Optimize queries and computations for performance and cost efficiency in distributed systems (e.g., Databricks, AWS EMR, GCP BigQuery).
- Cross-Functional Collaboration
  - Partner with product managers, engineers, and marketers to translate business questions into data-driven analyses.
  - Present findings and recommendations to executive stakeholders in clear, actionable formats.
  - Mentor junior data scientists and analysts on best practices in experimentation and retention analytics.

YOU’LL HAVE

- 3+ years of experience in data science, analytics, or experimentation (or equivalent in academic research).
- Strong background in statistics and causal inference (hypothesis testing, Bayesian methods, experimental design).
- Hands-on experience with SQL and Python (Pandas, NumPy, SciPy, StatsModels, Scikit-learn).
- Proficiency in experimentation tools (e.g., Optimizely, Statsig, Eppo, or custom in-house systems).
- Experience defining and analyzing retention metrics (DAU/WAU/MAU, cohort retention, churn).
- Familiarity with big data tools (PySpark, Hadoop, or similar distributed computing frameworks).

HIGHLY DESIRABLE:

- Expertise in PySpark for large-scale data processing and analytics.
- Experience with time-series forecasting, survival analysis, or uplift modeling.
- Knowledge of ML for retention (e.g., propensity models, clustering, recommendation systems).
- Experience with data visualization tools (Tableau, Looker, Plotly, Matplotlib/Seaborn).
- Background in growth analytics, product analytics, or marketing analytics.

NICE TO HAVE:

- Ad