MLE Bench – Data Analyst

Flexible Turing

Bangladesh Brazil Egypt Ghana India Kenya Mexico Nigeria Pakistan Turkey

Role overview

The MLE Bench – Data Analyst contributes to benchmark-driven evaluation projects focused on real-world machine learning systems. This role centers on hands-on analytical work with production-like datasets, performance metrics, and ML outputs to help evaluate, diagnose, and improve advanced AI systems.

The position sits at the intersection of data analysis and machine learning. It is suited for professionals who are comfortable working with real datasets, ML evaluation workflows, and rigorous analytical processes.


What you’ll actually be doing

  • Analyze structured and unstructured datasets generated from ML training, inference, and evaluation pipelines
  • Define, compute, and validate metrics used to evaluate model performance and behavior
  • Investigate data distributions, model outputs, failure modes, and edge cases relevant to benchmark tasks
  • Write and run Python and SQL code to analyze data, create reports, and support evaluation workflows
  • Validate data quality, consistency, and correctness across datasets and experiments
  • Create clear, well-documented analytical artifacts and reproducible analysis workflows
  • Collaborate with ML engineers and researchers to design challenging, real-world evaluation scenarios for MLE Bench

Who this role is for

  • Data Analysts or analytics-focused engineers with at least 3 years of experience
  • Professionals with strong proficiency in Python for data analysis
  • Candidates with solid experience working with SQL and relational datasets
  • Individuals experienced in analyzing ML outputs and evaluation metrics
  • Those with a strong understanding of statistics and analytical reasoning
  • Analysts comfortable working with large, complex datasets and drawing reliable insights
  • Professionals who write clean, readable, and well-documented analytical code
  • Candidates with excellent spoken and written English communication skills

Who this role is likely NOT for

  • Professionals without experience in data analysis or analytics-focused engineering
  • Candidates without proficiency in Python and SQL for analytical work
  • Individuals without experience working with ML outputs or evaluation metrics
  • Those who are not comfortable working with large, complex datasets
  • Candidates who do not meet the requirement of at least 3 years of experience

Technical background

  • At least 3 years of experience as a Data Analyst or analytics-focused engineer
  • Strong proficiency in Python for data analysis
  • Solid experience with SQL and relational datasets
  • Experience analyzing ML outputs and evaluation metrics
  • Strong understanding of statistics and analytical reasoning
  • Ability to work with large, complex datasets
  • Experience writing clean, readable, and well-documented analytical code
  • Excellent spoken and written English communication skills

Project scope

  • Remote work environment
  • Benchmark-driven evaluation projects focused on real-world machine learning systems
  • Work on production-like datasets, metrics, and ML outputs
  • Collaboration with ML engineers and researchers on evaluation scenarios
  • Minimum commitment of 4 hours per day and 20 hours per week, including 4 hours of overlap with PST
  • Contractor assignment
  • Initial duration of 3 months, adjustable based on engagement