MLE Bench – Data Analyst

Flexible • Turing

Bangladesh Brazil Egypt Ghana India Kenya Mexico Nigeria Pakistan Turkey

Role overview

The MLE Bench – Data Analyst contributes to benchmark-driven evaluation projects focused on real-world machine learning systems. This role centers on hands-on analytical work with production-like datasets, performance metrics, and ML outputs to help evaluate, diagnose, and improve advanced AI systems.

The position sits at the intersection of data analysis and machine learning. It is suited for professionals who are comfortable working with real datasets, ML evaluation workflows, and rigorous analytical processes.

What you’ll actually be doing

Analyze structured and unstructured datasets generated from ML training, inference, and evaluation pipelines
Define, compute, and validate metrics used to evaluate model performance and behavior
Investigate data distributions, model outputs, failure modes, and edge cases relevant to benchmark tasks
Write and run Python and SQL code to analyze data, create reports, and support evaluation workflows
Validate data quality, consistency, and correctness across datasets and experiments
Create clear, well-documented analytical artifacts and reproducible analysis workflows
Collaborate with ML engineers and researchers to design challenging, real-world evaluation scenarios for MLE Bench

Who this role is for

Data Analysts or analytics-focused engineers with at least 3 years of experience
Professionals with strong proficiency in Python for data analysis
Candidates with solid experience working with SQL and relational datasets
Individuals experienced in analyzing ML outputs and evaluation metrics
Those with a strong understanding of statistics and analytical reasoning
Analysts comfortable working with large, complex datasets and drawing reliable insights from them
Professionals who write clean, readable, and well-documented analytical code
Candidates with excellent spoken and written English communication skills

Who this role is likely NOT for

Professionals without experience in data analysis or analytics-focused engineering
Candidates without proficiency in Python and SQL for analytical work
Individuals without experience working with ML outputs or evaluation metrics
Those who are not comfortable working with large, complex datasets
Candidates who do not meet the minimum requirement of 3 years of experience

Technical background

Minimum of 3 years of experience as a Data Analyst or analytics-focused engineer
Strong proficiency in Python for data analysis
Solid experience with SQL and relational datasets
Experience analyzing ML outputs and evaluation metrics
Strong understanding of statistics and analytical reasoning
Ability to work with large, complex datasets
Experience writing clean, readable, and well-documented analytical code
Excellent spoken and written English communication skills

Project scope

Remote work environment
Benchmark-driven evaluation projects focused on real-world machine learning systems
Work on production-like datasets, metrics, and ML outputs
Collaboration with ML engineers and researchers on evaluation scenarios
Minimum commitment of 4 hours per day and 20 hours per week, with 4 hours of overlap with PST
Contractor assignment
Initial duration of 3 months, adjustable based on engagement