Role overview
This role focuses on designing rigorous evaluation frameworks for advanced AI systems performing machine learning engineering tasks. Rather than building production ML features, you will translate real-world ML research and experimentation workflows into structured, testable benchmarks that assess model capability.
The work sits at the intersection of applied ML research, systems thinking, and technical writing. It is best suited for experienced machine learning engineers or researchers who understand model development deeply and can formalize that knowledge into clear, high-signal evaluation tasks.
What you’ll actually be doing
- Designing structured evaluation suites that reflect real ML engineering workflows
- Translating practical tasks (model training, experimentation, debugging, optimization) into measurable benchmarks
- Reviewing and assessing AI-generated ML solutions for technical correctness and reasoning quality
- Defining success criteria that capture both implementation quality and system-level tradeoffs
- Analyzing model outputs to identify strengths, weaknesses, and failure patterns
- Documenting evaluation logic with precision and reproducibility
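To make the first two responsibilities concrete, here is a minimal, purely illustrative sketch of how a practical ML training task might be formalized into a measurable benchmark. The `EvalTask` class, metric names, and thresholds are hypothetical examples, not an existing framework's API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalTask:
    """One structured evaluation task: a prompt, human-readable success
    criteria, and a machine-checkable grader. (Illustrative only.)"""
    task_id: str
    prompt: str                      # the ML engineering task given to the model
    criteria: list[str]              # human-readable success criteria
    grader: Callable[[dict], bool]   # maps a submission's metrics to pass/fail

def grade_training_task(metrics: dict) -> bool:
    # Pass only if the submitted run both reaches target accuracy
    # and stays within the compute budget (hypothetical thresholds).
    return (metrics.get("val_accuracy", 0.0) >= 0.90
            and metrics.get("gpu_hours", float("inf")) <= 2.0)

task = EvalTask(
    task_id="train-image-baseline",
    prompt="Train an image classifier to >=90% validation accuracy within 2 GPU-hours.",
    criteria=["val_accuracy >= 0.90", "gpu_hours <= 2.0"],
    grader=grade_training_task,
)

# A within-budget submission passes; an over-budget one fails.
print(task.grader({"val_accuracy": 0.93, "gpu_hours": 1.5}))  # True
print(task.grader({"val_accuracy": 0.93, "gpu_hours": 3.0}))  # False
```

The point of the formalization is that success criteria are stated up front and graded mechanically, so results are reproducible across submissions.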
Who this role is for
- Machine learning engineers with hands-on model development experience
- Applied ML researchers familiar with experimentation pipelines
- Engineers comfortable reasoning about architecture decisions and performance tradeoffs
- Professionals who enjoy formalizing complex workflows into structured evaluation frameworks
- Technically strong writers who can express ML concepts clearly and unambiguously
Who this role is likely NOT for
- Entry-level ML practitioners without independent project ownership
- Data analysts who primarily focus on reporting or BI workflows
- Software engineers without direct model training and experimentation experience
- Candidates looking for traditional product ML engineering roles
- Professionals who prefer implementation over analytical evaluation work
Technical background
- 3+ years of experience in machine learning engineering or applied ML research
- Strong hands-on experience training, evaluating, and iterating on ML models
- Deep familiarity with experimentation workflows and performance analysis
- Ability to reason about model architecture, optimization strategies, and system tradeoffs
- Experience in research-oriented environments (industry lab or academic setting preferred)
- High attention to technical detail and structured thinking
Project scope
- Project-based engagement focused on AI evaluation initiatives
- Work structured around defined evaluation deliverables
- Flexible scheduling with outcome-driven expectations
- Potential for continued contributions based on performance and project needs
