Flexible • Turing
Role overview
This role focuses on evaluating and improving large language models through software engineering expertise. You will contribute to AI model training initiatives by creating and curating high-quality code datasets, assessing AI-generated outputs, and designing verification mechanisms.
The work involves hands-on coding across multiple programming languages, structured evaluation of model performance across the software engineering lifecycle, and collaboration with cross-functional teams to strengthen AI-driven coding systems.
This role is suited for experienced software engineers with strong full-stack and production experience who can critically assess code quality, architecture, and scalability in a structured and analytical way.
What you’ll actually be doing
- Curating code examples, building solutions, and correcting code in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go
- Evaluating and refining AI-generated code to ensure efficiency, scalability, and reliability
- Collaborating with cross-functional teams to improve AI-driven coding solutions, measured against industry performance benchmarks
- Building agents that verify code quality and identify recurring error patterns
- Hypothesizing steps across the software engineering lifecycle (prototyping, architecture design, API design, production implementation, launch, experiments, monitoring, operations and maintenance) and evaluating model capabilities at each of those stages
- Designing verification mechanisms that automatically validate solutions to software engineering tasks
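To give a rough sense of what a verification mechanism for a coding task might look like, here is a minimal Python sketch. It is an illustration only, not Turing's actual tooling: the names `VerificationResult` and `verify_solution` are hypothetical, and the approach (load a candidate solution, run it against input/output cases, record failures) is just one simple way to validate a solution automatically.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VerificationResult:
    """Outcome of checking one candidate solution (hypothetical structure)."""
    passed: int = 0
    failures: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # A solution is accepted only if no case failed.
        return not self.failures


def verify_solution(source: str, func_name: str,
                    cases: list[tuple[tuple, Any]]) -> VerificationResult:
    """Load `source`, then call `func_name` on each (args, expected) case."""
    result = VerificationResult()
    namespace: dict[str, Any] = {}

    # Assumption: the source is trusted enough to exec in-process.
    # A real system would run untrusted code in an isolated sandbox.
    try:
        exec(source, namespace)
    except Exception as exc:  # includes SyntaxError
        result.failures.append(f"load error: {exc!r}")
        return result

    func = namespace.get(func_name)
    if not callable(func):
        result.failures.append(f"{func_name} is not defined or not callable")
        return result

    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:
            result.failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual == expected:
            result.passed += 1
        else:
            result.failures.append(
                f"{func_name}{args} returned {actual!r}, expected {expected!r}"
            )
    return result


if __name__ == "__main__":
    candidate = "def add(a, b):\n    return a + b\n"
    report = verify_solution(candidate, "add", [((1, 2), 3), ((0, 0), 0)])
    print(report.ok, report.passed, report.failures)
```

The recorded failure messages are the kind of raw material that could feed a structured evaluation rationale or help surface recurring error patterns.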
Who this role is for
- Software engineers with several years of experience
- Engineers with 2+ years of continuous full-time experience at a top-tier product company (e.g., Google, Stripe, Amazon, Apple, Meta, Netflix, Microsoft, Datadog, Dropbox, Shopify, PayPal, IBM Research)
- Professionals experienced in building full-stack applications and deploying scalable, production-grade software
- Engineers with deep understanding of software architecture, system design, debugging, and code quality evaluation
- Individuals with strong oral and written communication skills who can deliver structured evaluation rationales
Who this role is likely NOT for
- Engineers without several years of software engineering experience
- Candidates without at least 2 years of continuous full-time experience at a top-tier product company as specified
- Professionals without production-grade full-stack development experience
- Individuals without strong software architecture and code review expertise
- Candidates who lack clear written and verbal communication skills
Technical background
- Several years of software engineering experience
- 2+ years of continuous full-time experience at a top-tier product company (as specified)
- Strong expertise in full-stack application development
- Experience deploying scalable, production-grade software
- Deep understanding of software architecture, design, development, debugging, and code quality assessment
- Proficiency in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go
Project scope
Contractor engagement
Flexible engagement
Minimum 10 hours per week, up to 40 hours per week
Partial PST overlap required
Duration: 1 month (starting next week; potential extensions based on performance and fit)
