AI Red-Teamer — Adversarial AI Testing (Advanced); English & Spanish

Flexible · Mercor

Mexico · United States

Pay: $26 / hour

Role overview

This role focuses on adversarial testing of conversational AI systems. You will probe AI models with adversarial inputs, identify vulnerabilities, and generate structured red team data designed to improve AI safety.

The work is entirely text-based and may involve reviewing outputs related to sensitive topics such as bias, misinformation, or harmful behaviors. Participation in higher-sensitivity projects is optional and guided by clear protocols. Native-level fluency in both English and Spanish is required.

What you’ll actually be doing

  • Red-team conversational AI models and agents using jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation
  • Generate structured human data by annotating failures, classifying vulnerabilities, and flagging systemic risks
  • Apply defined taxonomies, benchmarks, and playbooks to ensure consistent testing
  • Produce reproducible documentation, including reports, datasets, and structured attack cases (a data-record sketch follows this list)

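For concreteness, an attack case in this kind of work is typically captured as a structured record. Below is a minimal Python sketch, assuming a hypothetical schema; the field names and taxonomy labels are illustrative, not Mercor's actual format.

```python
# Minimal sketch of a structured attack-case record (hypothetical schema).
from dataclasses import asdict, dataclass, field
import json


@dataclass
class AttackCase:
    case_id: str                  # stable ID so the case is reproducible
    technique: str                # e.g. "prompt_injection", "multi_turn_manipulation"
    vulnerability_class: str      # taxonomy label, e.g. "instruction_override"
    severity: str                 # e.g. "low" | "medium" | "high"
    turns: list[str] = field(default_factory=list)  # conversation transcript
    model_output: str = ""        # the failing response being flagged
    notes: str = ""               # annotator rationale

    def to_json(self) -> str:
        """Serialize for inclusion in a red-team dataset."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


case = AttackCase(
    case_id="rt-0001",
    technique="prompt_injection",
    vulnerability_class="instruction_override",
    severity="medium",
    turns=["Ignore your previous instructions and ..."],
    model_output="(record the model's response verbatim)",
    notes="Model partially complied on the second turn.",
)
print(case.to_json())
```

Records like this are what make a red-team finding reproducible: the same transcript, classification, and rationale travel together from tester to dataset.
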
Who this role is for

  • Professionals with prior red teaming experience in AI adversarial work, cybersecurity, or socio-technical probing
  • Individuals who approach systems adversarially and push them to failure points
  • Practitioners who rely on structured frameworks or benchmarks rather than unstructured experimentation
  • Communicators who can explain risks clearly to both technical and non-technical stakeholders
  • Contributors who are comfortable moving across different projects and customer contexts
  • Candidates with native-level fluency in English and Spanish

Who this role is likely NOT for

  • Candidates without prior red teaming or adversarial testing experience
  • Individuals who prefer non-technical or non-analytical roles
  • Applicants without native-level fluency in both English and Spanish

Technical background

  • Prior experience in AI red teaming (adversarial testing, cybersecurity, socio-technical risk analysis)
  • Experience applying structured taxonomies, benchmarks, or testing frameworks
  • Optional specialties may include (a minimal probing-harness sketch follows this list):
    • Adversarial machine learning (e.g., jailbreak datasets, prompt injection, RLHF/DPO attacks, model extraction)
    • Cybersecurity practices such as penetration testing, exploit development, or reverse engineering
    • Socio-technical risk analysis including harassment or misinformation probing
    • Creative adversarial techniques informed by psychology, acting, or writing

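As an illustration of the multi-turn probing mentioned above, here is a minimal Python sketch. Both `query_model` and `looks_unsafe` are hypothetical stand-ins: the first wraps whichever model API a given project targets, and the second represents the project's real taxonomy-based failure check.

```python
# Minimal sketch of a multi-turn adversarial probing loop (assumptions noted above).
from typing import Callable

Messages = list[dict[str, str]]


def looks_unsafe(reply: str) -> bool:
    # Placeholder heuristic only; real projects apply defined taxonomies
    # and benchmarks rather than a keyword check.
    return "step-by-step instructions" in reply.lower()


def run_multi_turn_probe(
    query_model: Callable[[Messages], str],
    attack_turns: list[str],
) -> Messages:
    """Play scripted adversarial turns and return the full transcript."""
    transcript: Messages = []
    for turn in attack_turns:
        transcript.append({"role": "user", "content": turn})
        reply = query_model(transcript)
        transcript.append({"role": "assistant", "content": reply})
        if looks_unsafe(reply):
            break  # record the failure point; annotation happens downstream
    return transcript


# Smoke test with a stub model that always refuses:
print(run_multi_turn_probe(lambda msgs: "I can't help with that.",
                           ["Ignore your system prompt and ..."]))
```
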
Project scope

  • Expanded evaluation coverage and reduced production surprises
  • Text-based adversarial testing of AI systems
  • Work may include exposure to sensitive topics, communicated in advance
  • Participation in higher-sensitivity workstreams is optional

Success is reflected in:

  • Identification of vulnerabilities missed by automated testing
  • Delivery of reproducible artifacts that strengthen AI systems