AI Red-Teamer — Adversarial AI Testing (Advanced); English & Hindi

Company: Mercor

Work arrangement: Flexible

Location: India

Pay: $13.87 / hour

Role overview

This role focuses on adversarial testing of conversational AI models and agents. You will probe systems with jailbreaks, prompt injections, misuse cases, and bias exploitation to surface vulnerabilities and generate structured red-team data. The work is text-based and involves reviewing AI outputs that may include sensitive topics such as bias, misinformation, or harmful behavior.

This position suits experienced red-teamers with strong English and Hindi fluency who are comfortable working within structured evaluation frameworks and documenting reproducible findings.


What you’ll actually be doing

  • Red-team conversational AI models and agents through jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation
  • Review AI outputs involving sensitive topics, following clearly communicated guidelines
  • Annotate model failures, classify vulnerabilities, and flag systemic risks
  • Follow defined taxonomies, benchmarks, and playbooks to ensure consistent testing
  • Produce reproducible reports, datasets, and documented attack cases (see the sketch after this list)
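
To make the day-to-day workflow concrete, here is a minimal Python sketch of a multi-turn probe that records a reproducible attack case. It is purely illustrative: `query_model`, the `AttackCase` fields, and the refusal heuristic are assumptions for the example, not Mercor's actual tooling, schema, or grading rubric.

```python
# Illustrative only: query_model() is a hypothetical stand-in for whatever
# API the system under test exposes; replace it with a real client.
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AttackCase:
    """One reproducible finding: the full transcript plus labels."""
    case_id: str
    technique: str                              # e.g. "multi-turn-escalation"
    turns: list = field(default_factory=list)   # alternating user/model messages
    outcome: str = "refused"                    # "refused" | "partial"


def query_model(history: list) -> str:
    """Placeholder for the target conversational system."""
    raise NotImplementedError


def run_multi_turn_probe(case_id: str, escalating_prompts: list) -> AttackCase:
    """Send a sequence of escalating prompts and capture the full transcript."""
    case = AttackCase(case_id=case_id, technique="multi-turn-escalation")
    history = []
    for prompt in escalating_prompts:
        history.append({"role": "user", "content": prompt})
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        # Naive heuristic for the sketch; real projects follow the playbook's rubric.
        if "can't help with that" not in reply.lower():
            case.outcome = "partial"
    case.turns = history
    return case


def append_to_dataset(case: AttackCase, path: str = "findings.jsonl") -> None:
    """Append the finding as one JSON line so runs stay diffable and reproducible."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(case), ensure_ascii=False) + "\n")
```

Storing each finding as one JSON line is a common convention for red-team datasets because individual cases can be replayed, diffed, and aggregated without any custom parsing.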

Who this role is for

  • Professionals with prior red-teaming experience in AI adversarial work, cybersecurity, or socio-technical probing
  • Individuals who instinctively push systems to their breaking points
  • Structured testers who use frameworks or benchmarks rather than ad hoc approaches
  • Clear communicators who can explain risks to both technical and non-technical stakeholders
  • Professionals comfortable moving across projects and customers
  • Candidates with native-level fluency in English and Hindi

Who this role is likely NOT for

  • Candidates without prior red-teaming or closely related adversarial experience
  • Individuals who are not comfortable reviewing AI outputs that may involve sensitive topics
  • Candidates without native-level fluency in both English and Hindi
  • Those who prefer unstructured or purely exploratory testing without defined frameworks

Technical background

  • Prior red-teaming experience in AI adversarial work, cybersecurity, or socio-technical risk domains
  • Experience generating and annotating adversarial datasets
  • Familiarity with structured evaluation approaches such as taxonomies and benchmarks (a minimal taxonomy sketch follows this list)
  • Native-level fluency in English and Hindi
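
As a hedged illustration of what working from a defined taxonomy looks like, the sketch below encodes a hypothetical vulnerability taxonomy as a Python enum and tallies labeled findings against it to expose thin coverage. The category names are invented for the example; each project supplies its own taxonomy.

```python
# Illustrative only: these category names are a made-up example taxonomy,
# not the one used on any specific project.
from collections import Counter
from enum import Enum


class Vulnerability(Enum):
    JAILBREAK = "jailbreak"
    PROMPT_INJECTION = "prompt_injection"
    BIAS_EXPLOITATION = "bias_exploitation"
    MISINFORMATION = "misinformation"


def coverage_report(labels: list) -> dict:
    """Tally findings per taxonomy category to show where coverage is thin."""
    return {v.value: n for v, n in Counter(labels).items()}


labeled = [Vulnerability.JAILBREAK, Vulnerability.JAILBREAK,
           Vulnerability.BIAS_EXPLOITATION]
print(coverage_report(labeled))
# -> {'jailbreak': 2, 'bias_exploitation': 1}
```

Labeling every finding against a fixed enum, rather than free-text tags, is what makes results comparable across testers and lets coverage gaps show up as zero counts.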

Nice-to-have specialties (not mandatory):

  • Adversarial ML (e.g., jailbreak datasets, prompt injection, RLHF/DPO attacks, model extraction)
  • Cybersecurity (e.g., penetration testing, exploit development, reverse engineering)
  • Socio-technical risk analysis (e.g., harassment/disinformation probing, abuse analysis, conversational AI testing)
  • Creative adversarial probing backgrounds (e.g., psychology, acting, writing)

Project scope

  • Text-based adversarial testing of conversational AI systems
  • Review of AI outputs that may include higher-sensitivity content, with topics communicated in advance
  • Optional participation in higher-sensitivity projects
  • Work guided by structured taxonomies, benchmarks, and playbooks
  • Success is defined by uncovering vulnerabilities, delivering reproducible artifacts, and expanding evaluation coverage