AI Red-Teamer — Adversarial AI Testing (Advanced); English & Hindi

Company: Mercor

Work arrangement: Flexible

Location: India

Pay: $13.87 / hour

Role overview

This role focuses on adversarial testing of conversational AI models and agents. You will probe systems with jailbreaks, prompt injections, misuse cases, and bias exploitation to surface vulnerabilities and generate structured red-team data. The work is text-based and involves reviewing AI outputs that may include sensitive topics such as bias, misinformation, or harmful behavior.

This position suits experienced red-teamers with strong English and Hindi fluency who are comfortable working within structured evaluation frameworks and documenting reproducible findings.


What you’ll actually be doing

  • Red-team conversational AI models and agents through jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation
  • Review AI outputs involving sensitive topics, following clearly communicated guidelines
  • Annotate model failures, classify vulnerabilities, and flag systemic risks
  • Follow defined taxonomies, benchmarks, and playbooks to ensure consistent testing
  • Produce reproducible reports, datasets, and documented attack cases (see the sketch after this list)
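
To make the day-to-day workflow concrete, here is a minimal Python sketch of a multi-turn probe that records a reproducible attack case. It is purely illustrative: `query_model`, the `AttackCase` fields, and the refusal heuristic are assumptions for the example, not Mercor's actual tooling, schema, or grading rubric.

```python
# Illustrative only: query_model() is a hypothetical stand-in for whatever
# API the system under test exposes; replace it with a real client.
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AttackCase:
    """One reproducible finding: the full transcript plus labels."""
    case_id: str
    technique: str                              # e.g. "multi-turn-escalation"
    turns: list = field(default_factory=list)   # alternating user/model messages
    outcome: str = "refused"                    # "refused" | "partial"


def query_model(history: list) -> str:
    """Placeholder for the target conversational system."""
    raise NotImplementedError


def run_multi_turn_probe(case_id: str, escalating_prompts: list) -> AttackCase:
    """Send a sequence of escalating prompts and capture the full transcript."""
    case = AttackCase(case_id=case_id, technique="multi-turn-escalation")
    history = []
    for prompt in escalating_prompts:
        history.append({"role": "user", "content": prompt})
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        # Naive heuristic for the sketch; real projects follow the playbook's rubric.
        if "can't help with that" not in reply.lower():
            case.outcome = "partial"
    case.turns = history
    return case


def append_to_dataset(case: AttackCase, path: str = "findings.jsonl") -> None:
    """Append the finding as one JSON line so runs stay diffable and reproducible."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(case), ensure_ascii=False) + "\n")
```

Storing each finding as one JSON line is a common convention for red-team datasets because individual cases can be replayed, diffed, and aggregated without any custom parsing.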

Who this role is for

  • Professionals with prior red-teaming experience in AI adversarial work, cybersecurity, or socio-technical probing
  • Individuals who instinctively push systems to their breaking points
  • Structured testers who use frameworks or benchmarks rather than ad hoc approaches
  • Clear communicators who can explain risks to both technical and non-technical stakeholders
  • Professionals comfortable moving across projects and customers
  • Candidates with native-level fluency in English and Hindi

Who this role is likely NOT for

  • Candidates without prior red-teaming or closely related adversarial experience
  • Individuals who are not comfortable reviewing AI outputs that may involve sensitive topics
  • Candidates without native-level fluency in both English and Hindi
  • Those who prefer unstructured or purely exploratory testing without defined frameworks

Technical background

  • Prior red-teaming experience in AI adversarial work, cybersecurity, or socio-technical risk domains
  • Experience generating and annotating adversarial datasets
  • Familiarity with structured evaluation approaches such as taxonomies and benchmarks (a minimal taxonomy sketch follows this list)
  • Native-level fluency in English and Hindi
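
As a hedged illustration of what working from a defined taxonomy looks like, the sketch below encodes a hypothetical vulnerability taxonomy as a Python enum and tallies labeled findings against it to expose thin coverage. The category names are invented for the example; each project supplies its own taxonomy.

```python
# Illustrative only: these category names are a made-up example taxonomy,
# not the one used on any specific project.
from collections import Counter
from enum import Enum


class Vulnerability(Enum):
    JAILBREAK = "jailbreak"
    PROMPT_INJECTION = "prompt_injection"
    BIAS_EXPLOITATION = "bias_exploitation"
    MISINFORMATION = "misinformation"


def coverage_report(labels: list) -> dict:
    """Tally findings per taxonomy category to show where coverage is thin."""
    return {v.value: n for v, n in Counter(labels).items()}


labeled = [Vulnerability.JAILBREAK, Vulnerability.JAILBREAK,
           Vulnerability.BIAS_EXPLOITATION]
print(coverage_report(labeled))
# -> {'jailbreak': 2, 'bias_exploitation': 1}
```

Labeling every finding against a fixed enum, rather than free-text tags, is what makes results comparable across testers and lets coverage gaps show up as zero counts.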

Nice-to-have specialties (not mandatory):

  • Adversarial ML (e.g., jailbreak datasets, prompt injection, RLHF/DPO attacks, model extraction)
  • Cybersecurity (e.g., penetration testing, exploit development, reverse engineering)
  • Socio-technical risk analysis (e.g., harassment/disinformation probing, abuse analysis, conversational AI testing)
  • Creative adversarial probing backgrounds (e.g., psychology, acting, writing)

Project scope

  • Text-based adversarial testing of conversational AI systems
  • Review of AI outputs that may include higher-sensitivity content, with topics communicated in advance
  • Optional participation in higher-sensitivity projects
  • Work guided by structured taxonomies, benchmarks, and playbooks
  • Success is defined by uncovering vulnerabilities, delivering reproducible artifacts, and expanding evaluation coverage