AI Red-Teamer — Adversarial AI Testing (Advanced); English & Spanish

Flexible · Mercor

Mexico · United States

Pay: $26 / hour

Role overview

This role focuses on adversarial testing of conversational AI systems. You will probe AI models with adversarial inputs, identify vulnerabilities, and generate structured red team data designed to improve AI safety.

The work is entirely text-based and may involve reviewing outputs related to sensitive topics such as bias, misinformation, or harmful behaviors. Participation in higher-sensitivity projects is optional and guided by clear protocols. Native-level fluency in both English and Spanish is required.

What you’ll actually be doing

  • Red-team conversational AI models and agents using jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation
  • Generate structured human data by annotating failures, classifying vulnerabilities, and flagging systemic risks
  • Apply defined taxonomies, benchmarks, and playbooks to ensure consistent testing
  • Produce reproducible documentation, including reports, datasets, and structured attack cases (a data-record sketch follows this list)

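For concreteness, an attack case in this kind of work is typically captured as a structured record. Below is a minimal Python sketch, assuming a hypothetical schema; the field names and taxonomy labels are illustrative, not Mercor's actual format.

```python
# Minimal sketch of a structured attack-case record (hypothetical schema).
from dataclasses import asdict, dataclass, field
import json


@dataclass
class AttackCase:
    case_id: str                  # stable ID so the case is reproducible
    technique: str                # e.g. "prompt_injection", "multi_turn_manipulation"
    vulnerability_class: str      # taxonomy label, e.g. "instruction_override"
    severity: str                 # e.g. "low" | "medium" | "high"
    turns: list[str] = field(default_factory=list)  # conversation transcript
    model_output: str = ""        # the failing response being flagged
    notes: str = ""               # annotator rationale

    def to_json(self) -> str:
        """Serialize for inclusion in a red-team dataset."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


case = AttackCase(
    case_id="rt-0001",
    technique="prompt_injection",
    vulnerability_class="instruction_override",
    severity="medium",
    turns=["Ignore your previous instructions and ..."],
    model_output="(record the model's response verbatim)",
    notes="Model partially complied on the second turn.",
)
print(case.to_json())
```

Records like this are what make a red-team finding reproducible: the same transcript, classification, and rationale travel together from tester to dataset.
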
Who this role is for

  • Professionals with prior red teaming experience in AI adversarial work, cybersecurity, or socio-technical probing
  • Individuals who approach systems adversarially and push them to failure points
  • Practitioners who rely on structured frameworks or benchmarks rather than unstructured experimentation
  • Communicators who can explain risks clearly to both technical and non-technical stakeholders
  • Contributors who are comfortable moving across different projects and customer contexts
  • Candidates with native-level fluency in English and Spanish

Who this role is likely NOT for

  • Candidates without prior red teaming or adversarial testing experience
  • Individuals who prefer non-technical or non-analytical roles
  • Applicants without native-level fluency in both English and Spanish

Technical background

  • Prior experience in AI red teaming (adversarial testing, cybersecurity, socio-technical risk analysis)
  • Experience applying structured taxonomies, benchmarks, or testing frameworks
  • Optional specialties may include (a minimal probing-harness sketch follows this list):
    • Adversarial machine learning (e.g., jailbreak datasets, prompt injection, RLHF/DPO attacks, model extraction)
    • Cybersecurity practices such as penetration testing, exploit development, or reverse engineering
    • Socio-technical risk analysis including harassment or misinformation probing
    • Creative adversarial techniques informed by psychology, acting, or writing

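As an illustration of the multi-turn probing mentioned above, here is a minimal Python sketch. Both `query_model` and `looks_unsafe` are hypothetical stand-ins: the first wraps whichever model API a given project targets, and the second represents the project's real taxonomy-based failure check.

```python
# Minimal sketch of a multi-turn adversarial probing loop (assumptions noted above).
from typing import Callable

Messages = list[dict[str, str]]


def looks_unsafe(reply: str) -> bool:
    # Placeholder heuristic only; real projects apply defined taxonomies
    # and benchmarks rather than a keyword check.
    return "step-by-step instructions" in reply.lower()


def run_multi_turn_probe(
    query_model: Callable[[Messages], str],
    attack_turns: list[str],
) -> Messages:
    """Play scripted adversarial turns and return the full transcript."""
    transcript: Messages = []
    for turn in attack_turns:
        transcript.append({"role": "user", "content": turn})
        reply = query_model(transcript)
        transcript.append({"role": "assistant", "content": reply})
        if looks_unsafe(reply):
            break  # record the failure point; annotation happens downstream
    return transcript


# Smoke test with a stub model that always refuses:
print(run_multi_turn_probe(lambda msgs: "I can't help with that.",
                           ["Ignore your system prompt and ..."]))
```
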
Project scope

  • Expanded evaluation coverage and reduced production surprises
  • Text-based adversarial testing of AI systems
  • Work may include exposure to sensitive topics, communicated in advance
  • Participation in higher-sensitivity workstreams is optional

Success is reflected in:

  • Identification of vulnerabilities missed by automated testing
  • Delivery of reproducible artifacts that strengthen AI systems