AI Red-Teamer — Adversarial AI Testing (Advanced); English & Korean

Mercor · Flexible
South Korea · United States
Pay: $50.50/hour

Role overview

This role focuses on adversarial testing of conversational AI models and agents. You will probe systems with jailbreaks, prompt injections, misuse cases, and bias exploitation to surface vulnerabilities and generate structured red-team data.

The work is text-based and involves reviewing AI outputs that may touch on sensitive topics such as bias, misinformation, or harmful behaviors. Participation in higher-sensitivity projects is optional and supported by clear guidelines, with topics clearly communicated in advance.

This role is suited to individuals with prior red-teaming experience who are comfortable systematically testing AI systems to uncover weaknesses and document findings in a reproducible way.


What you’ll actually be doing

  • Red-team conversational AI models and agents using jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation
  • Generate high-quality human data by annotating failures, classifying vulnerabilities, and flagging systemic risks
  • Follow defined taxonomies, benchmarks, and playbooks to ensure consistent testing
  • Produce reproducible documentation, including reports, datasets, and attack cases (a sketch of one possible finding record follows this list)
  • Review AI outputs related to sensitive topics in accordance with provided guidelines
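
As a concrete illustration of what structured red-team data can look like, below is a minimal sketch of a finding record in Python. The field names and category values are assumptions for illustration only, not a schema prescribed by this role; actual projects supply their own taxonomies and templates.

  from dataclasses import dataclass, field, asdict
  import json

  # Hypothetical record for one annotated red-team finding.
  # All field names and example values are illustrative assumptions.
  @dataclass
  class RedTeamFinding:
      attack_type: str           # e.g. "jailbreak", "prompt_injection", "multi_turn_manipulation"
      vulnerability_class: str   # label drawn from the project's taxonomy
      severity: str              # e.g. "low", "medium", "high"
      transcript: list[str] = field(default_factory=list)  # full multi-turn exchange
      model_output: str = ""     # the failing response being annotated
      reproducible: bool = True  # does the failure recur under the same transcript?
      notes: str = ""            # analyst commentary, including systemic-risk flags

  finding = RedTeamFinding(
      attack_type="prompt_injection",
      vulnerability_class="instruction_override",
      severity="high",
      transcript=["<turn 1>", "<turn 2>"],
      model_output="<failing output, redacted>",
      notes="Model followed an instruction injected inside quoted user content.",
  )
  print(json.dumps(asdict(finding), indent=2))  # serialize for a dataset or report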

Who this role is for

  • Individuals with prior red-teaming experience in AI adversarial work, cybersecurity, or socio-technical probing
  • Professionals who actively push systems to identify breaking points
  • Candidates who use structured frameworks or benchmarks in their testing approach
  • Communicators who can explain risks clearly to both technical and non-technical stakeholders
  • Individuals comfortable moving across projects and customers
  • Bilingual professionals with native-level fluency in English and Korean

Who this role is likely NOT for

  • Candidates without prior experience in red teaming, adversarial AI, cybersecurity, or socio-technical probing
  • Individuals who are not fluent in both English and Korean
  • Those uncomfortable reviewing AI outputs that may involve sensitive topics
  • Professionals who rely on unstructured or ad hoc testing methods

Technical background

  • Prior experience in AI adversarial work, cybersecurity, or socio-technical risk analysis
  • Experience red-teaming conversational AI systems or related technologies
  • Familiarity with structured evaluation methods, benchmarks, or taxonomies (an illustrative sketch follows this list)
  • Native-level fluency in English and Korean
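
To make "taxonomies" concrete, here is a small illustrative sketch in Python. The categories and subcategories are assumptions drawn from commonly discussed attack classes, not the taxonomy any particular project uses; real engagements define their own in the playbook.

  # Hypothetical vulnerability taxonomy; all labels are illustrative assumptions.
  TAXONOMY = {
      "jailbreak": ["role_play", "persona_override", "encoding_tricks"],
      "prompt_injection": ["direct", "indirect_via_content", "tool_output"],
      "misuse": ["harmful_instructions", "fraud_enablement"],
      "bias": ["demographic_stereotyping", "unequal_refusals"],
  }

  def validate_label(category: str, subcategory: str) -> bool:
      """Check that an annotation uses only labels defined in the taxonomy."""
      return subcategory in TAXONOMY.get(category, [])

  assert validate_label("prompt_injection", "indirect_via_content")
  assert not validate_label("bias", "role_play")  # label not in this category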

Project scope

  • Focused on adversarial testing of conversational AI models and agents
  • Work is entirely text-based
  • Participation in higher-sensitivity content review is optional and supported by clear guidelines
  • Success is measured by uncovering vulnerabilities that automated tests miss, delivering reproducible artifacts, expanding evaluation coverage, and strengthening customer AI systems