AI Red-Teamer — Adversarial AI Testing (Advanced); English & Korean

Mercor · Flexible
South Korea · United States
Pay: $50.50/hour

Role overview

This role focuses on adversarial testing of conversational AI models and agents. You will probe systems with jailbreaks, prompt injections, misuse cases, and bias exploitation to surface vulnerabilities and generate structured red-team data.

The work is text-based and involves reviewing AI outputs that may touch on sensitive topics such as bias, misinformation, or harmful behaviors. Participation in higher-sensitivity projects is optional and supported by clear guidelines, with topics clearly communicated in advance.

This role is suited to individuals with prior red-teaming experience who are comfortable systematically testing AI systems to uncover weaknesses and document findings in a reproducible way.


What you’ll actually be doing

  • Red-team conversational AI models and agents using jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation
  • Generate high-quality human data by annotating failures, classifying vulnerabilities, and flagging systemic risks
  • Follow defined taxonomies, benchmarks, and playbooks to ensure consistent testing
  • Produce reproducible documentation, including reports, datasets, and attack cases (a sketch of one possible finding record follows this list)
  • Review AI outputs related to sensitive topics in accordance with provided guidelines
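
As a concrete illustration of what structured red-team data can look like, below is a minimal sketch of a finding record in Python. The field names and category values are assumptions for illustration only, not a schema prescribed by this role; actual projects supply their own taxonomies and templates.

  from dataclasses import dataclass, field, asdict
  import json

  # Hypothetical record for one annotated red-team finding.
  # All field names and example values are illustrative assumptions.
  @dataclass
  class RedTeamFinding:
      attack_type: str           # e.g. "jailbreak", "prompt_injection", "multi_turn_manipulation"
      vulnerability_class: str   # label drawn from the project's taxonomy
      severity: str              # e.g. "low", "medium", "high"
      transcript: list[str] = field(default_factory=list)  # full multi-turn exchange
      model_output: str = ""     # the failing response being annotated
      reproducible: bool = True  # does the failure recur under the same transcript?
      notes: str = ""            # analyst commentary, including systemic-risk flags

  finding = RedTeamFinding(
      attack_type="prompt_injection",
      vulnerability_class="instruction_override",
      severity="high",
      transcript=["<turn 1>", "<turn 2>"],
      model_output="<failing output, redacted>",
      notes="Model followed an instruction injected inside quoted user content.",
  )
  print(json.dumps(asdict(finding), indent=2))  # serialize for a dataset or report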

Who this role is for

  • Individuals with prior red-teaming experience in AI adversarial work, cybersecurity, or socio-technical probing
  • Professionals who actively push systems to identify breaking points
  • Candidates who use structured frameworks or benchmarks in their testing approach
  • Communicators who can explain risks clearly to both technical and non-technical stakeholders
  • Individuals comfortable moving across projects and customers
  • Bilingual professionals with native-level fluency in English and Korean

Who this role is likely NOT for

  • Candidates without prior experience in red teaming, adversarial AI, cybersecurity, or socio-technical probing
  • Individuals who are not fluent in both English and Korean
  • Those uncomfortable reviewing AI outputs that may involve sensitive topics
  • Professionals who rely on unstructured or ad hoc testing methods

Technical background

  • Prior experience in AI adversarial work, cybersecurity, or socio-technical risk analysis
  • Experience red-teaming conversational AI systems or related technologies
  • Familiarity with structured evaluation methods, benchmarks, or taxonomies (an illustrative sketch follows this list)
  • Native-level fluency in English and Korean
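
To make "taxonomies" concrete, here is a small illustrative sketch in Python. The categories and subcategories are assumptions drawn from commonly discussed attack classes, not the taxonomy any particular project uses; real engagements define their own in the playbook.

  # Hypothetical vulnerability taxonomy; all labels are illustrative assumptions.
  TAXONOMY = {
      "jailbreak": ["role_play", "persona_override", "encoding_tricks"],
      "prompt_injection": ["direct", "indirect_via_content", "tool_output"],
      "misuse": ["harmful_instructions", "fraud_enablement"],
      "bias": ["demographic_stereotyping", "unequal_refusals"],
  }

  def validate_label(category: str, subcategory: str) -> bool:
      """Check that an annotation uses only labels defined in the taxonomy."""
      return subcategory in TAXONOMY.get(category, [])

  assert validate_label("prompt_injection", "indirect_via_content")
  assert not validate_label("bias", "role_play")  # label not in this category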

Project scope

  • Focused on adversarial testing of conversational AI models and agents
  • Work is entirely text-based
  • Participation in higher-sensitivity content review is optional and supported by clear guidelines
  • Success is measured by uncovering vulnerabilities that automated tests miss, delivering reproducible artifacts, expanding evaluation coverage, and strengthening customer AI systems