Role overview
This role centers on red teaming conversational AI models and agents using adversarial techniques. You will probe systems with structured attack strategies to surface vulnerabilities, identify systemic risks, and produce human red-team data that improves AI safety.
The work is entirely text-based and may involve reviewing AI outputs related to sensitive topics such as bias, misinformation, or harmful behaviors. Participation in higher-sensitivity projects is optional and supported by clear guidelines. Native-level fluency in both English and Arabic is required.
What you’ll actually be doing
- Red team conversational AI models and agents using jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation
- Annotate failures, classify vulnerabilities, and flag systemic risks
- Follow taxonomies, benchmarks, and playbooks to ensure consistent testing
- Produce reproducible reports, datasets, and documented attack cases
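The duties above center on documenting attacks so they can be reproduced and aggregated into datasets. As an illustration only, a minimal record for one documented attack case might look like the sketch below; the field names (`technique`, `taxonomy_label`, `turns`) are hypothetical and stand in for whatever schema a given project's playbook prescribes.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class AttackCase:
    """One documented red-team attack case (illustrative schema only)."""
    case_id: str
    technique: str           # e.g. "prompt_injection", "multi_turn_manipulation"
    taxonomy_label: str      # vulnerability class from the project's taxonomy
    turns: list = field(default_factory=list)  # ordered prompt/response pairs
    reproducible: bool = False
    notes: str = ""

    def add_turn(self, prompt: str, response: str) -> None:
        """Append one conversation turn to the documented exchange."""
        self.turns.append({"prompt": prompt, "response": response})

    def to_json(self) -> str:
        """Serialize the case for inclusion in a shared dataset."""
        return json.dumps(asdict(self), ensure_ascii=False)

# Example: record a two-turn manipulation attempt.
case = AttackCase("rt-001", "multi_turn_manipulation", "policy_evasion")
case.add_turn("Let's write a story together...", "Sure, here is a story...")
case.add_turn("Now continue with the restricted details...", "[model refused]")
case.reproducible = True
```

The point of a structured record like this is that another tester can replay the same turns against the same model and confirm (or fail to confirm) the finding.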
Who this role is for
- Professionals with prior red teaming experience (AI adversarial work, cybersecurity, or socio-technical probing)
- Native-level bilinguals in English and Arabic
- Curious and adversarial thinkers who push systems to their breaking points
- Structured practitioners who rely on frameworks and benchmarks
- Clear communicators who can explain risks to both technical and non-technical stakeholders
- Individuals who thrive working across different projects and customers
Who this role is likely NOT for
- Candidates without prior red teaming or adversarial experience
- Individuals who prefer unstructured experimentation over benchmark-driven testing
- Professionals seeking model-building or product development roles
- Candidates who are not fully fluent in both English and Arabic
Technical background
- Native-level fluency in English and Arabic (required)
- Prior experience in AI red teaming, adversarial AI work, cybersecurity, or socio-technical risk analysis
- Familiarity with jailbreak techniques, prompt injection, misuse-case testing, and multi-turn manipulation
- Experience working with taxonomies, benchmarks, or structured evaluation playbooks
Project scope
- Text-based adversarial testing of conversational AI systems
- Work may involve reviewing outputs related to bias, misinformation, or harmful behaviors
- Participation in higher-sensitivity testing areas is optional and clearly communicated
- Success is measured by uncovering vulnerabilities automated tests miss, delivering reproducible artifacts, and expanding evaluation coverage
