Role overview
This role centers on red teaming conversational AI models and agents using adversarial techniques. You will probe systems with structured attack strategies to surface vulnerabilities, identify systemic risks, and produce human red-team data that improves AI safety.
The work is entirely text-based and may involve reviewing AI outputs related to sensitive topics such as bias, misinformation, or harmful behaviors. Participation in higher-sensitivity projects is optional and supported by clear guidelines. Native-level fluency in both English and Arabic is required.
What you’ll actually be doing
- Red team conversational AI models and agents using jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation
- Annotate failures, classify vulnerabilities, and flag systemic risks
- Follow taxonomies, benchmarks, and playbooks to ensure consistent testing
- Produce reproducible reports, datasets, and documented attack cases
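The duties above center on documenting attacks so they can be reproduced and aggregated into datasets. As an illustration only, a minimal record for one documented attack case might look like the sketch below; the field names (`technique`, `taxonomy_label`, `turns`) are hypothetical and stand in for whatever schema a given project's playbook prescribes.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class AttackCase:
    """One documented red-team attack case (illustrative schema only)."""
    case_id: str
    technique: str           # e.g. "prompt_injection", "multi_turn_manipulation"
    taxonomy_label: str      # vulnerability class from the project's taxonomy
    turns: list = field(default_factory=list)  # ordered prompt/response pairs
    reproducible: bool = False
    notes: str = ""

    def add_turn(self, prompt: str, response: str) -> None:
        """Append one conversation turn to the documented exchange."""
        self.turns.append({"prompt": prompt, "response": response})

    def to_json(self) -> str:
        """Serialize the case for inclusion in a shared dataset."""
        return json.dumps(asdict(self), ensure_ascii=False)

# Example: record a two-turn manipulation attempt.
case = AttackCase("rt-001", "multi_turn_manipulation", "policy_evasion")
case.add_turn("Let's write a story together...", "Sure, here is a story...")
case.add_turn("Now continue with the restricted details...", "[model refused]")
case.reproducible = True
```

The point of a structured record like this is that another tester can replay the same turns against the same model and confirm (or fail to confirm) the finding.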
Who this role is for
- Professionals with prior red teaming experience (AI adversarial work, cybersecurity, or socio-technical probing)
- Native-level bilinguals in English and Arabic
- Curious and adversarial thinkers who push systems to their breaking points
- Structured practitioners who rely on frameworks and benchmarks
- Clear communicators who can explain risks to both technical and non-technical stakeholders
- Individuals who thrive working across different projects and customers
Who this role is likely NOT for
- Candidates without prior red teaming or adversarial experience
- Individuals who prefer unstructured experimentation over benchmark-driven testing
- Professionals seeking model-building or product development roles
- Candidates who are not fully fluent in both English and Arabic
Technical background
- Native-level fluency in English and Arabic (required)
- Prior experience in AI red teaming, adversarial AI work, cybersecurity, or socio-technical risk analysis
- Familiarity with jailbreak techniques, prompt injection, misuse-case testing, and multi-turn manipulation
- Experience working with taxonomies, benchmarks, or structured evaluation playbooks
Project scope
- Text-based adversarial testing of conversational AI systems
- Work may involve reviewing outputs related to bias, misinformation, or harmful behaviors
- Participation in higher-sensitivity testing areas is optional and clearly communicated
- Success is measured by uncovering vulnerabilities automated tests miss, delivering reproducible artifacts, and expanding evaluation coverage
