Role overview
This role focuses on adversarial testing of conversational AI models and agents. You will probe systems with jailbreaks, prompt injections, misuse cases, and bias exploitation to surface vulnerabilities and generate structured red team data. The work is text-based and involves reviewing AI outputs that may include sensitive topics such as bias, misinformation, or harmful behaviors.
This position is suited to experienced red teamers with native-level fluency in English and Hindi who are comfortable working within structured evaluation frameworks and documenting reproducible findings.
What you’ll actually be doing
- Red team conversational AI models and agents through jailbreaks, prompt injections, misuse cases, bias exploitation, and multi-turn manipulation
- Review AI outputs involving sensitive topics, following clearly communicated guidelines
- Annotate model failures, classify vulnerabilities, and flag systemic risks
- Follow defined taxonomies, benchmarks, and playbooks to ensure consistent testing
- Produce reproducible reports, datasets, and documented attack cases
Who this role is for
- Professionals with prior red teaming experience in AI adversarial work, cybersecurity, or socio-technical probing
- Individuals who instinctively push systems to breaking points
- Structured testers who use frameworks or benchmarks rather than ad hoc approaches
- Clear communicators who can explain risks to both technical and non-technical stakeholders
- Professionals comfortable moving across projects and customers
- Candidates with native-level fluency in English and Hindi
Who this role is likely NOT for
- Candidates without prior red teaming or closely related adversarial experience
- Individuals who are not comfortable reviewing AI outputs that may involve sensitive topics
- Candidates without native-level fluency in both English and Hindi
- Those who prefer unstructured or purely exploratory testing without defined frameworks
Technical background
- Prior red teaming experience in AI adversarial work, cybersecurity, or socio-technical risk domains
- Experience generating and annotating adversarial datasets
- Familiarity with structured evaluation approaches such as taxonomies and benchmarks
- Native-level fluency in English and Hindi
Nice-to-have specialties (not mandatory):
- Adversarial ML (e.g., jailbreak datasets, prompt injection, RLHF/DPO attacks, model extraction)
- Cybersecurity (e.g., penetration testing, exploit development, reverse engineering)
- Socio-technical risk analysis (e.g., harassment/disinformation probing, abuse analysis, conversational AI testing)
- Creative adversarial probing backgrounds (e.g., psychology, acting, writing)
Project scope
- Text-based adversarial testing of conversational AI systems
- Work guided by structured taxonomies, benchmarks, and playbooks
- Review of AI outputs that may include higher-sensitivity content, with topics communicated in advance
- Optional participation in higher-sensitivity projects
- Success defined by uncovering vulnerabilities, delivering reproducible artifacts, and expanding evaluation coverage
