Dataset Collection Summary
Total Conversations
Total Turns
Dataset Types
Sample Data
Full Dataset Statistics
| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
|---|---|---|---|---|
| Original Multi-turn | 594+ | 4,642+ | 7.8 | Baseline organic conversations |
| └ Psychology | 158+ | 1,583+ | 10.0 | Psychology harm category |
| └ Illicit | 436+ | 3,059+ | 7.0 | Illicit harm category |
| Bio-transformed V1 | 1,309+ | 6,784+ | 5.2 | Direct bio-safety attacks |
| Bio-transformed V2 | 1,308+ | 8,127+ | 6.2 | Adaptive bio-safety attacks |
| Keyword-transformed | 7,110+ | 53,705+ | 7.6 | Cross-domain harm transfer |
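The Avg Turns/Conv column can be reproduced from the other two columns. The sketch below is only a consistency check on the table: it assumes the "+" suffixes can be dropped and that averages are rounded to one decimal place, and the names `DATASETS` and `avg_turns` are illustrative, not part of any released tooling.

```python
# Counts copied from the statistics table (the "+" suffixes are dropped).
# Each value is (conversations, turns).
DATASETS = {
    "Original Multi-turn": (594, 4642),
    "Psychology": (158, 1583),
    "Illicit": (436, 3059),
    "Bio-transformed V1": (1309, 6784),
    "Bio-transformed V2": (1308, 8127),
    "Keyword-transformed": (7110, 53705),
}


def avg_turns(conversations: int, turns: int) -> float:
    """Average turns per conversation, rounded to one decimal place."""
    return round(turns / conversations, 1)


averages = {name: avg_turns(c, t) for name, (c, t) in DATASETS.items()}

# The Psychology and Illicit subsets should add up to the Original Multi-turn row.
subset_conversations = DATASETS["Psychology"][0] + DATASETS["Illicit"][0]
subset_turns = DATASETS["Psychology"][1] + DATASETS["Illicit"][1]

for name, value in averages.items():
    print(f"{name}: {value}")
```

Under these assumptions the computed averages match the table, and the two subset rows sum to the Original Multi-turn totals.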
Access Datasets on Hugging Face
Original Multi-turn Conversations
Psychology + Illicit baseline conversations
Sample: 50 conversations, 390 turns
Bio-transformed Synthetic V1
Direct bio-topic transformation methodology
Sample: 50 conversations, 449 turns
Bio-transformed Synthetic V2
Adaptive bio-topic transformation methodology
Sample: 50 conversations, 459 turns
Keyword-transformed Synthetic
Cross-domain keyword substitution methodology
Sample: 50 conversations, 659 turns
Research Applications
Safety Evaluation
- Benchmark model safety
- Measure robustness
- Evaluate mechanisms
Red Teaming
- Discover adversarial patterns
- Test safety guardrails
- Identify blind spots
Model Training
- Fine-tune safety classifiers
- Train attack detectors
- Develop harm detection
Safety Research
- Study harm transfer
- Analyze attack patterns
- Understand dynamics
Ethical Considerations
IMPORTANT
These datasets contain successful adversarial attacks and harmful content.
Intended Use
- Defensive security research
- AI safety evaluation and improvement
- Academic research on adversarial robustness
- Training safety and moderation systems
Prohibited Use
- Creating offensive content
- Developing attack tools for malicious purposes
- Bypassing safety systems for harm
- Any use that violates laws or ethical guidelines
Data Selection Process
All datasets are derived from high-quality, validated conversations with strict quality filters including NeurIPS evaluation protocols.
Base Criteria
- Text-based objectives (no code execution templates)
- NeurIPS evaluation metadata present
- Verdict: `success` (harmful requests successfully fulfilled)
- Multi-turn conversations with prompt-response pairs
Psychology-Specific Criteria
- Organic conversations (`organicity = true`)
- No disclaimer in responses
- Successfully elicited harmful psychology-related content
Illicit-Specific Criteria
- Contains specific instruction details
- Practically executable (not abstract)
- Successfully elicited harmful illicit-related content
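The criteria above amount to a record-level filter. The following is a minimal sketch of how such a filter might look, not the pipeline actually used to build the datasets. Only the `success` verdict and the `organicity` flag come from the text; every other field name (`objective_type`, `neurips_eval`, `turns`, `category`, `has_disclaimer`, `has_specific_instructions`, `is_executable`) is a hypothetical stand-in, and the example records contain placeholder text only.

```python
from typing import Any

Conversation = dict[str, Any]


def passes_base(conv: Conversation) -> bool:
    """Base criteria: text objective, evaluation metadata, success verdict, multi-turn."""
    turns = conv.get("turns", [])
    return (
        conv.get("objective_type") == "text"          # hypothetical field
        and conv.get("neurips_eval") is not None      # hypothetical field
        and conv.get("verdict") == "success"
        and len(turns) > 1                            # multi-turn
        and all("prompt" in t and "response" in t for t in turns)
    )


def passes_category(conv: Conversation) -> bool:
    """Category-specific criteria for the psychology and illicit subsets."""
    category = conv.get("category")                   # hypothetical field
    if category == "psychology":
        return conv.get("organicity") is True and not conv.get("has_disclaimer", False)
    if category == "illicit":
        return bool(conv.get("has_specific_instructions")) and bool(conv.get("is_executable"))
    return False


def select(conversations: list[Conversation]) -> list[Conversation]:
    """Keep only conversations that satisfy both the base and category criteria."""
    return [c for c in conversations if passes_base(c) and passes_category(c)]


# Placeholder records for illustration; no real dataset content is shown.
turn = {"prompt": "<prompt>", "response": "<response>"}
examples = [
    {"id": "a", "objective_type": "text", "neurips_eval": {}, "verdict": "success",
     "turns": [turn, turn], "category": "psychology", "organicity": True,
     "has_disclaimer": False},
    {"id": "b", "objective_type": "text", "neurips_eval": {}, "verdict": "failure",
     "turns": [turn, turn], "category": "illicit",
     "has_specific_instructions": True, "is_executable": True},
    {"id": "c", "objective_type": "text", "neurips_eval": {}, "verdict": "success",
     "turns": [turn, turn], "category": "psychology", "organicity": False},
]
selected_ids = [c["id"] for c in select(examples)]
```

Splitting the base and category checks into separate functions keeps each criterion visible on its own line, so a filter can be audited against the list it implements.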
License
All datasets are released under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International).
- ✅ Use for research and evaluation
- ✅ Modify and build upon the data
- ✅ Share with attribution
- ❌ Commercial use without separate licensing
Full Dataset Access
The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.
Please contact us at info@gojuly.ai to purchase any or all of the full datasets.
Include your research objectives, institutional affiliation, and intended use in your inquiry.