---
title: AI Safety Datasets Overview
emoji: 🛡️
colorFrom: red
colorTo: pink
sdk: static
pinned: false
license: cc-by-nc-4.0
short_description: AI safety datasets with adversarial conversations
tags:
- safety
- adversarial
- red-teaming
- ai-safety
- multi-turn
- synthetic
datasets:
- GoJulyAI/multi-turn-conversations
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2
---
# 🛡️ AI Safety Datasets Collection
Comprehensive evaluation datasets for testing AI model safety mechanisms
## Dataset Collection Summary
| Metric | Value |
|--------|-------|
| **Total Conversations** | 849+ |
| **Total Turns** | 6,694+ |
| **Dataset Types** | 3 complementary methodologies |
| **Sample Data Available** | 150 free conversations |
## Full Dataset Statistics
| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
|---------|--------------|-------|----------------|--------|
| **Psychology multi-turn** | 184+ | 1,964+ | 10.3 | Psychological harms such as self-harm, psychosis, and anthropomorphism |
| **Illicit (bioweapon) multi-turn** | 84+ | 822+ | 9.8 | Biosafety harms such as bioweapons and pathogens |
| **Illicit (chemical, general) multi-turn** | 581+ | 3,908+ | 6.7 | Non-biological safety harms such as chemical weapons and cyber threats |
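The summary totals can be reproduced from the per-dataset rows. The sketch below copies the counts from the table above and treats each "N+" figure as exactly N, which is an assumption; because the published counts are lower bounds, averages computed this way are approximate and may not match the listed column exactly.

```python
# Per-dataset lower-bound counts copied from the table above: (conversations, turns).
stats = {
    "Psychology multi-turn": (184, 1964),
    "Illicit (bioweapon) multi-turn": (84, 822),
    "Illicit (chemical, general) multi-turn": (581, 3908),
}

# Sum each column to recover the collection-level totals.
total_conversations = sum(convs for convs, _ in stats.values())
total_turns = sum(turns for _, turns in stats.values())

print(f"Total conversations: {total_conversations}+")
print(f"Total turns: {total_turns:,}+")

# Average turns per conversation, computed from the lower-bound counts.
for name, (convs, turns) in stats.items():
    print(f"{name}: {turns / convs:.1f} turns per conversation")
```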
## Access Datasets on Hugging Face
### Psychology Multi-turn Conversations
Psychological harms such as self-harm, psychosis, and anthropomorphism.
**Sample:** 5 conversations
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/psychology-multi-turn)**
### Illicit (bioweapon) Multi-turn Conversations
Biosafety harms such as bioweapons and pathogens.
**Sample:** 5 conversations
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-bio-multi-turn/)**
### Illicit (chemical, general) Multi-turn Conversations
Non-biological safety harms such as chemical weapons and cyber threats.
**Sample:** 5 conversations
**[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-general-multi-turn)**
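For programmatic access, a minimal sketch using the Hugging Face `datasets` library might look like the following. The repository IDs come from the links above; the short keys, the helper names `dataset_url` and `load_sample`, and the `"train"` split name are assumptions for illustration and are not confirmed by the dataset repositories.

```python
# Repository IDs taken from the "View Dataset" links above.
# The short keys are hypothetical labels chosen for this example.
SAMPLE_REPOS = {
    "psychology": "GoJulyAI/psychology-multi-turn",
    "illicit-bio": "GoJulyAI/illicit-bio-multi-turn",
    "illicit-general": "GoJulyAI/illicit-general-multi-turn",
}


def dataset_url(key: str) -> str:
    """Return the Hugging Face page URL for one of the sample datasets."""
    return f"https://huggingface.co/datasets/{SAMPLE_REPOS[key]}"


def load_sample(key: str, split: str = "train"):
    """Download one sample dataset with the `datasets` library.

    Assumptions: `pip install datasets` has been run, the repository is
    accessible to your account, and it exposes a split named "train".
    """
    # Imported lazily so the module can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(SAMPLE_REPOS[key], split=split)
```

Calling `load_sample("psychology")` would then return the sample conversations, provided the assumptions in the docstring hold. If a repository requires accepting access terms, authenticating with a Hugging Face account first may be necessary.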
## ⚠️ Ethical Considerations
**⚠️ IMPORTANT:** These datasets contain successful adversarial attacks and harmful content.
### ✅ Intended Use
- Defensive security research
- AI safety evaluation and improvement
- Academic research on adversarial robustness
- Training safety and moderation systems
### ❌ Prohibited Use
- Creating offensive content
- Developing attack tools for malicious purposes
- Bypassing safety systems for harm
- Any use that violates laws or ethical guidelines
## 🎯 Data Selection Process
All datasets are derived from validated conversations that pass strict quality filters, including NeurIPS evaluation protocols.
### Base Criteria
- Text-based objectives (no code execution templates)
- Verdict: `success` (harmful requests successfully fulfilled)
- Multi-turn conversations with prompt-response pairs
### Psychology-Specific Criteria
- Organic conversations (`organicity = true`)
- Successfully elicited harmful psychology-related content
### Illicit-Specific Criteria
- Contains specific instruction details
- Practically executable (not abstract)
- Successfully elicited harmful illicit-related content
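The criteria above can be read as a record filter. The sketch below is illustrative only: `verdict` and `organicity` are named in the criteria, while `objective_type`, `turns`, `category`, `has_specific_instructions`, and `practically_executable` are hypothetical field names, and the actual schema and selection pipeline may differ.

```python
from typing import Any

Record = dict[str, Any]


def meets_base_criteria(rec: Record) -> bool:
    """Base criteria: text-based objective, successful verdict, multi-turn."""
    return (
        rec.get("objective_type") == "text"   # hypothetical field name
        and rec.get("verdict") == "success"
        and len(rec.get("turns", [])) > 1     # assumes a list of prompt-response pairs
    )


def is_selected(rec: Record) -> bool:
    """Apply the base criteria, then the category-specific ones."""
    if not meets_base_criteria(rec):
        return False
    if rec.get("category") == "psychology":
        return rec.get("organicity") is True
    if rec.get("category") == "illicit":
        return bool(rec.get("has_specific_instructions")) and bool(
            rec.get("practically_executable")
        )
    return False


# A made-up record used only to exercise the filter.
example = {
    "objective_type": "text",
    "verdict": "success",
    "turns": [("prompt 1", "response 1"), ("prompt 2", "response 2")],
    "category": "psychology",
    "organicity": True,
}
print(is_selected(example))
```

Keeping the base check separate from the category-specific checks mirrors the way the criteria are grouped above and makes it easy to add another category later.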
## License
Sample datasets are released under **CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0 International).
- ✅ Use for research and evaluation
- ✅ Modify and build upon the data
- ✅ Share with attribution
- ❌ Commercial use without separate licensing
## 💼 Full Dataset Access
The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.
**Please contact us at [info@gojuly.ai](mailto:info@gojuly.ai) to purchase any or all of the full datasets.**
Include your research objectives, institutional affiliation, and intended use in your inquiry.
---
**Last Updated:** December 2, 2025
For detailed documentation, visit the individual dataset repositories on Hugging Face.