---
title: AI Safety Datasets Overview
emoji: 🛡️
colorFrom: red
colorTo: pink
sdk: static
pinned: false
license: cc-by-nc-4.0
short_description: AI safety datasets with adversarial conversations
tags:
- safety
- adversarial
- red-teaming
- ai-safety
- multi-turn
- synthetic
datasets:
- GoJulyAI/multi-turn-conversations
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2
---

# 🛡️ AI Safety Datasets Collection

Comprehensive evaluation datasets for testing AI model safety mechanisms

## 📊 Dataset Collection Summary

| Metric | Value |
|--------|-------|
| **Total Conversations** | 849+ |
| **Total Turns** | 6,694+ |
| **Dataset Types** | 3 complementary methodologies |
| **Sample Data Available** | 150 free conversations |

## 📈 Full Dataset Statistics

| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
|---------|--------------|-------|----------------|--------|
| **Psychology multi-turn** | 184+ | 1,964+ | 10.3 | Psychological harms such as self-harm, psychosis, and anthropomorphism |
| **Illicit (bioweapon) multi-turn** | 84+ | 822+ | 9.8 | Biosafety harms such as bioweapons and pathogens |
| **Illicit (chemical, general) multi-turn** | 581+ | 3,908+ | 6.7 | Non-bio safety harms such as chemical weapons and cyber threats |
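
As a quick consistency check, the collection totals in the summary can be recomputed from the per-dataset rows. The sketch below hard-codes the lower-bound counts from the table above; the `DATASETS` mapping is only an illustration, not part of any published tooling.

```python
# (conversations, turns) lower bounds copied from the statistics table.
DATASETS = {
    "Psychology multi-turn": (184, 1964),
    "Illicit (bioweapon) multi-turn": (84, 822),
    "Illicit (chemical, general) multi-turn": (581, 3908),
}

# Sum each column across the three datasets.
total_conversations = sum(convs for convs, _ in DATASETS.values())
total_turns = sum(turns for _, turns in DATASETS.values())

print(f"Total conversations: {total_conversations}+")
print(f"Total turns: {total_turns}+")
```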

## 🔗 Access Datasets on Hugging Face

### Psychology Multi-turn Conversations
Psychological harms such as self-harm, psychosis, and anthropomorphism.  
**Sample:** 5 conversations

🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/psychology-multi-turn)**

### Illicit (bioweapon) Multi-turn Conversations
Biosafety harms such as bioweapons and pathogens.  
**Sample:** 5 conversations

🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-bio-multi-turn/)**

### Illicit (chemical, general) Multi-turn Conversations
Non-bio safety harms such as chemical weapons and cyber threats.  
**Sample:** 5 conversations

🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-general-multi-turn)**
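
For programmatic access, a minimal sketch using the Hugging Face `datasets` library might look like the following. The repository IDs are taken from the links above; the short names in `SAMPLE_REPOS` and the `load_sample` helper are hypothetical, and the split and column names are not documented here, so inspect the returned object before relying on any particular field.

```python
# Repository IDs copied from the "View Dataset" links in this section.
# The dictionary keys are illustrative short names, not official identifiers.
SAMPLE_REPOS = {
    "psychology": "GoJulyAI/psychology-multi-turn",
    "illicit_bio": "GoJulyAI/illicit-bio-multi-turn",
    "illicit_general": "GoJulyAI/illicit-general-multi-turn",
}


def load_sample(name):
    """Download one sample dataset by short name (hypothetical helper)."""
    # Imported lazily so the mapping above can be used without the library.
    from datasets import load_dataset

    return load_dataset(SAMPLE_REPOS[name])


# Example (requires network access and `pip install datasets`):
# sample = load_sample("psychology")
# print(sample)  # shows the available splits and columns
```

Access may depend on the repository's visibility settings, so the call can fail until any access terms on the dataset page are accepted.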

## ⚠️ Ethical Considerations

**⚠️ IMPORTANT:** These datasets contain successful adversarial attacks and harmful content.

### ✅ Intended Use
- Defensive security research
- AI safety evaluation and improvement
- Academic research on adversarial robustness
- Training safety and moderation systems

### ❌ Prohibited Use
- Creating offensive content
- Developing attack tools for malicious purposes
- Bypassing safety systems for harm
- Any use that violates laws or ethical guidelines

## 🎯 Data Selection Process

All datasets are derived from validated conversations that passed strict quality filters, including NeurIPS evaluation protocols.

### Base Criteria
- Text-based objectives (no code execution templates)
- Verdict: `success` (harmful requests successfully fulfilled)
- Multi-turn conversations with prompt-response pairs

### Psychology-Specific Criteria
- Organic conversations (`organicity = true`)
- Successfully elicited harmful psychology-related content

### Illicit-Specific Criteria
- Contains specific instruction details
- Practically executable (not abstract)
- Successfully elicited harmful illicit-related content
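
The criteria in this section can be read as record-level filters. The sketch below is one possible interpretation, not the actual selection pipeline: `verdict` and `organicity` are named in the criteria above, while `turns`, `prompt`, `response`, `uses_code_template`, `has_specific_instructions`, and `practically_executable` are hypothetical field names chosen for illustration.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def passes_base_criteria(rec: Record) -> bool:
    """Base criteria: text-based, verdict 'success', multi-turn prompt-response pairs."""
    turns = rec.get("turns", [])
    return (
        rec.get("verdict") == "success"
        # Hypothetical flag marking code-execution templates.
        and not rec.get("uses_code_template", False)
        # Assumption: "multi-turn" means at least two prompt-response pairs.
        and len(turns) >= 2
        and all(bool(t.get("prompt")) and bool(t.get("response")) for t in turns)
    )


def passes_psychology_criteria(rec: Record) -> bool:
    """Base criteria plus the organic-conversation requirement."""
    return passes_base_criteria(rec) and rec.get("organicity") is True


def passes_illicit_criteria(rec: Record) -> bool:
    """Base criteria plus specificity and practical executability (hypothetical flags)."""
    return (
        passes_base_criteria(rec)
        and bool(rec.get("has_specific_instructions"))
        and bool(rec.get("practically_executable"))
    )


# Placeholder record with neutral text, used only to exercise the filters.
example = {
    "verdict": "success",
    "organicity": True,
    "turns": [
        {"prompt": "placeholder prompt 1", "response": "placeholder response 1"},
        {"prompt": "placeholder prompt 2", "response": "placeholder response 2"},
    ],
}
print(passes_psychology_criteria(example))
```

The final criterion in each subsection (content was successfully elicited) is treated here as already covered by the `success` verdict, which is an assumption about how the verdict is assigned.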

## 📄 License

Sample datasets are released under **CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0 International).

- ✅ Use for research and evaluation
- ✅ Modify and build upon the data
- ✅ Share with attribution
- ❌ Commercial use without separate licensing

## 💼 Full Dataset Access

The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.

**Please contact us at [info@gojuly.ai](mailto:info@gojuly.ai) to purchase any or all of the full datasets.**

Include your research objectives, institutional affiliation, and intended use in your inquiry.

---

**Last Updated:** December 2, 2025

For detailed documentation, visit the individual dataset repositories on Hugging Face.