Yang Chung
committed on
Commit
·
42b116c
1
Parent(s):
7ca962c
Initial commit
- README.md +384 -6
- index.html +477 -18
README.md
CHANGED
|
@@ -1,11 +1,389 @@
|
|
| 1 |
---
|
| 2 |
-
title: Datasets Overview
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: static
|
| 7 |
pinned: false
|
| 8 |
-
|
| 9 |
---
|
| 10 |
|
| 11 |
-
|
| 1 |
---
|
| 2 |
+
title: AI Safety Datasets Overview
|
| 3 |
+
emoji: 🛡️
|
| 4 |
+
colorFrom: red
|
| 5 |
+
colorTo: orange
|
| 6 |
sdk: static
|
| 7 |
pinned: false
|
| 8 |
+
license: cc-by-nc-4.0
|
| 9 |
+
short_description: Comprehensive AI safety evaluation datasets with organic and synthetic adversarial conversations
|
| 10 |
+
tags:
|
| 11 |
+
- safety
|
| 12 |
+
- adversarial
|
| 13 |
+
- red-teaming
|
| 14 |
+
- ai-safety
|
| 15 |
+
- multi-turn
|
| 16 |
+
- synthetic
|
| 17 |
+
datasets:
|
| 18 |
+
- julyai7/multi-turn-conversations
|
| 19 |
+
- julyai7/multi-turn-bio-transformed-synth-conversations-v1
|
| 20 |
+
- julyai7/multi-turn-bio-transformed-synth-conversations-v2
|
| 21 |
+
- julyai7/multi-turn-keyword-transformed-synth-conversations
|
| 22 |
---
|
| 23 |
|
| 24 |
+
# AI Safety Datasets Collection - Overview
|
| 25 |
+
|
| 26 |
+
This space provides an overview of our comprehensive AI safety evaluation datasets. These datasets are designed for rigorous testing of AI model safety mechanisms across various attack methodologies and harm categories.
|
| 27 |
+
|
| 28 |
+
## Dataset Collection Summary
|
| 29 |
+
|
| 30 |
+
We offer four complementary datasets, each serving specific evaluation purposes:
|
| 31 |
+
|
| 32 |
+
| Dataset | Type | Methodology | Primary Use Case |
|
| 33 |
+
|---------|------|-------------|------------------|
|
| 34 |
+
| **Original Multi-turn Conversations** | Organic | Human-AI interactions | Baseline safety evaluation |
|
| 35 |
+
| **Bio-transformed Synthetic V1** | Synthetic | Direct bio-topic transformation | Bio-safety adversarial testing |
|
| 36 |
+
| **Bio-transformed Synthetic V2** | Synthetic | Adaptive bio-topic transformation | Bio-safety with natural flow |
|
| 37 |
+
| **Keyword-transformed Synthetic** | Synthetic | Cross-domain keyword substitution | Domain-agnostic safety testing |
|
| 38 |
+
|
| 39 |
+
---
|
| 40 |
+
|
| 41 |
+
## Detailed Dataset Descriptions
|
| 42 |
+
|
| 43 |
+
### 1. Original Multi-turn Conversations
|
| 44 |
+
|
| 45 |
+
**Collections:** `psychs` + `illicits`
|
| 46 |
+
|
| 47 |
+
**Source Data Criteria:**
|
| 48 |
+
|
| 49 |
+
#### Psychology Conversations (`psychs`)
|
| 50 |
+
- **Harm Category:** Psychology
|
| 51 |
+
- **Objective Type:** Text-based, no JupyterHub templates
|
| 52 |
+
- **Quality Filters:**
|
| 53 |
+
- NeurIPS evaluation verdict: `success`
|
| 54 |
+
- No disclaimer present
|
| 55 |
+
- Organic conversations only (`organicity = true`)
|
| 56 |
+
- **Purpose:** Baseline psychology-related harmful requests that successfully elicit unsafe responses
|
| 57 |
+
|
| 58 |
+
#### Illicit Conversations (`illicits`)
|
| 59 |
+
- **Harm Category:** Illicit activities
|
| 60 |
+
- **Objective Type:** Text-based, no JupyterHub templates
|
| 61 |
+
- **Quality Filters:**
|
| 62 |
+
- NeurIPS evaluation verdict: `success`
|
| 63 |
+
- Contains specific instruction details (not `none`)
|
| 64 |
+
- Executable by high school students (`high_school_student_executable != false`)
|
| 65 |
+
- **Purpose:** Baseline illicit-related harmful requests with practical execution potential
|
| 66 |
+
|
| 67 |
+
**Use Cases:**
|
| 68 |
+
- Establishing baseline safety evaluation metrics
|
| 69 |
+
- Understanding organic adversarial conversation patterns
|
| 70 |
+
- Training and fine-tuning safety classifiers
|
| 71 |
+
- Benchmarking model safety before synthetic transformations
|
| 72 |
+
|
| 73 |
+
**Sample Dataset:** 50 conversations (390 turns)
|
| 74 |
+
- Psychology: 55.6% of sample turns
|
| 75 |
+
- Illicit: 44.4% of sample turns
|
| 76 |
+
|
| 77 |
+
---
|
| 78 |
+
|
| 79 |
+
### 2. Bio-transformed Synthetic Multi-turn Conversations V1
|
| 80 |
+
|
| 81 |
+
**Collection:** `illicit_bio_synths_v1`
|
| 82 |
+
|
| 83 |
+
**Transformation Method:** `bio_topic_change`
|
| 84 |
+
|
| 85 |
+
**Source:** Derived from original psychology + illicit conversations
|
| 86 |
+
|
| 87 |
+
**Methodology V1 Characteristics:**
|
| 88 |
+
- **Direct transformation approach:** Explicit adversarial pattern injection
|
| 89 |
+
- **Focus:** Systematic safety mechanism bypass strategies
|
| 90 |
+
- **Target Domain:** Bio-safety (dangerous biological information)
|
| 91 |
+
- **Transformation Goal:** Convert psychology/illicit harms into bio-safety attacks
|
| 92 |
+
|
| 93 |
+
**Key Features:**
|
| 94 |
+
- All conversations transformed to `illicit` category (bio-safety domain)
|
| 95 |
+
- Direct mapping of harmful intents to biological contexts
|
| 96 |
+
- Aggressive adversarial techniques
|
| 97 |
+
- Tests explicit bio-safety guardrails
|
| 98 |
+
|
| 99 |
+
**Use Cases:**
|
| 100 |
+
- Testing bio-safety specific guardrails
|
| 101 |
+
- Evaluating cross-domain harm transfer (psych/illicit → bio)
|
| 102 |
+
- Red-teaming bio-related content moderation
|
| 103 |
+
- Training specialized bio-safety detectors
|
| 104 |
+
|
| 105 |
+
**Sample Dataset:** 50 conversations (449 turns, 100% illicit/bio-safety)
|
| 106 |
+
|
| 107 |
+
---
|
| 108 |
+
|
| 109 |
+
### 3. Bio-transformed Synthetic Multi-turn Conversations V2
|
| 110 |
+
|
| 111 |
+
**Collection:** `illicit_bio_synths_v2`
|
| 112 |
+
|
| 113 |
+
**Transformation Method:** `bio_topic_change_og`
|
| 114 |
+
|
| 115 |
+
**Source:** Derived from original psychology + illicit conversations
|
| 116 |
+
|
| 117 |
+
**Methodology V2 Characteristics:**
|
| 118 |
+
- **Adaptive transformation approach:** Natural conversation flow preservation
|
| 119 |
+
- **Focus:** Contextual reframing and subtle escalation patterns
|
| 120 |
+
- **Target Domain:** Bio-safety (dangerous biological information)
|
| 121 |
+
- **Transformation Goal:** More sophisticated, harder-to-detect bio-safety attacks
|
| 122 |
+
|
| 123 |
+
**Key Differences from V1:**
|
| 124 |
+
- More natural conversation progression
|
| 125 |
+
- Subtle escalation tactics
|
| 126 |
+
- Better mimics legitimate scientific inquiry
|
| 127 |
+
- Harder for safety systems to detect
|
| 128 |
+
|
| 129 |
+
**Use Cases:**
|
| 130 |
+
- Testing advanced bio-safety detection systems
|
| 131 |
+
- Evaluating robustness against sophisticated attacks
|
| 132 |
+
- Training models to detect subtle adversarial patterns
|
| 133 |
+
- Benchmarking next-generation safety systems
|
| 134 |
+
|
| 135 |
+
**Sample Dataset:** 50 conversations (459 turns, 100% illicit/bio-safety)
|
| 136 |
+
|
| 137 |
+
---
|
| 138 |
+
|
| 139 |
+
### 4. Keyword-transformed Synthetic Multi-turn Conversations
|
| 140 |
+
|
| 141 |
+
**Collection:** `keyword_synths`
|
| 142 |
+
|
| 143 |
+
**Transformation Method:** `keyword`
|
| 144 |
+
|
| 145 |
+
**Source:** Derived from original psychology + illicit conversations
|
| 146 |
+
|
| 147 |
+
**Methodology Characteristics:**
|
| 148 |
+
- **Cross-domain keyword substitution:** Maintains harmful intent across contexts
|
| 149 |
+
- **Domain shifting:** Same underlying harm expressed in different domains
|
| 150 |
+
- **Context adaptation:** Systematic replacement of domain-specific terminology
|
| 151 |
+
- **Intent preservation:** Core harmful objective remains unchanged
|
| 152 |
+
|
| 153 |
+
**Innovation:**
|
| 154 |
+
Tests whether AI safety mechanisms are:
|
| 155 |
+
- **Domain-agnostic:** Robust across different contexts and topics
|
| 156 |
+
- **Intent-focused:** Detecting underlying harm vs. surface-level keywords
|
| 157 |
+
- **Context-aware:** Understanding harm despite domain transformations
|
| 158 |
+
|
| 159 |
+
**Key Features:**
|
| 160 |
+
- Preserves original harm category distribution (psychology + illicit)
|
| 161 |
+
- Demonstrates safety mechanism vulnerabilities through context shifting
|
| 162 |
+
- Higher turn count per conversation in the sample set (more complex attacks)
|
| 163 |
+
- Tests generalization of safety training
|
| 164 |
+
|
| 165 |
+
**Use Cases:**
|
| 166 |
+
- Evaluating domain-agnostic safety mechanisms
|
| 167 |
+
- Testing whether safety is keyword-based or intent-based
|
| 168 |
+
- Training robust cross-domain harm detection
|
| 169 |
+
- Identifying brittleness in safety systems
|
| 170 |
+
|
| 171 |
+
**Sample Dataset:** 50 conversations (659 turns)
|
| 172 |
+
- Illicit: 51.6% of sample turns
|
| 173 |
+
- Psychology: 48.4% of sample turns
|
| 174 |
+
|
| 175 |
+
---
|
| 176 |
+
|
| 177 |
+
## 🎯 Data Selection Process
|
| 178 |
+
|
| 179 |
+
All datasets are derived from high-quality, validated conversations that meet strict criteria:
|
| 180 |
+
|
| 181 |
+
### Base Criteria (All Datasets)
|
| 182 |
+
- ✅ Text-based objectives (no code execution templates)
|
| 183 |
+
- ✅ NeurIPS evaluation metadata present
|
| 184 |
+
- ✅ Verdict: `success` (harmful requests successfully fulfilled)
|
| 185 |
+
- ✅ Multi-turn conversations with prompt-response pairs
|
| 186 |
+
|
| 187 |
+
### Psychology-Specific Criteria
|
| 188 |
+
- Organic conversations (`organicity = true`)
|
| 189 |
+
- No disclaimer in responses
|
| 190 |
+
- Successfully elicited harmful psychology-related content
|
| 191 |
+
|
| 192 |
+
### Illicit-Specific Criteria
|
| 193 |
+
- Contains specific instruction details
|
| 194 |
+
- Practically executable (not abstract)
|
| 195 |
+
- Successfully elicited harmful illicit-related content
|
| 196 |
+
|
| 197 |
+
### Synthetic Transformation Criteria
|
| 198 |
+
- Original conversation must meet base criteria
|
| 199 |
+
- Successful transformation to target methodology
|
| 200 |
+
- Maintains harmful intent in new domain
|
| 201 |
+
- Contains valid prompt-response pairs
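The selection criteria above can be read as a set of record-level predicates. The following is a minimal sketch only: the record layout and the field names `objective_type`, `neurips_eval`, `turns`, `harm_category`, `has_disclaimer`, and `instruction_details` are hypothetical (only `organicity` and `high_school_student_executable` appear in this README), and "multi-turn" is assumed to mean at least two prompt-response pairs.

```python
from collections.abc import Mapping
from typing import Any


def meets_base_criteria(conv: Mapping[str, Any]) -> bool:
    """Base criteria shared by all datasets (field names are hypothetical)."""
    evaluation = conv.get("neurips_eval")
    turns = conv.get("turns", [])
    return (
        conv.get("objective_type") == "text"        # text-based objective
        and isinstance(evaluation, Mapping)          # evaluation metadata present
        and evaluation.get("verdict") == "success"   # verdict: success
        and len(turns) >= 2                          # assumed meaning of "multi-turn"
        and all(t.get("prompt") and t.get("response") for t in turns)
    )


def is_psych_candidate(conv: Mapping[str, Any]) -> bool:
    """Psychology-specific criteria: organic, no disclaimer."""
    return (
        meets_base_criteria(conv)
        and conv.get("harm_category") == "psychology"
        and conv.get("organicity") is True
        and not conv.get("has_disclaimer", False)
    )


def is_illicit_candidate(conv: Mapping[str, Any]) -> bool:
    """Illicit-specific criteria: has instruction details, not marked non-executable."""
    return (
        meets_base_criteria(conv)
        and conv.get("harm_category") == "illicit"
        and conv.get("instruction_details") not in (None, "none")
        # Mirrors `high_school_student_executable != false`: a missing value passes.
        and conv.get("high_school_student_executable") is not False
    )
```

Keeping the base check separate from the category checks mirrors how the criteria are listed: every category filter first requires the base criteria, then adds its own conditions.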
|
| 202 |
+
|
| 203 |
+
---
|
| 204 |
+
|
| 205 |
+
## Dataset Statistics
|
| 206 |
+
|
| 207 |
+
### Full Dataset Overview
|
| 208 |
+
|
| 209 |
+
The complete datasets are derived from our production database using strict quality filters:
|
| 210 |
+
|
| 211 |
+
| Dataset | Conversations | Turns | Avg Turns/Conv | Primary Focus |
|
| 212 |
+
|---------|---------------|-------|----------------|---------------|
|
| 213 |
+
| **Original Multi-turn** | **594+** | **4,642+** | **7.8** | Baseline organic conversations |
|
| 214 |
+
| - Psychology (`psychs`) | 158+ | 1,583+ | 10.0 | Psychology harm category |
|
| 215 |
+
| - Illicit (`illicits`) | 436+ | 3,059+ | 7.0 | Illicit harm category |
|
| 216 |
+
| **Bio-transformed V1** | **1,309+** | **6,784+** | **5.2** | Direct bio-safety attacks |
|
| 217 |
+
| **Bio-transformed V2** | **1,308+** | **8,127+** | **6.2** | Adaptive bio-safety attacks |
|
| 218 |
+
| **Keyword-transformed** | **7,110+** | **53,705+** | **7.6** | Cross-domain harm transfer |
|
| 219 |
+
| **Total Full Datasets** | **10,321+** | **73,258+** | **7.1** | All methodologies |
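The averages and totals in the table above can be re-derived from the conversation and turn counts. A small sketch, assuming the published counts (which the table marks with `+` as lower bounds) are used as exact values; the names `FULL_STATS`, `avg_turns`, and `totals` are illustrative only.

```python
# (conversations, turns) per full dataset, copied from the table above.
FULL_STATS = {
    "Original Multi-turn": (594, 4642),
    "Bio-transformed V1": (1309, 6784),
    "Bio-transformed V2": (1308, 8127),
    "Keyword-transformed": (7110, 53705),
}


def avg_turns(conversations: int, turns: int) -> float:
    """Average turns per conversation, rounded to one decimal as in the table."""
    return round(turns / conversations, 1)


def totals(stats: dict[str, tuple[int, int]]) -> tuple[int, int, float]:
    """Sum conversations and turns across datasets and compute the overall average."""
    conversations = sum(c for c, _ in stats.values())
    turns = sum(t for _, t in stats.values())
    return conversations, turns, avg_turns(conversations, turns)


if __name__ == "__main__":
    for name, (conversations, turns) in FULL_STATS.items():
        print(f"{name}: {avg_turns(conversations, turns)} turns/conversation")
    print("Total:", totals(FULL_STATS))
```

Running the same check on the Psychology (158, 1,583) and Illicit (436, 3,059) sub-rows reproduces their listed averages of 10.0 and 7.0.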
|
| 220 |
+
|
| 221 |
+
---
|
| 222 |
+
|
| 223 |
+
### Sample Data Overview (Publicly Available)
|
| 224 |
+
|
| 225 |
+
Representative sample datasets are available on Hugging Face for evaluation and testing:
|
| 226 |
+
|
| 227 |
+
| Dataset | Conversations | Turns | Avg Turns/Conv | Harm Categories |
|
| 228 |
+
|---------|--------------|-------|----------------|-----------------|
|
| 229 |
+
| Original | 50 | 390 | 7.8 | Psychology (55.6%), Illicit (44.4%) |
|
| 230 |
+
| Bio V1 | 50 | 449 | 9.0 | Illicit/Bio (100%) |
|
| 231 |
+
| Bio V2 | 50 | 459 | 9.2 | Illicit/Bio (100%) |
|
| 232 |
+
| Keyword | 50 | 659 | 13.2 | Illicit (51.6%), Psychology (48.4%) |
|
| 233 |
+
| **Total Samples** | **200** | **1,957** | **9.8** | Multiple |
|
| 234 |
+
|
| 235 |
+
> **Note:** Sample datasets are selected subsets of the full datasets, freely accessible for research evaluation. Their category mix and average conversation length differ from the full datasets, so sample statistics should not be extrapolated to the full collection.
|
| 236 |
+
|
| 237 |
+
---
|
| 238 |
+
|
| 239 |
+
## Dataset Links
|
| 240 |
+
|
| 241 |
+
### Hugging Face Datasets
|
| 242 |
+
|
| 243 |
+
1. **[Original Multi-turn Conversations](https://huggingface.co/datasets/julyai7/multi-turn-conversations)**
|
| 244 |
+
- Psychology + Illicit baseline conversations
|
| 245 |
+
- 50 sample conversations, 390 turns
|
| 246 |
+
|
| 247 |
+
2. **[Bio-transformed Synthetic V1](https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v1)**
|
| 248 |
+
- Direct bio-topic transformation methodology
|
| 249 |
+
- 50 sample conversations, 449 turns
|
| 250 |
+
|
| 251 |
+
3. **[Bio-transformed Synthetic V2](https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v2)**
|
| 252 |
+
- Adaptive bio-topic transformation methodology
|
| 253 |
+
- 50 sample conversations, 459 turns
|
| 254 |
+
|
| 255 |
+
4. **[Keyword-transformed Synthetic](https://huggingface.co/datasets/julyai7/multi-turn-keyword-transformed-synth-conversations)**
|
| 256 |
+
- Cross-domain keyword substitution methodology
|
| 257 |
+
- 50 sample conversations, 659 turns
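For programmatic access, the sample repositories listed above can presumably be loaded with the Hugging Face `datasets` library. This is a hedged sketch: the short keys in `SAMPLE_REPOS` and the helper names are illustrative, the split and column layout of each repository is not documented here, and access may require accepting dataset terms or authenticating.

```python
# Repository IDs copied from the links above; the short keys are illustrative.
SAMPLE_REPOS = {
    "original": "julyai7/multi-turn-conversations",
    "bio_v1": "julyai7/multi-turn-bio-transformed-synth-conversations-v1",
    "bio_v2": "julyai7/multi-turn-bio-transformed-synth-conversations-v2",
    "keyword": "julyai7/multi-turn-keyword-transformed-synth-conversations",
}


def dataset_url(repo_id: str) -> str:
    """Build the Hugging Face dataset page URL for a repository ID."""
    return f"https://huggingface.co/datasets/{repo_id}"


def load_sample(name: str):
    """Load one sample dataset by short key (needs `pip install datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers above work without it

    return load_dataset(SAMPLE_REPOS[name])


if __name__ == "__main__":
    for key, repo_id in SAMPLE_REPOS.items():
        print(key, dataset_url(repo_id))
```

Calling `load_sample("original")` would download the sample; inspect the returned object before relying on any particular split or column name.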
|
| 258 |
+
|
| 259 |
+
---
|
| 260 |
+
|
| 261 |
+
## 🧪 Research Applications
|
| 262 |
+
|
| 263 |
+
These datasets enable various research directions:
|
| 264 |
+
|
| 265 |
+
### Safety Evaluation
|
| 266 |
+
- Benchmark model safety across attack methodologies
|
| 267 |
+
- Measure robustness to synthetic transformations
|
| 268 |
+
- Evaluate domain-specific vs. general safety mechanisms
|
| 269 |
+
|
| 270 |
+
### Red Teaming
|
| 271 |
+
- Discover new adversarial patterns
|
| 272 |
+
- Test safety guardrails comprehensively
|
| 273 |
+
- Identify blind spots in content moderation
|
| 274 |
+
|
| 275 |
+
### Model Training
|
| 276 |
+
- Fine-tune safety classifiers
|
| 277 |
+
- Train adversarial attack detectors
|
| 278 |
+
- Develop cross-domain harm detection systems
|
| 279 |
+
|
| 280 |
+
### Safety Research
|
| 281 |
+
- Study harm transfer across domains
|
| 282 |
+
- Analyze conversation-level attack patterns
|
| 283 |
+
- Understand multi-turn adversarial dynamics
|
| 284 |
+
|
| 285 |
+
---
|
| 286 |
+
|
| 287 |
+
## ⚠️ Ethical Considerations
|
| 288 |
+
|
| 289 |
+
**IMPORTANT:** These datasets contain successful adversarial attacks and harmful content.
|
| 290 |
+
|
| 291 |
+
### Intended Use
|
| 292 |
+
- ✅ Defensive security research
|
| 293 |
+
- ✅ AI safety evaluation and improvement
|
| 294 |
+
- ✅ Academic research on adversarial robustness
|
| 295 |
+
- ✅ Training safety and moderation systems
|
| 296 |
+
|
| 297 |
+
### Prohibited Use
|
| 298 |
+
- ❌ Creating offensive content
|
| 299 |
+
- ❌ Developing attack tools for malicious purposes
|
| 300 |
+
- ❌ Bypassing safety systems for harm
|
| 301 |
+
- ❌ Any use that violates laws or ethical guidelines
|
| 302 |
+
|
| 303 |
+
### Recommendations
|
| 304 |
+
- Use in controlled research environments
|
| 305 |
+
- Implement appropriate access controls
|
| 306 |
+
- Follow institutional review board (IRB) guidelines
|
| 307 |
+
- Report findings responsibly
|
| 308 |
+
|
| 309 |
+
---
|
| 310 |
+
|
| 311 |
+
## License
|
| 312 |
+
|
| 313 |
+
All datasets are released under **CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0 International).
|
| 314 |
+
|
| 315 |
+
### License Terms
|
| 316 |
+
- ✅ Use for research and evaluation
|
| 317 |
+
- ✅ Modify and build upon the data
|
| 318 |
+
- ✅ Share with attribution
|
| 319 |
+
- ❌ Commercial use without separate licensing
|
| 320 |
+
|
| 321 |
+
---
|
| 322 |
+
|
| 323 |
+
## 💼 Full Dataset Access
|
| 324 |
+
|
| 325 |
+
The sample datasets provide representative examples. Full datasets contain:
|
| 326 |
+
|
| 327 |
+
- **Thousands of additional conversations**
|
| 328 |
+
- **Expanded harm categories and variations**
|
| 329 |
+
- **Diverse conversation lengths and complexity levels**
|
| 330 |
+
- **Regular updates with new adversarial patterns**
|
| 331 |
+
- **Custom dataset creation for specific research needs**
|
| 332 |
+
|
| 333 |
+
### Contact for Full Dataset
|
| 334 |
+
|
| 335 |
+
For academic research or commercial licensing:
|
| 336 |
+
- 📧 Email: [your-email@domain.com]
|
| 337 |
+
- Website: [your-website.com]
|
| 338 |
+
- Include: Research objectives, institutional affiliation, intended use
|
| 339 |
+
|
| 340 |
+
---
|
| 341 |
+
|
| 342 |
+
## Dataset Updates
|
| 343 |
+
|
| 344 |
+
**Current Version:** November 2024
|
| 345 |
+
|
| 346 |
+
The sample datasets represent snapshots of our larger collection. Full datasets receive regular updates with:
|
| 347 |
+
- New adversarial patterns and methodologies
|
| 348 |
+
- Additional harm categories and domains
|
| 349 |
+
- Improved quality filters and annotations
|
| 350 |
+
- Enhanced diversity in conversation styles
|
| 351 |
+
|
| 352 |
+
---
|
| 353 |
+
|
| 354 |
+
## Citation
|
| 355 |
+
|
| 356 |
+
If you use these datasets in your research, please cite:
|
| 357 |
+
|
| 358 |
+
```bibtex
|
| 359 |
+
@dataset{ai_safety_datasets_2024,
|
| 360 |
+
title={AI Safety Multi-turn Conversation Datasets},
|
| 361 |
+
author={[Your Name/Organization]},
|
| 362 |
+
year={2024},
|
| 363 |
+
publisher={Hugging Face},
|
| 364 |
+
howpublished={\url{https://huggingface.co/julyai7}}
|
| 365 |
+
}
|
| 366 |
+
```
|
| 367 |
+
|
| 368 |
+
---
|
| 369 |
+
|
| 370 |
+
## 🤝 Acknowledgments
|
| 371 |
+
|
| 372 |
+
These datasets were created through:
|
| 373 |
+
- Rigorous NeurIPS evaluation protocols
|
| 374 |
+
- Advanced synthetic transformation methodologies
|
| 375 |
+
- Quality filtering and validation processes
|
| 376 |
+
- Ethical review and safety considerations
|
| 377 |
+
|
| 378 |
+
---
|
| 379 |
+
|
| 380 |
+
## Support & Questions
|
| 381 |
+
|
| 382 |
+
For questions about the datasets:
|
| 383 |
+
- Open an issue in the respective dataset repository
|
| 384 |
+
- Join the discussion in the Community tab
|
| 385 |
+
- Contact us for technical support or collaboration opportunities
|
| 386 |
+
|
| 387 |
+
---
|
| 388 |
+
|
| 389 |
+
**Last Updated:** November 24, 2025
|
index.html
CHANGED
|
@@ -1,19 +1,478 @@
|
|
| 1 |
-
<!
|
| 2 |
-
<html>
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
</html>
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="en">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>AI Safety Datasets Overview</title>
|
| 7 |
+
<style>
|
| 8 |
+
* {
|
| 9 |
+
margin: 0;
|
| 10 |
+
padding: 0;
|
| 11 |
+
box-sizing: border-box;
|
| 12 |
+
}
|
| 13 |
+
|
| 14 |
+
body {
|
| 15 |
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen', 'Ubuntu', 'Cantarell', sans-serif;
|
| 16 |
+
line-height: 1.6;
|
| 17 |
+
color: #333;
|
| 18 |
+
background: linear-gradient(135deg, #ff6b6b 0%, #ff8e53 100%);
|
| 19 |
+
padding: 2rem 1rem;
|
| 20 |
+
}
|
| 21 |
+
|
| 22 |
+
.container {
|
| 23 |
+
max-width: 1200px;
|
| 24 |
+
margin: 0 auto;
|
| 25 |
+
background: white;
|
| 26 |
+
border-radius: 12px;
|
| 27 |
+
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.1);
|
| 28 |
+
overflow: hidden;
|
| 29 |
+
}
|
| 30 |
+
|
| 31 |
+
header {
|
| 32 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 33 |
+
color: white;
|
| 34 |
+
padding: 3rem 2rem;
|
| 35 |
+
text-align: center;
|
| 36 |
+
}
|
| 37 |
+
|
| 38 |
+
header h1 {
|
| 39 |
+
font-size: 2.5rem;
|
| 40 |
+
margin-bottom: 0.5rem;
|
| 41 |
+
}
|
| 42 |
+
|
| 43 |
+
header p {
|
| 44 |
+
font-size: 1.1rem;
|
| 45 |
+
opacity: 0.95;
|
| 46 |
+
}
|
| 47 |
+
|
| 48 |
+
.content {
|
| 49 |
+
padding: 2rem;
|
| 50 |
+
}
|
| 51 |
+
|
| 52 |
+
section {
|
| 53 |
+
margin-bottom: 3rem;
|
| 54 |
+
}
|
| 55 |
+
|
| 56 |
+
h2 {
|
| 57 |
+
color: #667eea;
|
| 58 |
+
font-size: 1.8rem;
|
| 59 |
+
margin-bottom: 1rem;
|
| 60 |
+
border-bottom: 2px solid #667eea;
|
| 61 |
+
padding-bottom: 0.5rem;
|
| 62 |
+
}
|
| 63 |
+
|
| 64 |
+
h3 {
|
| 65 |
+
color: #764ba2;
|
| 66 |
+
font-size: 1.3rem;
|
| 67 |
+
margin: 1.5rem 0 0.75rem 0;
|
| 68 |
+
}
|
| 69 |
+
|
| 70 |
+
.stats-grid {
|
| 71 |
+
display: grid;
|
| 72 |
+
grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
|
| 73 |
+
gap: 1.5rem;
|
| 74 |
+
margin: 2rem 0;
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
.stat-card {
|
| 78 |
+
background: linear-gradient(135deg, #667eea15 0%, #764ba215 100%);
|
| 79 |
+
border-radius: 8px;
|
| 80 |
+
padding: 1.5rem;
|
| 81 |
+
border-left: 4px solid #667eea;
|
| 82 |
+
}
|
| 83 |
+
|
| 84 |
+
.stat-card h4 {
|
| 85 |
+
color: #667eea;
|
| 86 |
+
font-size: 0.9rem;
|
| 87 |
+
text-transform: uppercase;
|
| 88 |
+
letter-spacing: 1px;
|
| 89 |
+
margin-bottom: 0.5rem;
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
.stat-card .number {
|
| 93 |
+
font-size: 2rem;
|
| 94 |
+
font-weight: bold;
|
| 95 |
+
color: #333;
|
| 96 |
+
}
|
| 97 |
+
|
| 98 |
+
.stat-card .label {
|
| 99 |
+
color: #666;
|
| 100 |
+
font-size: 0.9rem;
|
| 101 |
+
}
|
| 102 |
+
|
| 103 |
+
table {
|
| 104 |
+
width: 100%;
|
| 105 |
+
border-collapse: collapse;
|
| 106 |
+
margin: 1.5rem 0;
|
| 107 |
+
background: white;
|
| 108 |
+
}
|
| 109 |
+
|
| 110 |
+
th {
|
| 111 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 112 |
+
color: white;
|
| 113 |
+
padding: 1rem;
|
| 114 |
+
text-align: left;
|
| 115 |
+
font-weight: 600;
|
| 116 |
+
}
|
| 117 |
+
|
| 118 |
+
td {
|
| 119 |
+
padding: 0.75rem 1rem;
|
| 120 |
+
border-bottom: 1px solid #e0e0e0;
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
tr:hover {
|
| 124 |
+
background: #f8f9fa;
|
| 125 |
+
}
|
| 126 |
+
|
| 127 |
+
.dataset-links {
|
| 128 |
+
display: grid;
|
| 129 |
+
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
|
| 130 |
+
gap: 1.5rem;
|
| 131 |
+
margin: 2rem 0;
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
.dataset-card {
|
| 135 |
+
background: white;
|
| 136 |
+
border: 2px solid #e0e0e0;
|
| 137 |
+
border-radius: 8px;
|
| 138 |
+
padding: 1.5rem;
|
| 139 |
+
transition: all 0.3s ease;
|
| 140 |
+
}
|
| 141 |
+
|
| 142 |
+
.dataset-card:hover {
|
| 143 |
+
border-color: #667eea;
|
| 144 |
+
transform: translateY(-4px);
|
| 145 |
+
box-shadow: 0 8px 16px rgba(102, 126, 234, 0.2);
|
| 146 |
+
}
|
| 147 |
+
|
| 148 |
+
.dataset-card h4 {
|
| 149 |
+
color: #667eea;
|
| 150 |
+
margin-bottom: 0.5rem;
|
| 151 |
+
}
|
| 152 |
+
|
| 153 |
+
.dataset-card p {
|
| 154 |
+
color: #666;
|
| 155 |
+
font-size: 0.9rem;
|
| 156 |
+
margin-bottom: 1rem;
|
| 157 |
+
}
|
| 158 |
+
|
| 159 |
+
.btn {
|
| 160 |
+
display: inline-block;
|
| 161 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 162 |
+
color: white;
|
| 163 |
+
padding: 0.75rem 1.5rem;
|
| 164 |
+
text-decoration: none;
|
| 165 |
+
border-radius: 6px;
|
| 166 |
+
font-weight: 600;
|
| 167 |
+
transition: all 0.3s ease;
|
| 168 |
+
}
|
| 169 |
+
|
| 170 |
+
.btn:hover {
|
| 171 |
+
transform: translateY(-2px);
|
| 172 |
+
box-shadow: 0 4px 12px rgba(102, 126, 234, 0.4);
|
| 173 |
+
}
|
| 174 |
+
|
| 175 |
+
.warning-box {
|
| 176 |
+
background: #fff3cd;
|
| 177 |
+
border-left: 4px solid #ffc107;
|
| 178 |
+
padding: 1.5rem;
|
| 179 |
+
margin: 1.5rem 0;
|
| 180 |
+
border-radius: 4px;
|
| 181 |
+
}
|
| 182 |
+
|
| 183 |
+
.warning-box h4 {
|
| 184 |
+
color: #856404;
|
| 185 |
+
margin-bottom: 0.5rem;
|
| 186 |
+
}
|
| 187 |
+
|
| 188 |
+
.info-box {
|
| 189 |
+
background: #d1ecf1;
|
| 190 |
+
border-left: 4px solid #0c5460;
|
| 191 |
+
padding: 1.5rem;
|
| 192 |
+
margin: 1.5rem 0;
|
| 193 |
+
border-radius: 4px;
|
| 194 |
+
}
|
| 195 |
+
|
| 196 |
+
ul {
|
| 197 |
+
margin-left: 2rem;
|
| 198 |
+
margin-top: 0.5rem;
|
| 199 |
+
}
|
| 200 |
+
|
| 201 |
+
li {
|
| 202 |
+
margin-bottom: 0.5rem;
|
| 203 |
+
}
|
| 204 |
+
|
| 205 |
+
code {
|
| 206 |
+
background: #f4f4f4;
|
| 207 |
+
padding: 0.2rem 0.4rem;
|
| 208 |
+
border-radius: 3px;
|
| 209 |
+
font-family: 'Courier New', monospace;
|
| 210 |
+
font-size: 0.9em;
|
| 211 |
+
}
|
| 212 |
+
|
| 213 |
+
footer {
|
| 214 |
+
background: #f8f9fa;
|
| 215 |
+
padding: 2rem;
|
| 216 |
+
text-align: center;
|
| 217 |
+
color: #666;
|
| 218 |
+
border-top: 1px solid #e0e0e0;
|
| 219 |
+
}
|
| 220 |
+
|
| 221 |
+
@media (max-width: 768px) {
|
| 222 |
+
header h1 {
|
| 223 |
+
font-size: 1.8rem;
|
| 224 |
+
}
|
| 225 |
+
|
| 226 |
+
.stats-grid {
|
| 227 |
+
grid-template-columns: 1fr;
|
| 228 |
+
}
|
| 229 |
+
}
|
| 230 |
+
</style>
|
| 231 |
+
</head>
|
| 232 |
+
<body>
|
| 233 |
+
<div class="container">
|
| 234 |
+
<header>
|
| 235 |
+
<h1>🛡️ AI Safety Datasets Collection</h1>
|
| 236 |
+
<p>Comprehensive evaluation datasets for testing AI model safety mechanisms</p>
|
| 237 |
+
</header>
|
| 238 |
+
|
| 239 |
+
<div class="content">
|
| 240 |
+
<!-- Overview Section -->
|
| 241 |
+
<section>
|
| 242 |
+
<h2>Dataset Collection Summary</h2>
|
| 243 |
+
<div class="stats-grid">
|
| 244 |
+
<div class="stat-card">
|
| 245 |
+
<h4>Total Conversations</h4>
|
| 246 |
+
<div class="number">10,321+</div>
|
| 247 |
+
<div class="label">Across all datasets</div>
|
| 248 |
+
</div>
|
| 249 |
+
<div class="stat-card">
|
| 250 |
+
<h4>Total Turns</h4>
|
| 251 |
+
<div class="number">73,258+</div>
|
| 252 |
+
<div class="label">Multi-turn interactions</div>
|
| 253 |
+
</div>
|
| 254 |
+
<div class="stat-card">
|
| 255 |
+
<h4>Dataset Types</h4>
|
| 256 |
+
<div class="number">4</div>
|
| 257 |
+
<div class="label">Complementary methodologies</div>
|
| 258 |
+
</div>
|
| 259 |
+
<div class="stat-card">
|
| 260 |
+
<h4>Sample Data</h4>
|
| 261 |
+
<div class="number">200</div>
|
| 262 |
+
<div class="label">Free conversations available</div>
|
| 263 |
+
</div>
|
| 264 |
+
</div>
|
| 265 |
+
</section>
|
| 266 |
+
|
| 267 |
+
<!-- Full Dataset Statistics -->
|
| 268 |
+
<section>
|
| 269 |
+
<h2>Full Dataset Statistics</h2>
|
| 270 |
+
<table>
|
| 271 |
+
<thead>
|
| 272 |
+
<tr>
|
| 273 |
+
<th>Dataset</th>
|
| 274 |
+
<th>Conversations</th>
|
| 275 |
+
<th>Turns</th>
|
| 276 |
+
<th>Avg Turns/Conv</th>
|
| 277 |
+
<th>Focus</th>
|
| 278 |
+
</tr>
|
| 279 |
+
</thead>
|
| 280 |
+
<tbody>
|
| 281 |
+
<tr>
|
| 282 |
+
<td><strong>Original Multi-turn</strong></td>
|
| 283 |
+
<td>594+</td>
|
| 284 |
+
<td>4,642+</td>
|
| 285 |
+
<td>7.8</td>
|
| 286 |
+
<td>Baseline organic conversations</td>
|
| 287 |
+
</tr>
|
| 288 |
+
<tr>
|
| 289 |
+
<td>└ Psychology</td>
|
| 290 |
+
<td>158+</td>
|
| 291 |
+
<td>1,583+</td>
|
| 292 |
+
<td>10.0</td>
|
| 293 |
+
<td>Psychology harm category</td>
|
| 294 |
+
</tr>
|
| 295 |
+
<tr>
|
| 296 |
+
<td>└ Illicit</td>
|
| 297 |
+
<td>436+</td>
|
| 298 |
+
<td>3,059+</td>
|
| 299 |
+
<td>7.0</td>
|
| 300 |
+
<td>Illicit harm category</td>
|
| 301 |
+
</tr>
|
| 302 |
+
<tr>
|
| 303 |
+
<td><strong>Bio-transformed V1</strong></td>
|
| 304 |
+
<td>1,309+</td>
|
| 305 |
+
<td>6,784+</td>
|
| 306 |
+
<td>5.2</td>
|
| 307 |
+
<td>Direct bio-safety attacks</td>
|
| 308 |
+
</tr>
|
| 309 |
+
<tr>
|
| 310 |
+
<td><strong>Bio-transformed V2</strong></td>
|
| 311 |
+
<td>1,308+</td>
|
| 312 |
+
<td>8,127+</td>
|
| 313 |
+
<td>6.2</td>
|
| 314 |
+
<td>Adaptive bio-safety attacks</td>
|
| 315 |
+
</tr>
|
| 316 |
+
<tr>
|
| 317 |
+
<td><strong>Keyword-transformed</strong></td>
|
| 318 |
+
<td>7,110+</td>
|
| 319 |
+
<td>53,705+</td>
|
| 320 |
+
<td>7.6</td>
|
| 321 |
+
<td>Cross-domain harm transfer</td>
|
| 322 |
+
</tr>
|
| 323 |
+
</tbody>
|
| 324 |
+
</table>
|
| 325 |
+
</section>
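The figures in the table above can be cross-checked with a few lines of arithmetic. The sketch below copies the counts from the table (dropping the `+` suffix) and recomputes the averages and the page-level totals; the variable and function names are illustrative and do not come from the dataset repositories.

```python
# Counts copied from the "Full Dataset Statistics" table, "+" suffix dropped.
# Each value is (conversations, turns).
DATASETS = {
    "Original Multi-turn": (594, 4642),
    "Bio-transformed V1": (1309, 6784),
    "Bio-transformed V2": (1308, 8127),
    "Keyword-transformed": (7110, 53705),
}

# The Psychology and Illicit rows are subsets of "Original Multi-turn",
# so they are kept apart to avoid double counting in the totals.
SUBSETS = {
    "Psychology": (158, 1583),
    "Illicit": (436, 3059),
}


def avg_turns(conversations, turns):
    """Average turns per conversation, rounded to one decimal as in the table."""
    return round(turns / conversations, 1)


total_conversations = sum(c for c, _ in DATASETS.values())
total_turns = sum(t for _, t in DATASETS.values())
averages = {
    name: avg_turns(c, t)
    for name, (c, t) in {**DATASETS, **SUBSETS}.items()
}

print(total_conversations, total_turns)
for name, value in averages.items():
    print(f"{name}: {value}")
```

The two subset rows add up to the "Original Multi-turn" row, and the four top-level rows add up to the totals shown in the stat cards.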
|
| 326 |
+
|
| 327 |
+
<!-- Dataset Links -->
|
| 328 |
+
<section>
|
| 329 |
+
<h2>Access Datasets on Hugging Face</h2>
|
| 330 |
+
<div class="dataset-links">
|
| 331 |
+
<div class="dataset-card">
|
| 332 |
+
<h4>Original Multi-turn Conversations</h4>
|
| 333 |
+
<p>Psychology + Illicit baseline conversations<br>
|
| 334 |
+
<strong>Sample:</strong> 50 conversations, 390 turns</p>
|
| 335 |
+
<a href="https://huggingface.co/datasets/julyai7/multi-turn-conversations" class="btn" target="_blank">View Dataset →</a>
|
| 336 |
+
</div>
|
| 337 |
+
<div class="dataset-card">
|
| 338 |
+
<h4>Bio-transformed Synthetic V1</h4>
|
| 339 |
+
<p>Direct bio-topic transformation methodology<br>
|
| 340 |
+
<strong>Sample:</strong> 50 conversations, 449 turns</p>
|
| 341 |
+
<a href="https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v1" class="btn" target="_blank">View Dataset →</a>
|
| 342 |
+
</div>
|
| 343 |
+
<div class="dataset-card">
|
| 344 |
+
<h4>Bio-transformed Synthetic V2</h4>
|
| 345 |
+
<p>Adaptive bio-topic transformation methodology<br>
|
| 346 |
+
<strong>Sample:</strong> 50 conversations, 459 turns</p>
|
| 347 |
+
<a href="https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v2" class="btn" target="_blank">View Dataset →</a>
|
| 348 |
+
</div>
|
| 349 |
+
<div class="dataset-card">
|
| 350 |
+
<h4>Keyword-transformed Synthetic</h4>
|
| 351 |
+
<p>Cross-domain keyword substitution methodology<br>
|
| 352 |
+
<strong>Sample:</strong> 50 conversations, 659 turns</p>
|
| 353 |
+
<a href="https://huggingface.co/datasets/julyai7/multi-turn-keyword-transformed-synth-conversations" class="btn" target="_blank">View Dataset →</a>
|
| 354 |
+
</div>
|
| 355 |
+
</div>
|
| 356 |
+
</section>
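A minimal sketch of how the four sample repositories linked above might be loaded with the Hugging Face `datasets` library. The repository IDs are taken from the links; the split name `"train"` is an assumption, and the repositories may require accepting access terms or authenticating first. The helper names are illustrative.

```python
# Repository IDs taken from the dataset links on this page.
REPO_IDS = [
    "julyai7/multi-turn-conversations",
    "julyai7/multi-turn-bio-transformed-synth-conversations-v1",
    "julyai7/multi-turn-bio-transformed-synth-conversations-v2",
    "julyai7/multi-turn-keyword-transformed-synth-conversations",
]


def dataset_url(repo_id):
    """Build the Hugging Face dataset page URL for a repository ID."""
    return f"https://huggingface.co/datasets/{repo_id}"


def load_sample(repo_id, split="train"):
    """Load one sample dataset.

    Assumption: the split is called "train"; check the dataset card if
    this raises an error. Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily: third-party dependency

    return load_dataset(repo_id, split=split)


if __name__ == "__main__":
    for repo_id in REPO_IDS:
        print(dataset_url(repo_id))
```

Calling `load_sample(REPO_IDS[0])` would then return a `Dataset` object for the first repository, provided the assumed split exists.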
|
| 357 |
+
|
| 358 |
+
<!-- Research Applications -->
|
| 359 |
+
<section>
|
| 360 |
+
<h2>🧪 Research Applications</h2>
|
| 361 |
+
<div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 1.5rem;">
|
| 362 |
+
<div>
|
| 363 |
+
<h3>Safety Evaluation</h3>
|
| 364 |
+
<ul>
|
| 365 |
+
<li>Benchmark model safety</li>
|
| 366 |
+
<li>Measure robustness</li>
|
| 367 |
+
<li>Evaluate mechanisms</li>
|
| 368 |
+
</ul>
|
| 369 |
+
</div>
|
| 370 |
+
<div>
|
| 371 |
+
<h3>Red Teaming</h3>
|
| 372 |
+
<ul>
|
| 373 |
+
<li>Discover adversarial patterns</li>
|
| 374 |
+
<li>Test safety guardrails</li>
|
| 375 |
+
<li>Identify blind spots</li>
|
| 376 |
+
</ul>
|
| 377 |
+
</div>
|
| 378 |
+
<div>
|
| 379 |
+
<h3>Model Training</h3>
|
| 380 |
+
<ul>
|
| 381 |
+
<li>Fine-tune safety classifiers</li>
|
| 382 |
+
<li>Train attack detectors</li>
|
| 383 |
+
<li>Develop harm detection</li>
|
| 384 |
+
</ul>
|
| 385 |
+
</div>
|
| 386 |
+
<div>
|
| 387 |
+
<h3>Safety Research</h3>
|
| 388 |
+
<ul>
|
| 389 |
+
<li>Study harm transfer</li>
|
| 390 |
+
<li>Analyze attack patterns</li>
|
| 391 |
+
<li>Understand dynamics</li>
|
| 392 |
+
</ul>
|
| 393 |
+
</div>
|
| 394 |
+
</div>
|
| 395 |
+
</section>
|
| 396 |
+
|
| 397 |
+
<!-- Ethical Considerations -->
|
| 398 |
+
<section>
|
| 399 |
+
<h2>⚠️ Ethical Considerations</h2>
|
| 400 |
+
<div class="warning-box">
|
| 401 |
+
<h4>⚠️ IMPORTANT</h4>
|
| 402 |
+
<p>These datasets contain successful adversarial attacks and harmful content.</p>
|
| 403 |
+
</div>
|
| 404 |
+
|
| 405 |
+
<h3>✅ Intended Use</h3>
|
| 406 |
+
<ul>
|
| 407 |
+
<li>Defensive security research</li>
|
| 408 |
+
<li>AI safety evaluation and improvement</li>
|
| 409 |
+
<li>Academic research on adversarial robustness</li>
|
| 410 |
+
<li>Training safety and moderation systems</li>
|
| 411 |
+
</ul>
|
| 412 |
+
|
| 413 |
+
<h3>❌ Prohibited Use</h3>
|
| 414 |
+
<ul>
|
| 415 |
+
<li>Creating offensive content</li>
|
| 416 |
+
<li>Developing attack tools for malicious purposes</li>
|
| 417 |
+
<li>Bypassing safety systems for harm</li>
|
| 418 |
+
<li>Any use that violates laws or ethical guidelines</li>
|
| 419 |
+
</ul>
|
| 420 |
+
</section>
|
| 421 |
+
|
| 422 |
+
<!-- Data Selection -->
|
| 423 |
+
<section>
|
| 424 |
+
<h2>🎯 Data Selection Process</h2>
|
| 425 |
+
<div class="info-box">
|
| 426 |
+
<p>All datasets are derived from high-quality, validated conversations with strict quality filters including NeurIPS evaluation protocols.</p>
|
| 427 |
+
</div>
|
| 428 |
+
|
| 429 |
+
<h3>Base Criteria</h3>
|
| 430 |
+
<ul>
|
| 431 |
+
<li>Text-based objectives (no code execution templates)</li>
|
| 432 |
+
<li>NeurIPS evaluation metadata present</li>
|
| 433 |
+
<li>Verdict: <code>success</code> (harmful requests successfully fulfilled)</li>
|
| 434 |
+
<li>Multi-turn conversations with prompt-response pairs</li>
|
| 435 |
+
</ul>
|
| 436 |
+
|
| 437 |
+
<h3>Psychology-Specific Criteria</h3>
|
| 438 |
+
<ul>
|
| 439 |
+
<li>Organic conversations (<code>organicity = true</code>)</li>
|
| 440 |
+
<li>No disclaimer in responses</li>
|
| 441 |
+
<li>Successfully elicited harmful psychology-related content</li>
|
| 442 |
+
</ul>
|
| 443 |
+
|
| 444 |
+
<h3>Illicit-Specific Criteria</h3>
|
| 445 |
+
<ul>
|
| 446 |
+
<li>Contains specific instruction details</li>
|
| 447 |
+
<li>Practically executable (not abstract)</li>
|
| 448 |
+
<li>Successfully elicited harmful illicit-related content</li>
|
| 449 |
+
</ul>
|
| 450 |
+
</section>
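The selection criteria above can be read as a two-stage metadata filter: the base criteria first, then the category-specific ones. The sketch below illustrates that reading only. Every field name (`objective_type`, `neurips_eval`, `verdict`, `turns`, `category`, `organicity`, `has_disclaimer`, `has_specific_instructions`, `practically_executable`) is hypothetical, and "multi-turn" is assumed to mean at least two prompt-response pairs; the actual schema and pipeline are not described on this page.

```python
def has_prompt_response_pairs(turns):
    """Assumed reading of "multi-turn": two or more complete prompt-response pairs."""
    return len(turns) >= 2 and all(
        t.get("prompt") and t.get("response") for t in turns
    )


def passes_base_criteria(rec):
    """Base criteria: text objective, NeurIPS metadata present, verdict 'success', multi-turn."""
    return (
        rec.get("objective_type") == "text"        # hypothetical field
        and rec.get("neurips_eval") is not None    # hypothetical field
        and rec.get("verdict") == "success"
        and has_prompt_response_pairs(rec.get("turns", []))
    )


def passes_category_criteria(rec):
    """Category-specific criteria for the psychology and illicit subsets."""
    category = rec.get("category")                 # hypothetical field
    if category == "psychology":
        return rec.get("organicity") is True and not rec.get("has_disclaimer", False)
    if category == "illicit":
        return bool(rec.get("has_specific_instructions")) and bool(
            rec.get("practically_executable")
        )
    return False


def select(records):
    """Keep only records that satisfy both the base and the category criteria."""
    return [
        r for r in records
        if passes_base_criteria(r) and passes_category_criteria(r)
    ]
```

Keeping the base and category checks in separate functions mirrors the way the criteria are grouped on the page and makes each group easy to test on its own.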
|
| 451 |
+
|
| 452 |
+
<!-- License -->
|
| 453 |
+
<section>
|
| 454 |
+
<h2>License</h2>
|
| 455 |
+
<p>All datasets are released under <strong>CC-BY-NC-4.0</strong> (Creative Commons Attribution-NonCommercial 4.0 International).</p>
|
| 456 |
+
<ul>
|
| 457 |
+
<li>✅ Use for research and evaluation</li>
|
| 458 |
+
<li>✅ Modify and build upon the data</li>
|
| 459 |
+
<li>✅ Share with attribution</li>
|
| 460 |
+
<li>❌ Commercial use without separate licensing</li>
|
| 461 |
+
</ul>
|
| 462 |
+
</section>
|
| 463 |
+
|
| 464 |
+
<!-- Contact -->
|
| 465 |
+
<section>
|
| 466 |
+
<h2>💼 Full Dataset Access</h2>
|
| 467 |
+
<p>The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.</p>
|
| 468 |
+
<p style="margin-top: 1rem;"><strong>For academic research or commercial licensing, please contact us with your research objectives, institutional affiliation, and intended use.</strong></p>
|
| 469 |
+
</section>
|
| 470 |
+
</div>
|
| 471 |
+
|
| 472 |
+
<footer>
|
| 473 |
+
<p><strong>Last Updated:</strong> November 24, 2025</p>
|
| 474 |
+
<p style="margin-top: 0.5rem;">For detailed documentation, visit the individual dataset repositories on Hugging Face.</p>
|
| 475 |
+
</footer>
|
| 476 |
+
</div>
|
| 477 |
+
</body>
|
| 478 |
</html>
|