Yang Chung committed on
Commit 42b116c · 1 Parent(s): 7ca962c

Initial commit

Files changed (2)
  1. README.md +384 -6
  2. index.html +477 -18
README.md CHANGED
---
title: AI Safety Datasets Overview
emoji: 🛡️
colorFrom: red
colorTo: orange
sdk: static
pinned: false
license: cc-by-nc-4.0
short_description: Comprehensive AI safety evaluation datasets with organic and synthetic adversarial conversations
tags:
  - safety
  - adversarial
  - red-teaming
  - ai-safety
  - multi-turn
  - synthetic
datasets:
  - julyai7/multi-turn-conversations
  - julyai7/multi-turn-bio-transformed-synth-conversations-v1
  - julyai7/multi-turn-bio-transformed-synth-conversations-v2
  - julyai7/multi-turn-keyword-transformed-synth-conversations
---

# AI Safety Datasets Collection - Overview

This Space provides an overview of our AI safety evaluation datasets, which are designed for rigorous testing of AI model safety mechanisms across a range of attack methodologies and harm categories.

## 📊 Dataset Collection Summary

We offer four complementary datasets, each serving a specific evaluation purpose:

| Dataset | Type | Methodology | Primary Use Case |
|---------|------|-------------|------------------|
| **Original Multi-turn Conversations** | Organic | Human-AI interactions | Baseline safety evaluation |
| **Bio-transformed Synthetic V1** | Synthetic | Direct bio-topic transformation | Bio-safety adversarial testing |
| **Bio-transformed Synthetic V2** | Synthetic | Adaptive bio-topic transformation | Bio-safety testing with natural conversation flow |
| **Keyword-transformed Synthetic** | Synthetic | Cross-domain keyword substitution | Domain-agnostic safety testing |

---

## 🔍 Detailed Dataset Descriptions

### 1. Original Multi-turn Conversations

**Collections:** `psychs` + `illicits`

**Source Data Criteria:**

#### Psychology Conversations (`psychs`)
- **Harm Category:** Psychology
- **Objective Type:** Text-based, no JupyterHub templates
- **Quality Filters:**
  - NeurIPS evaluation verdict: `success`
  - No disclaimer present
  - Organic conversations only (`organicity = true`)
- **Purpose:** Baseline psychology-related harmful requests that successfully elicit unsafe responses

#### Illicit Conversations (`illicits`)
- **Harm Category:** Illicit activities
- **Objective Type:** Text-based, no JupyterHub templates
- **Quality Filters:**
  - NeurIPS evaluation verdict: `success`
  - Contains specific instruction details (not `none`)
  - Executable by high school students (`high_school_student_executable != false`)
- **Purpose:** Baseline illicit-related harmful requests with practical execution potential

**Use Cases:**
- Establishing baseline safety evaluation metrics
- Understanding organic adversarial conversation patterns
- Training and fine-tuning safety classifiers
- Benchmarking model safety before synthetic transformations

**Sample Dataset:** 50 conversations (390 turns)
- Psychology: 55.6% of sample turns
- Illicit: 44.4% of sample turns
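
The percentages above are shares of turns, not of conversations. With 390 turns in total, 55.6% and 44.4% correspond to 217 psychology turns and 173 illicit turns. Below is a minimal sketch of that computation. It assumes each record is a conversation dict with hypothetical `harm_category` and `turns` keys, which may not match the published dataset schema.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def turn_share_by_category(conversations: Iterable[Mapping]) -> Dict[str, float]:
    """Percentage of all turns that belong to each harm category.

    Assumes (hypothetically) that every conversation has a "harm_category"
    string and a "turns" list; adapt the keys to the real dataset columns.
    """
    turns_per_category = Counter()
    for conv in conversations:
        # Weight each conversation by its number of turns.
        turns_per_category[conv["harm_category"]] += len(conv["turns"])
    total = sum(turns_per_category.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in turns_per_category.items()}


# Toy input: three psychology turns and one illicit turn.
toy = [
    {"harm_category": "psychology", "turns": ["t1", "t2", "t3"]},
    {"harm_category": "illicit", "turns": ["t1"]},
]
print(turn_share_by_category(toy))  # {'psychology': 75.0, 'illicit': 25.0}
```

Because the shares are weighted by turns, a few long conversations can shift them noticeably. To count conversations instead, change the increment to `+= 1`.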

---

### 2. Bio-transformed Synthetic Multi-turn Conversations V1

**Collection:** `illicit_bio_synths_v1`

**Transformation Method:** `bio_topic_change`

**Source:** Derived from original psychology + illicit conversations

**Methodology V1 Characteristics:**
- **Direct transformation approach:** Explicit adversarial pattern injection
- **Focus:** Systematic safety mechanism bypass strategies
- **Target Domain:** Bio-safety (dangerous biological information)
- **Transformation Goal:** Convert psychology/illicit harms into bio-safety attacks

**Key Features:**
- All conversations transformed to `illicit` category (bio-safety domain)
- Direct mapping of harmful intents to biological contexts
- Aggressive adversarial techniques
- Tests explicit bio-safety guardrails

**Use Cases:**
- Testing bio-safety-specific guardrails
- Evaluating cross-domain harm transfer (psych/illicit → bio)
- Red-teaming bio-related content moderation
- Training specialized bio-safety detectors

**Sample Dataset:** 50 conversations (449 turns, 100% illicit/bio-safety)

---

### 3. Bio-transformed Synthetic Multi-turn Conversations V2

**Collection:** `illicit_bio_synths_v2`

**Transformation Method:** `bio_topic_change_og`

**Source:** Derived from original psychology + illicit conversations

**Methodology V2 Characteristics:**
- **Adaptive transformation approach:** Preserves natural conversation flow
- **Focus:** Contextual reframing and subtle escalation patterns
- **Target Domain:** Bio-safety (dangerous biological information)
- **Transformation Goal:** More sophisticated, harder-to-detect bio-safety attacks

**Key Differences from V1:**
- More natural conversation progression
- Subtler escalation tactics
- Closer resemblance to legitimate scientific inquiry
- Designed to be harder for safety systems to detect

**Use Cases:**
- Testing advanced bio-safety detection systems
- Evaluating robustness against sophisticated attacks
- Training models to detect subtle adversarial patterns
- Benchmarking next-generation safety systems

**Sample Dataset:** 50 conversations (459 turns, 100% illicit/bio-safety)

---

### 4. Keyword-transformed Synthetic Multi-turn Conversations

**Collection:** `keyword_synths`

**Transformation Method:** `keyword`

**Source:** Derived from original psychology + illicit conversations

**Methodology Characteristics:**
- **Cross-domain keyword substitution:** Maintains harmful intent across contexts
- **Domain shifting:** Same underlying harm expressed in different domains
- **Context adaptation:** Systematic replacement of domain-specific terminology
- **Intent preservation:** Core harmful objective remains unchanged

**Innovation:** Tests whether AI safety mechanisms are:
- **Domain-agnostic:** Robust across different contexts and topics
- **Intent-focused:** Detecting underlying harm rather than surface-level keywords
- **Context-aware:** Understanding harm despite domain transformations

**Key Features:**
- Retains both original harm categories (psychology and illicit)
- Demonstrates safety mechanism vulnerabilities through context shifting
- Longer conversations in the sample (13.2 average turns vs. 7.8 for the original sample), reflecting more complex attacks
- Tests generalization of safety training

**Use Cases:**
- Evaluating domain-agnostic safety mechanisms
- Testing whether safety is keyword-based or intent-based
- Training robust cross-domain harm detection
- Identifying brittleness in safety systems

**Sample Dataset:** 50 conversations (659 turns)
- Illicit: 51.6% of sample turns
- Psychology: 48.4% of sample turns

---

## 🎯 Data Selection Process

All datasets are derived from high-quality, validated conversations that meet strict criteria:

### Base Criteria (All Datasets)
- ✅ Text-based objectives (no code execution templates)
- ✅ NeurIPS evaluation metadata present
- ✅ Verdict: `success` (harmful requests successfully fulfilled)
- ✅ Multi-turn conversations with prompt-response pairs

### Psychology-Specific Criteria
- Organic conversations (`organicity = true`)
- No disclaimer in responses
- Successfully elicited harmful psychology-related content

### Illicit-Specific Criteria
- Contains specific instruction details
- Practically executable (not abstract)
- Successfully elicited harmful illicit-related content

### Synthetic Transformation Criteria
- Original conversation must meet base criteria
- Successful transformation to target methodology
- Maintains harmful intent in new domain
- Contains valid prompt-response pairs
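
The base and category-specific criteria above can be read as a single boolean filter over conversation records. The sketch below is illustrative only. `organicity` and `high_school_student_executable` come from the criteria themselves, but every other field name (`objective_type`, `uses_jupyterhub_template`, `neurips_verdict`, `has_disclaimer`, `instruction_details`, `harm_category`, `turns`) is a hypothetical stand-in for the production schema. The synthetic transformation criteria are not modeled.

```python
from typing import Any, Mapping

# All keys below are hypothetical, except `organicity` and
# `high_school_student_executable`, which the criteria name explicitly.


def meets_base_criteria(rec: Mapping[str, Any]) -> bool:
    """Base criteria shared by all datasets."""
    turns = rec.get("turns", [])
    return (
        rec.get("objective_type") == "text"           # text-based objective
        and not rec.get("uses_jupyterhub_template")    # no code-execution template
        and rec.get("neurips_verdict") == "success"    # metadata present, verdict success
        and len(turns) >= 2                            # assumed meaning of "multi-turn"
        and all(t.get("prompt") and t.get("response") for t in turns)
    )


def meets_category_criteria(rec: Mapping[str, Any]) -> bool:
    """Psychology- and illicit-specific criteria."""
    category = rec.get("harm_category")
    if category == "psychology":
        return rec.get("organicity") is True and not rec.get("has_disclaimer")
    if category == "illicit":
        return (
            rec.get("instruction_details") not in (None, "none")
            # "!= false": an unset flag passes, only an explicit False fails
            and rec.get("high_school_student_executable") is not False
        )
    return False


def is_selected(rec: Mapping[str, Any]) -> bool:
    return meets_base_criteria(rec) and meets_category_criteria(rec)


# Placeholder records (no real content) for a quick check.
base = {
    "objective_type": "text",
    "uses_jupyterhub_template": False,
    "neurips_verdict": "success",
    "turns": [{"prompt": "p1", "response": "r1"}, {"prompt": "p2", "response": "r2"}],
}
example_psych = {**base, "harm_category": "psychology", "organicity": True, "has_disclaimer": False}
example_illicit = {**base, "harm_category": "illicit", "instruction_details": "specific",
                   "high_school_student_executable": None}
print(is_selected(example_psych), is_selected(example_illicit))  # True True
```

Splitting the filter into base and category-specific functions mirrors the structure of the criteria lists. It also makes it easy to see which rule rejected a given record.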

---

## 📈 Dataset Statistics

### Full Dataset Overview

The complete datasets are derived from our production database using strict quality filters:

| Dataset | Conversations | Turns | Avg Turns/Conv | Primary Focus |
|---------|---------------|-------|----------------|---------------|
| **Original Multi-turn** | **594+** | **4,642+** | **7.8** | Baseline organic conversations |
| - Psychology (`psychs`) | 158+ | 1,583+ | 10.0 | Psychology harm category |
| - Illicit (`illicits`) | 436+ | 3,059+ | 7.0 | Illicit harm category |
| **Bio-transformed V1** | **1,309+** | **6,784+** | **5.2** | Direct bio-safety attacks |
| **Bio-transformed V2** | **1,308+** | **8,127+** | **6.2** | Adaptive bio-safety attacks |
| **Keyword-transformed** | **7,110+** | **53,705+** | **7.6** | Cross-domain harm transfer |
| **Total Full Datasets** | **10,321+** | **73,258+** | **7.1** | All methodologies |
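
The Avg Turns/Conv column is turns divided by conversations. The Total row is pooled over all conversations, not averaged over the four datasets. A plain mean of the four per-dataset averages would give about 6.7 instead of 7.1. The short check below uses only the counts in the table:

```python
# (conversations, turns) per dataset, copied from the table above.
FULL_COUNTS = {
    "Original Multi-turn": (594, 4_642),
    "Bio-transformed V1": (1_309, 6_784),
    "Bio-transformed V2": (1_308, 8_127),
    "Keyword-transformed": (7_110, 53_705),
}


def avg_turns(conversations: int, turns: int) -> float:
    """Average turns per conversation, rounded to one decimal like the table."""
    return round(turns / conversations, 1)


total_convs = sum(c for c, _ in FULL_COUNTS.values())
total_turns = sum(t for _, t in FULL_COUNTS.values())

for name, (convs, turns) in FULL_COUNTS.items():
    print(f"{name:<20} {avg_turns(convs, turns)}")
print(f"{'Total (pooled)':<20} {avg_turns(total_convs, total_turns)}")  # 7.1

# Unweighted mean of the four per-dataset averages, for comparison.
mean_of_means = sum(t / c for c, t in FULL_COUNTS.values()) / len(FULL_COUNTS)
print(f"{'Mean of averages':<20} {round(mean_of_means, 1)}")  # 6.7
```

The Keyword-transformed set holds about 69% of all conversations (7,110 of 10,321), so it dominates the pooled figure.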

---

### Sample Data Overview (Publicly Available)

Representative sample datasets are available on Hugging Face for evaluation and testing:

| Dataset | Conversations | Turns | Avg Turns/Conv | Harm Categories |
|---------|---------------|-------|----------------|-----------------|
| Original | 50 | 390 | 7.8 | Psychology (55.6%), Illicit (44.4%) |
| Bio V1 | 50 | 449 | 9.0 | Illicit/Bio (100%) |
| Bio V2 | 50 | 459 | 9.2 | Illicit/Bio (100%) |
| Keyword | 50 | 659 | 13.2 | Illicit (51.6%), Psychology (48.4%) |
| **Total Samples** | **200** | **1,957** | **9.8** | Multiple |

> **Note:** The sample datasets are carefully selected subsets that are freely accessible for research evaluation. They are not strictly proportional to the full datasets. Sample conversations are longer on average (9.8 vs. 7.1 turns per conversation overall). Psychology also accounts for 55.6% of turns in the original sample, versus about 34.1% (1,583 of 4,642) in the full original dataset.

---

## 🔗 Dataset Links

### Hugging Face Datasets

1. **[Original Multi-turn Conversations](https://huggingface.co/datasets/julyai7/multi-turn-conversations)**
   - Psychology + Illicit baseline conversations
   - 50 sample conversations, 390 turns

2. **[Bio-transformed Synthetic V1](https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v1)**
   - Direct bio-topic transformation methodology
   - 50 sample conversations, 449 turns

3. **[Bio-transformed Synthetic V2](https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v2)**
   - Adaptive bio-topic transformation methodology
   - 50 sample conversations, 459 turns

4. **[Keyword-transformed Synthetic](https://huggingface.co/datasets/julyai7/multi-turn-keyword-transformed-synth-conversations)**
   - Cross-domain keyword substitution methodology
   - 50 sample conversations, 659 turns
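
Below is a minimal sketch for loading the samples with the Hugging Face `datasets` library. The repository IDs are the ones linked above. The short names and the `train` split name are assumptions. If a repository is gated, accept its terms and authenticate first (for example with `huggingface-cli login`).

```python
# Repository IDs taken from the links above; the short keys are illustrative.
SAMPLE_REPOS = {
    "original": "julyai7/multi-turn-conversations",
    "bio_v1": "julyai7/multi-turn-bio-transformed-synth-conversations-v1",
    "bio_v2": "julyai7/multi-turn-bio-transformed-synth-conversations-v2",
    "keyword": "julyai7/multi-turn-keyword-transformed-synth-conversations",
}


def load_sample(name: str, split: str = "train"):
    """Load one sample dataset from the Hub.

    The default split name is an assumption about how the repos are laid out.
    """
    repo_id = SAMPLE_REPOS[name]  # unknown names fail here, before any download
    from datasets import load_dataset  # pip install datasets

    return load_dataset(repo_id, split=split)
```

For example, `load_sample("bio_v1")` returns a `datasets.Dataset`. Printing it shows the actual column names and row count, which you can use to confirm the schema before any analysis. The import is deferred so that the ID mapping can be inspected without the library installed.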

---

## 🧪 Research Applications

These datasets enable various research directions:

### Safety Evaluation
- Benchmark model safety across attack methodologies
- Measure robustness to synthetic transformations
- Evaluate domain-specific vs. general safety mechanisms

### Red Teaming
- Discover new adversarial patterns
- Test safety guardrails comprehensively
- Identify blind spots in content moderation

### Model Training
- Fine-tune safety classifiers
- Train adversarial attack detectors
- Develop cross-domain harm detection systems

### Safety Research
- Study harm transfer across domains
- Analyze conversation-level attack patterns
- Understand multi-turn adversarial dynamics

---

## ⚠️ Ethical Considerations

**IMPORTANT:** These datasets contain successful adversarial attacks and harmful content.

### Intended Use
- ✅ Defensive security research
- ✅ AI safety evaluation and improvement
- ✅ Academic research on adversarial robustness
- ✅ Training safety and moderation systems

### Prohibited Use
- ❌ Creating offensive content
- ❌ Developing attack tools for malicious purposes
- ❌ Bypassing safety systems for harm
- ❌ Any use that violates laws or ethical guidelines

### Recommendations
- Use in controlled research environments
- Implement appropriate access controls
- Follow institutional review board (IRB) guidelines
- Report findings responsibly

---

## 📄 License

All datasets are released under **CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0 International).

### License Terms
- ✅ Use for research and evaluation
- ✅ Modify and build upon the data
- ✅ Share with attribution
- ❌ Commercial use without separate licensing

---

## 💼 Full Dataset Access

The sample datasets provide representative examples. Full datasets contain:

- **Thousands of additional conversations**
- **Expanded harm categories and variations**
- **Diverse conversation lengths and complexity levels**
- **Regular updates with new adversarial patterns**
- **Custom dataset creation for specific research needs**

### Contact for Full Dataset

For academic research or commercial licensing:
- 📧 Email: [your-email@domain.com]
- 🌐 Website: [your-website.com]
- 📋 Include: Research objectives, institutional affiliation, intended use

---

## 🔄 Dataset Updates

**Current Version:** November 2024

The sample datasets represent snapshots of our larger collection. Full datasets receive regular updates with:
- New adversarial patterns and methodologies
- Additional harm categories and domains
- Improved quality filters and annotations
- Enhanced diversity in conversation styles

---

## 📚 Citation

If you use these datasets in your research, please cite:

```bibtex
@misc{ai_safety_datasets_2024,
  title        = {AI Safety Multi-turn Conversation Datasets},
  author       = {[Your Name/Organization]},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/julyai7}}
}
```

---

## 🤝 Acknowledgments

These datasets were created through:
- Rigorous NeurIPS evaluation protocols
- Advanced synthetic transformation methodologies
- Quality filtering and validation processes
- Ethical review and safety considerations

---

## 📞 Support & Questions

For questions about the datasets:
- Open a discussion on the respective dataset repository
- Join the discussion in the Community tab
- Contact us for technical support or collaboration opportunities

---

**Last Updated:** November 24, 2025
index.html CHANGED
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>AI Safety Datasets Overview</title>
  <style>
    * {
      margin: 0;
      padding: 0;
      box-sizing: border-box;
    }

    body {
      font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen', 'Ubuntu', 'Cantarell', sans-serif;
      line-height: 1.6;
      color: #333;
      background: linear-gradient(135deg, #ff6b6b 0%, #ff8e53 100%);
      padding: 2rem 1rem;
    }

    .container {
      max-width: 1200px;
      margin: 0 auto;
      background: white;
      border-radius: 12px;
      box-shadow: 0 8px 32px rgba(0, 0, 0, 0.1);
      overflow: hidden;
    }

    header {
      background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
      color: white;
      padding: 3rem 2rem;
      text-align: center;
    }

    header h1 {
      font-size: 2.5rem;
      margin-bottom: 0.5rem;
    }

    header p {
      font-size: 1.1rem;
      opacity: 0.95;
    }

    .content {
      padding: 2rem;
    }

    section {
      margin-bottom: 3rem;
    }

    h2 {
      color: #667eea;
      font-size: 1.8rem;
      margin-bottom: 1rem;
      border-bottom: 2px solid #667eea;
      padding-bottom: 0.5rem;
    }

    h3 {
      color: #764ba2;
      font-size: 1.3rem;
      margin: 1.5rem 0 0.75rem 0;
    }

    .stats-grid {
      display: grid;
      grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
      gap: 1.5rem;
      margin: 2rem 0;
    }

    .stat-card {
      background: linear-gradient(135deg, #667eea15 0%, #764ba215 100%);
      border-radius: 8px;
      padding: 1.5rem;
      border-left: 4px solid #667eea;
    }

    .stat-card h4 {
      color: #667eea;
      font-size: 0.9rem;
      text-transform: uppercase;
      letter-spacing: 1px;
      margin-bottom: 0.5rem;
    }

    .stat-card .number {
      font-size: 2rem;
      font-weight: bold;
      color: #333;
    }

    .stat-card .label {
      color: #666;
      font-size: 0.9rem;
    }

    table {
      width: 100%;
      border-collapse: collapse;
      margin: 1.5rem 0;
      background: white;
    }

    th {
      background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
      color: white;
      padding: 1rem;
      text-align: left;
      font-weight: 600;
    }

    td {
      padding: 0.75rem 1rem;
      border-bottom: 1px solid #e0e0e0;
    }

    tr:hover {
      background: #f8f9fa;
    }

    .dataset-links {
      display: grid;
      grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
      gap: 1.5rem;
      margin: 2rem 0;
    }

    .dataset-card {
      background: white;
      border: 2px solid #e0e0e0;
      border-radius: 8px;
      padding: 1.5rem;
      transition: all 0.3s ease;
    }

    .dataset-card:hover {
      border-color: #667eea;
      transform: translateY(-4px);
      box-shadow: 0 8px 16px rgba(102, 126, 234, 0.2);
    }

    .dataset-card h4 {
      color: #667eea;
      margin-bottom: 0.5rem;
    }

    .dataset-card p {
      color: #666;
      font-size: 0.9rem;
      margin-bottom: 1rem;
    }

    .btn {
      display: inline-block;
      background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
      color: white;
      padding: 0.75rem 1.5rem;
      text-decoration: none;
      border-radius: 6px;
      font-weight: 600;
      transition: all 0.3s ease;
    }

    .btn:hover {
      transform: translateY(-2px);
      box-shadow: 0 4px 12px rgba(102, 126, 234, 0.4);
    }

    .warning-box {
      background: #fff3cd;
      border-left: 4px solid #ffc107;
      padding: 1.5rem;
      margin: 1.5rem 0;
      border-radius: 4px;
    }

    .warning-box h4 {
      color: #856404;
      margin-bottom: 0.5rem;
    }

    .info-box {
      background: #d1ecf1;
      border-left: 4px solid #0c5460;
      padding: 1.5rem;
      margin: 1.5rem 0;
      border-radius: 4px;
    }

    ul {
      margin-left: 2rem;
      margin-top: 0.5rem;
    }

    li {
      margin-bottom: 0.5rem;
    }

    code {
      background: #f4f4f4;
      padding: 0.2rem 0.4rem;
      border-radius: 3px;
      font-family: 'Courier New', monospace;
      font-size: 0.9em;
    }

    footer {
      background: #f8f9fa;
      padding: 2rem;
      text-align: center;
      color: #666;
      border-top: 1px solid #e0e0e0;
    }

    @media (max-width: 768px) {
      header h1 {
        font-size: 1.8rem;
      }

      .stats-grid {
        grid-template-columns: 1fr;
      }
    }
  </style>
</head>
<body>
  <div class="container">
    <header>
      <h1>🛡️ AI Safety Datasets Collection</h1>
      <p>Comprehensive evaluation datasets for testing AI model safety mechanisms</p>
    </header>

    <div class="content">
      <!-- Overview Section -->
      <section>
        <h2>📊 Dataset Collection Summary</h2>
        <div class="stats-grid">
          <div class="stat-card">
            <h4>Total Conversations</h4>
            <div class="number">10,321+</div>
            <div class="label">Across all datasets</div>
          </div>
          <div class="stat-card">
            <h4>Total Turns</h4>
            <div class="number">73,258+</div>
            <div class="label">Multi-turn interactions</div>
          </div>
          <div class="stat-card">
            <h4>Dataset Types</h4>
            <div class="number">4</div>
            <div class="label">Complementary methodologies</div>
          </div>
          <div class="stat-card">
            <h4>Sample Data</h4>
            <div class="number">200</div>
            <div class="label">Freely available sample conversations</div>
          </div>
        </div>
      </section>

      <!-- Full Dataset Statistics -->
      <section>
        <h2>📈 Full Dataset Statistics</h2>
        <table>
          <thead>
            <tr>
              <th>Dataset</th>
              <th>Conversations</th>
              <th>Turns</th>
              <th>Avg Turns/Conv</th>
              <th>Focus</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td><strong>Original Multi-turn</strong></td>
              <td>594+</td>
              <td>4,642+</td>
              <td>7.8</td>
              <td>Baseline organic conversations</td>
            </tr>
            <tr>
              <td>&nbsp;&nbsp;└ Psychology</td>
              <td>158+</td>
              <td>1,583+</td>
              <td>10.0</td>
              <td>Psychology harm category</td>
            </tr>
            <tr>
              <td>&nbsp;&nbsp;└ Illicit</td>
              <td>436+</td>
              <td>3,059+</td>
              <td>7.0</td>
              <td>Illicit harm category</td>
            </tr>
            <tr>
              <td><strong>Bio-transformed V1</strong></td>
              <td>1,309+</td>
              <td>6,784+</td>
              <td>5.2</td>
              <td>Direct bio-safety attacks</td>
            </tr>
            <tr>
              <td><strong>Bio-transformed V2</strong></td>
              <td>1,308+</td>
              <td>8,127+</td>
              <td>6.2</td>
              <td>Adaptive bio-safety attacks</td>
            </tr>
            <tr>
              <td><strong>Keyword-transformed</strong></td>
              <td>7,110+</td>
              <td>53,705+</td>
              <td>7.6</td>
              <td>Cross-domain harm transfer</td>
            </tr>
          </tbody>
        </table>
      </section>

      <!-- Dataset Links -->
      <section>
        <h2>🔗 Access Datasets on Hugging Face</h2>
        <div class="dataset-links">
          <div class="dataset-card">
            <h4>Original Multi-turn Conversations</h4>
            <p>Psychology + Illicit baseline conversations<br>
              <strong>Sample:</strong> 50 conversations, 390 turns</p>
            <a href="https://huggingface.co/datasets/julyai7/multi-turn-conversations" class="btn" target="_blank" rel="noopener noreferrer">View Dataset →</a>
          </div>
          <div class="dataset-card">
            <h4>Bio-transformed Synthetic V1</h4>
            <p>Direct bio-topic transformation methodology<br>
              <strong>Sample:</strong> 50 conversations, 449 turns</p>
            <a href="https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v1" class="btn" target="_blank" rel="noopener noreferrer">View Dataset →</a>
          </div>
          <div class="dataset-card">
            <h4>Bio-transformed Synthetic V2</h4>
            <p>Adaptive bio-topic transformation methodology<br>
              <strong>Sample:</strong> 50 conversations, 459 turns</p>
            <a href="https://huggingface.co/datasets/julyai7/multi-turn-bio-transformed-synth-conversations-v2" class="btn" target="_blank" rel="noopener noreferrer">View Dataset →</a>
          </div>
          <div class="dataset-card">
            <h4>Keyword-transformed Synthetic</h4>
            <p>Cross-domain keyword substitution methodology<br>
              <strong>Sample:</strong> 50 conversations, 659 turns</p>
            <a href="https://huggingface.co/datasets/julyai7/multi-turn-keyword-transformed-synth-conversations" class="btn" target="_blank" rel="noopener noreferrer">View Dataset →</a>
          </div>
        </div>
      </section>

      <!-- Research Applications -->
      <section>
        <h2>🧪 Research Applications</h2>
        <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 1.5rem;">
          <div>
            <h3>Safety Evaluation</h3>
            <ul>
              <li>Benchmark model safety across attack methodologies</li>
              <li>Measure robustness to synthetic transformations</li>
              <li>Evaluate domain-specific vs. general safety mechanisms</li>
            </ul>
          </div>
          <div>
            <h3>Red Teaming</h3>
            <ul>
              <li>Discover new adversarial patterns</li>
              <li>Test safety guardrails comprehensively</li>
              <li>Identify blind spots in content moderation</li>
            </ul>
          </div>
          <div>
            <h3>Model Training</h3>
            <ul>
              <li>Fine-tune safety classifiers</li>
              <li>Train adversarial attack detectors</li>
              <li>Develop cross-domain harm detection systems</li>
            </ul>
          </div>
          <div>
            <h3>Safety Research</h3>
            <ul>
              <li>Study harm transfer across domains</li>
              <li>Analyze conversation-level attack patterns</li>
              <li>Understand multi-turn adversarial dynamics</li>
            </ul>
          </div>
        </div>
      </section>

      <!-- Ethical Considerations -->
      <section>
        <h2>⚠️ Ethical Considerations</h2>
        <div class="warning-box">
          <h4>⚠️ IMPORTANT</h4>
          <p>These datasets contain successful adversarial attacks and harmful content.</p>
        </div>

        <h3>✅ Intended Use</h3>
        <ul>
          <li>Defensive security research</li>
          <li>AI safety evaluation and improvement</li>
          <li>Academic research on adversarial robustness</li>
          <li>Training safety and moderation systems</li>
        </ul>

        <h3>❌ Prohibited Use</h3>
        <ul>
          <li>Creating offensive content</li>
          <li>Developing attack tools for malicious purposes</li>
          <li>Bypassing safety systems for harm</li>
          <li>Any use that violates laws or ethical guidelines</li>
        </ul>
      </section>

      <!-- Data Selection -->
      <section>
        <h2>🎯 Data Selection Process</h2>
        <div class="info-box">
          <p>All datasets are derived from high-quality, validated conversations with strict quality filters including NeurIPS evaluation protocols.</p>
        </div>

        <h3>Base Criteria</h3>
        <ul>
          <li>Text-based objectives (no code execution templates)</li>
          <li>NeurIPS evaluation metadata present</li>
          <li>Verdict: <code>success</code> (harmful requests successfully fulfilled)</li>
          <li>Multi-turn conversations with prompt-response pairs</li>
        </ul>

        <h3>Psychology-Specific Criteria</h3>
        <ul>
          <li>Organic conversations (<code>organicity = true</code>)</li>
          <li>No disclaimer in responses</li>
          <li>Successfully elicited harmful psychology-related content</li>
        </ul>

        <h3>Illicit-Specific Criteria</h3>
        <ul>
          <li>Contains specific instruction details</li>
          <li>Practically executable (not abstract)</li>
          <li>Successfully elicited harmful illicit-related content</li>
        </ul>
      </section>

      <!-- License -->
      <section>
        <h2>📄 License</h2>
        <p>All datasets are released under <strong>CC-BY-NC-4.0</strong> (Creative Commons Attribution-NonCommercial 4.0 International).</p>
        <ul>
          <li>✅ Use for research and evaluation</li>
          <li>✅ Modify and build upon the data</li>
          <li>✅ Share with attribution</li>
          <li>❌ Commercial use without separate licensing</li>
        </ul>
      </section>

      <!-- Contact -->
      <section>
        <h2>💼 Full Dataset Access</h2>
        <p>The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.</p>
        <p style="margin-top: 1rem;"><strong>For academic research or commercial licensing, please contact us with your research objectives, institutional affiliation, and intended use.</strong></p>
      </section>
    </div>

    <footer>
      <p><strong>Last Updated:</strong> November 24, 2025</p>
      <p style="margin-top: 0.5rem;">For detailed documentation, visit the individual dataset repositories on Hugging Face.</p>
    </footer>
  </div>
</body>
</html>