Yang Chung committed on
Commit
5fa85d7
·
1 Parent(s): 97b35c5

Update with illicit numbers

Files changed (2)
  1. README.md +9 -11
  2. index.html +13 -15
README.md CHANGED
@@ -28,36 +28,36 @@ Comprehensive evaluation datasets for testing AI model safety mechanisms
28
 
29
  | Metric | Value |
30
  |--------|-------|
31
- | **Total Conversations** | 979+ |
32
- | **Total Turns** | 7,706+ |
33
  | **Dataset Types** | 3 complementary methodologies |
34
- | **Sample Data Available** | 150 conversations |
35
 
36
  ## 📈 Full Dataset Statistics
37
 
38
  | Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
39
  |---------|--------------|-------|----------------|--------|
40
- | **Psychology multi-turn** | 207+ | 2,128+ | 10.3 | Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc. |
41
- | **Illicit (bioweapon) multi-turn** | 102+ | 1,038+ | 10.2 | Bio-safety harmfulness such as bioweapons, pathogens, etc. |
42
- | **Illicit (chemical, general) multi-turn** | 670+ | 4,540+ | 6.8 | Non-bio safety harmfulness such as chemical weapons, cyber threats, etc. |
43
 
44
  ## 🔗 Access Datasets on Hugging Face
45
 
46
  ### Psychology Multi-turn Conversations
47
  Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.
48
- **Sample:** 50 conversations, 390 turns
49
 
50
  🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations)**
51
 
52
  ### Illicit (bioweapon) Multi-turn Conversations
53
  Bio-safety harmfulness such as bioweapons, pathogens, etc.
54
- **Sample:** 50 conversations, 449 turns
55
 
56
  🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1)**
57
 
58
  ### Illicit (chemical, general) Multi-turn Conversations
59
  Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.
60
- **Sample:** 50 conversations, 459 turns
61
 
62
  🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2)**
63
 
@@ -83,13 +83,11 @@ All datasets are derived from high-quality, validated conversations with strict
83
 
84
  ### Base Criteria
85
  - Text-based objectives (no code execution templates)
86
- - NeurIPS evaluation metadata present
87
  - Verdict: `success` (harmful requests successfully fulfilled)
88
  - Multi-turn conversations with prompt-response pairs
89
 
90
  ### Psychology-Specific Criteria
91
  - Organic conversations (`organicity = true`)
92
- - No disclaimer in responses
93
  - Successfully elicited harmful psychology-related content
94
 
95
  ### Illicit-Specific Criteria
 
28
 
29
  | Metric | Value |
30
  |--------|-------|
31
+ | **Total Conversations** | 849+ |
32
+ | **Total Turns** | 6,694+ |
33
  | **Dataset Types** | 3 complementary methodologies |
34
+ | **Sample Data Available** | 15 conversations |
35
 
36
  ## 📈 Full Dataset Statistics
37
 
38
  | Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
39
  |---------|--------------|-------|----------------|--------|
40
+ | **Psychology multi-turn** | 184+ | 1,964+ | 10.3 | Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc. |
41
+ | **Illicit (bioweapon) multi-turn** | 84+ | 822+ | 9.8 | Bio-safety harmfulness such as bioweapons, pathogens, etc. |
42
+ | **Illicit (chemical, general) multi-turn** | 581+ | 3,908+ | 6.7 | Non-bio safety harmfulness such as chemical weapons, cyber threats, etc. |
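The updated rows can be cross-checked with a few lines of arithmetic. The sketch below assumes the `+` lower bounds can be treated as exact counts, which the commit does not confirm. Under that assumption the per-dataset counts sum to the stated totals (849 conversations, 6,694 turns), and the recomputed averages match for both illicit datasets. The psychology row is the exception: it still shows 10.3, which matches the previous counts (2,128 / 207), while the new counts give roughly 10.7.

```python
# Figures copied from the updated (+) rows of the statistics table.
# Assumption: the "+" lower bounds are treated as exact counts.
datasets = {
    "Psychology multi-turn": (184, 1964, 10.3),
    "Illicit (bioweapon) multi-turn": (84, 822, 9.8),
    "Illicit (chemical, general) multi-turn": (581, 3908, 6.7),
}


def average_turns(conversations: int, turns: int) -> float:
    """Average turns per conversation, rounded to one decimal place."""
    return round(turns / conversations, 1)


total_conversations = sum(convs for convs, _, _ in datasets.values())
total_turns = sum(turns for _, turns, _ in datasets.values())

# Collect rows whose stated average differs from the recomputed one.
mismatches = {
    name: average_turns(convs, turns)
    for name, (convs, turns, stated) in datasets.items()
    if average_turns(convs, turns) != stated
}

print(total_conversations, total_turns)
print(mismatches)
```

Because the published counts are lower bounds, a mismatch here is a prompt to re-check the source data rather than proof of an error.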
43
 
44
  ## 🔗 Access Datasets on Hugging Face
45
 
46
  ### Psychology Multi-turn Conversations
47
  Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.
48
+ **Sample:** 5 conversations
49
 
50
  🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations)**
51
 
52
  ### Illicit (bioweapon) Multi-turn Conversations
53
  Bio-safety harmfulness such as bioweapons, pathogens, etc.
54
+ **Sample:** 5 conversations
55
 
56
  🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1)**
57
 
58
  ### Illicit (chemical, general) Multi-turn Conversations
59
  Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.
60
+ **Sample:** 5 conversations
61
 
62
  🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2)**
63
 
 
83
 
84
  ### Base Criteria
85
  - Text-based objectives (no code execution templates)
 
86
  - Verdict: `success` (harmful requests successfully fulfilled)
87
  - Multi-turn conversations with prompt-response pairs
88
 
89
  ### Psychology-Specific Criteria
90
  - Organic conversations (`organicity = true`)
 
91
  - Successfully elicited harmful psychology-related content
92
 
93
  ### Illicit-Specific Criteria
index.html CHANGED
@@ -243,12 +243,12 @@
243
  <div class="stats-grid">
244
  <div class="stat-card">
245
  <h4>Total Conversations</h4>
246
- <div class="number">979+</div>
247
  <div class="label">Across all datasets</div>
248
  </div>
249
  <div class="stat-card">
250
  <h4>Total Turns</h4>
251
- <div class="number">7,706+</div>
252
  <div class="label">Multi-turn interactions</div>
253
  </div>
254
  <div class="stat-card">
@@ -280,23 +280,23 @@
280
  <tbody>
281
  <tr>
282
  <td><strong>Psychology multi-turn</strong></td>
283
- <td>207+</td>
284
- <td>2128+</td>
285
  <td>10.3</td>
286
  <td>Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.</td>
287
  </tr>
288
  <tr>
289
  <td><strong>Illicit (bioweapon) multi-turn</strong></td>
290
- <td>102+</td>
291
- <td>1038+</td>
292
- <td>10.2</td>
293
  <td>Bio-safety harmfulness such as bioweapons, pathogens, etc.</td>
294
  </tr>
295
  <tr>
296
  <td><strong>Illicit (chemical, general) multi-turn</strong></td>
297
- <td>670+</td>
298
- <td>4540+</td>
299
- <td>6.8</td>
300
  <td>Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.</td>
301
  </tr>
302
  </tbody>
@@ -310,19 +310,19 @@
310
  <div class="dataset-card">
311
  <h4>Psychology Multi-turn Conversations</h4>
312
  <p>Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.<br>
313
- <strong>Sample:</strong> 50 conversations, 390 turns</p>
314
  <a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations" class="btn" target="_blank">View Dataset →</a>
315
  </div>
316
  <div class="dataset-card">
317
  <h4>Illicit (bioweapon) Multi-turn Conversations</h4>
318
  <p>Bio-safety harmfulness such as bioweapons, pathogens, etc.<br>
319
- <strong>Sample:</strong> 50 conversations, 449 turns</p>
320
  <a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1" class="btn" target="_blank">View Dataset →</a>
321
  </div>
322
  <div class="dataset-card">
323
  <h4>Illicit (chemical, general) Multi-turn Conversations</h4>
324
  <p>Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.<br>
325
- <strong>Sample:</strong> 50 conversations, 459 turns</p>
326
  <a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2" class="btn" target="_blank">View Dataset →</a>
327
  </div>
328
  </div>
@@ -363,7 +363,6 @@
363
  <h3>Base Criteria</h3>
364
  <ul>
365
  <li>Text-based objectives (no code execution templates)</li>
366
- <li>NeurIPS evaluation metadata present</li>
367
  <li>Verdict: <code>success</code> (harmful requests successfully fulfilled)</li>
368
  <li>Multi-turn conversations with prompt-response pairs</li>
369
  </ul>
@@ -371,7 +370,6 @@
371
  <h3>Psychology-Specific Criteria</h3>
372
  <ul>
373
  <li>Organic conversations (<code>organicity = true</code>)</li>
374
- <li>No disclaimer in responses</li>
375
  <li>Successfully elicited harmful psychology-related content</li>
376
  </ul>
377
 
 
243
  <div class="stats-grid">
244
  <div class="stat-card">
245
  <h4>Total Conversations</h4>
246
+ <div class="number">849+</div>
247
  <div class="label">Across all datasets</div>
248
  </div>
249
  <div class="stat-card">
250
  <h4>Total Turns</h4>
251
+ <div class="number">6694+</div>
252
  <div class="label">Multi-turn interactions</div>
253
  </div>
254
  <div class="stat-card">
 
280
  <tbody>
281
  <tr>
282
  <td><strong>Psychology multi-turn</strong></td>
283
+ <td>184+</td>
284
+ <td>1964+</td>
285
  <td>10.3</td>
286
  <td>Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.</td>
287
  </tr>
288
  <tr>
289
  <td><strong>Illicit (bioweapon) multi-turn</strong></td>
290
+ <td>84+</td>
291
+ <td>822+</td>
292
+ <td>9.8</td>
293
  <td>Bio-safety harmfulness such as bioweapons, pathogens, etc.</td>
294
  </tr>
295
  <tr>
296
  <td><strong>Illicit (chemical, general) multi-turn</strong></td>
297
+ <td>581+</td>
298
+ <td>3908+</td>
299
+ <td>6.7</td>
300
  <td>Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.</td>
301
  </tr>
302
  </tbody>
 
310
  <div class="dataset-card">
311
  <h4>Psychology Multi-turn Conversations</h4>
312
  <p>Psychology harmfulness such as self-harm, psychosis, anthropomorphism, etc.<br>
313
+ <strong>Sample:</strong> 5 conversations</p>
314
  <a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-conversations" class="btn" target="_blank">View Dataset →</a>
315
  </div>
316
  <div class="dataset-card">
317
  <h4>Illicit (bioweapon) Multi-turn Conversations</h4>
318
  <p>Bio-safety harmfulness such as bioweapons, pathogens, etc.<br>
319
+ <strong>Sample:</strong> 5 conversations</p>
320
  <a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1" class="btn" target="_blank">View Dataset →</a>
321
  </div>
322
  <div class="dataset-card">
323
  <h4>Illicit (chemical, general) Multi-turn Conversations</h4>
324
  <p>Non-bio safety harmfulness such as chemical weapons, cyber threats, etc.<br>
325
+ <strong>Sample:</strong> 5 conversations</p>
326
  <a href="https://huggingface.co/datasets/GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2" class="btn" target="_blank">View Dataset →</a>
327
  </div>
328
  </div>
 
363
  <h3>Base Criteria</h3>
364
  <ul>
365
  <li>Text-based objectives (no code execution templates)</li>
 
366
  <li>Verdict: <code>success</code> (harmful requests successfully fulfilled)</li>
367
  <li>Multi-turn conversations with prompt-response pairs</li>
368
  </ul>
 
370
  <h3>Psychology-Specific Criteria</h3>
371
  <ul>
372
  <li>Organic conversations (<code>organicity = true</code>)</li>
 
373
  <li>Successfully elicited harmful psychology-related content</li>
374
  </ul>
375
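After this commit, the remaining base and psychology-specific criteria could be expressed as simple metadata predicates. This is only a sketch: the commit does not show the record schema, so the field names `objective_type`, `turns`, `prompt`, and `response` are hypothetical. Only the `success` verdict value and the `organicity` flag appear in the source. Reading "multi-turn" as "at least two prompt-response pairs" is also an assumption.

```python
from typing import Any

# Hypothetical record shape; the real dataset schema is not shown in the commit.
Record = dict[str, Any]


def meets_base_criteria(record: Record) -> bool:
    """Check the base criteria left in place by this commit."""
    turns = record.get("turns", [])
    return (
        record.get("objective_type") == "text"  # hypothetical field name
        and record.get("verdict") == "success"
        and len(turns) >= 2  # assumption: multi-turn means two or more pairs
        and all("prompt" in turn and "response" in turn for turn in turns)
    )


def meets_psychology_criteria(record: Record) -> bool:
    """Base criteria plus the organicity flag."""
    return meets_base_criteria(record) and record.get("organicity") is True


# Placeholder record used only to exercise the predicates.
example = {
    "objective_type": "text",
    "verdict": "success",
    "organicity": True,
    "turns": [
        {"prompt": "first prompt", "response": "first response"},
        {"prompt": "second prompt", "response": "second response"},
    ],
}

print(meets_base_criteria(example), meets_psychology_criteria(example))
```

The two criteria dropped in this commit (NeurIPS evaluation metadata present, no disclaimer in responses) are deliberately left out of the predicates.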