๐Ÿ›ก๏ธ AI Safety Datasets Collection

Comprehensive evaluation datasets for testing AI model safety mechanisms

📊 Dataset Collection Summary

  • Total Conversations: 10,321+ across all datasets
  • Total Turns: 73,258+ in multi-turn interactions
  • Dataset Types: 4 complementary methodologies
  • Sample Data: 200 free conversations available

📈 Full Dataset Statistics

Dataset                Conversations   Turns     Avg Turns/Conv   Focus
Original Multi-turn    594+            4,642+    7.8              Baseline organic conversations
  └ Psychology         158+            1,583+    10.0             Psychology harm category
  └ Illicit            436+            3,059+    7.0              Illicit harm category
Bio-transformed V1     1,309+          6,784+    5.2              Direct bio-safety attacks
Bio-transformed V2     1,308+          8,127+    6.2              Adaptive bio-safety attacks
Keyword-transformed    7,110+          53,705+   7.6              Cross-domain harm transfer
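The summary figures at the top of this page follow from the table. As a quick consistency check, the Python sketch below copies the table's counts (dropping the "+" suffixes, so the numbers are the published lower bounds) and recomputes the totals, the per-dataset averages, and the Psychology/Illicit subset sums. The names `DATASETS`, `SUBSETS`, and `avg_turns` are illustrative only and are not part of any published tooling.

```python
# Counts copied from the statistics table; "+" suffixes dropped (published lower bounds).
DATASETS = {
    "Original Multi-turn": (594, 4642),
    "Bio-transformed V1": (1309, 6784),
    "Bio-transformed V2": (1308, 8127),
    "Keyword-transformed": (7110, 53705),
}
# Category subsets of "Original Multi-turn" (the └ rows in the table).
SUBSETS = {
    "Psychology": (158, 1583),
    "Illicit": (436, 3059),
}


def avg_turns(conversations: int, turns: int) -> float:
    """Average turns per conversation, rounded to one decimal as in the table."""
    return round(turns / conversations, 1)


# Totals over the four top-level datasets (subsets are not double-counted).
total_conversations = sum(c for c, _ in DATASETS.values())
total_turns = sum(t for _, t in DATASETS.values())

# The two subsets should add up exactly to their parent row.
subset_conversations = sum(c for c, _ in SUBSETS.values())
subset_turns = sum(t for _, t in SUBSETS.values())
assert (subset_conversations, subset_turns) == DATASETS["Original Multi-turn"]

for name, (c, t) in {**DATASETS, **SUBSETS}.items():
    print(f"{name}: {avg_turns(c, t)}")

print(total_conversations, total_turns)  # 10321 73258
print(avg_turns(total_conversations, total_turns))  # 7.1
```

Only the four top-level rows are summed; adding the subset rows as well would count the Original Multi-turn conversations twice.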

🔗 Access Datasets on Hugging Face

🧪 Research Applications

Safety Evaluation

  • Benchmark model safety
  • Measure robustness to multi-turn adversarial attacks
  • Evaluate safety mechanisms

Red Teaming

  • Discover adversarial patterns
  • Test safety guardrails
  • Identify blind spots

Model Training

  • Fine-tune safety classifiers
  • Train attack detectors
  • Develop harm detection systems

Safety Research

  • Study cross-domain harm transfer
  • Analyze attack patterns
  • Understand multi-turn conversation dynamics

โš ๏ธ Ethical Considerations

โš ๏ธ IMPORTANT

These datasets contain successful adversarial attacks and harmful content.

✅ Intended Use

  • Defensive security research
  • AI safety evaluation and improvement
  • Academic research on adversarial robustness
  • Training safety and moderation systems

โŒ Prohibited Use

  • Creating offensive content
  • Developing attack tools for malicious purposes
  • Bypassing safety systems for harm
  • Any use that violates laws or ethical guidelines

🎯 Data Selection Process

All datasets are derived from validated conversations that pass the strict quality filters listed below, including NeurIPS evaluation protocols.

Base Criteria

  • Text-based objectives (no code execution templates)
  • NeurIPS evaluation metadata present
  • Verdict: success (harmful requests successfully fulfilled)
  • Multi-turn conversations with prompt-response pairs

Psychology-Specific Criteria

  • Organic conversations (organicity = true)
  • No disclaimer in responses
  • Successfully elicited harmful psychology-related content

Illicit-Specific Criteria

  • Contains specific instruction details
  • Practically executable (not abstract)
  • Successfully elicited harmful illicit-related content
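The selection criteria above can be read as a record filter. The Python sketch below is a minimal illustration under stated assumptions, not the actual selection pipeline: the `Conversation` schema and every field name in it (`objective_type`, `neurips_metadata`, `organic`, `has_disclaimer`, `specific_instructions`, `practically_executable`) are hypothetical, and the two Illicit checks are assumed to be precomputed annotation labels, since "practically executable" is a judgment rather than a value that can be read directly from a transcript.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Conversation:
    # Hypothetical record schema for illustration; real field names may differ.
    objective_type: str               # e.g. "text" or "code_execution"
    neurips_metadata: Optional[dict]  # None when NeurIPS evaluation metadata is absent
    verdict: str                      # e.g. "success" or "failure"
    turns: List[Tuple[str, str]]      # (prompt, response) pairs
    category: str                     # e.g. "psychology" or "illicit"
    organic: bool = False             # organicity flag
    has_disclaimer: bool = False      # assumed precomputed: any response carries a disclaimer
    specific_instructions: bool = False   # assumed annotator label
    practically_executable: bool = False  # assumed annotator label


def passes_base_criteria(conv: Conversation) -> bool:
    """Base criteria applied to every dataset."""
    return (
        conv.objective_type == "text"          # no code execution templates
        and conv.neurips_metadata is not None  # NeurIPS evaluation metadata present
        and conv.verdict == "success"          # harmful request was fulfilled
        and len(conv.turns) >= 2               # multi-turn prompt-response pairs
    )


def passes_category_criteria(conv: Conversation) -> bool:
    """Extra criteria for the Psychology and Illicit categories."""
    if conv.category == "psychology":
        return conv.organic and not conv.has_disclaimer
    if conv.category == "illicit":
        return conv.specific_instructions and conv.practically_executable
    return True  # no additional criteria are listed for other categories


def select(conversations: List[Conversation]) -> List[Conversation]:
    """Keep only conversations that pass both the base and category criteria."""
    return [c for c in conversations
            if passes_base_criteria(c) and passes_category_criteria(c)]


# Usage: two otherwise identical psychology records; the one with a disclaimer is dropped.
turns = [("prompt 1", "response 1"), ("prompt 2", "response 2")]
kept = Conversation("text", {"source": "neurips"}, "success", turns,
                    "psychology", organic=True)
dropped = Conversation("text", {"source": "neurips"}, "success", turns,
                       "psychology", organic=True, has_disclaimer=True)
print(len(select([kept, dropped])))  # 1
```

Splitting the checks into a base filter and a per-category filter mirrors the structure of the criteria lists, so adding a new harm category would only require a new branch in `passes_category_criteria`.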

📄 License

All datasets are released under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International).

  • ✅ Use for research and evaluation
  • ✅ Modify and build upon the data
  • ✅ Share with attribution
  • ❌ Commercial use without separate licensing

💼 Full Dataset Access

The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.

Please contact us at info@gojuly.ai to purchase any or all of the full datasets.

Include your research objectives, institutional affiliation, and intended use in your inquiry.