title: Dataset Explorer
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- cybersecurity
- datasets
- data-explorer
- analytics
- visualization
Cybersecurity Dataset Explorer
A Gradio Space for exploring and analyzing metadata for 80 cybersecurity datasets hosted on HuggingFace.
Features
Search & Filter
- Search by keyword across dataset names, descriptions, and tags
- Filter by language (English, Chinese, Korean, Italian, French, Russian, etc.)
- Filter by category (AI, Defensive, Offensive, Compliance)
- Filter by popularity (minimum downloads and likes)
- View results in interactive tables
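The filters above can be combined so that a record must pass every one that is set. The following is a minimal sketch of that logic, not the code in app.py: the `DatasetRecord` fields, the `search` function, and the sample records are all hypothetical, and it uses plain Python rather than Pandas so it runs without extra dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are assumptions."""
    name: str
    description: str
    language: str
    category: str
    downloads: int
    likes: int
    tags: list = field(default_factory=list)


def search(records, keyword="", language=None, category=None,
           min_downloads=0, min_likes=0):
    """Return the records that satisfy every supplied filter."""
    needle = keyword.strip().lower()
    results = []
    for rec in records:
        # The keyword matches case-insensitively against name, description, and tags.
        haystack = " ".join([rec.name, rec.description, *rec.tags]).lower()
        if needle and needle not in haystack:
            continue
        if language and rec.language != language:
            continue
        if category and rec.category != category:
            continue
        if rec.downloads < min_downloads or rec.likes < min_likes:
            continue
        results.append(rec)
    return results


# Illustrative records only; the names and numbers are made up for this example.
SAMPLE = [
    DatasetRecord("example/nist-qa", "NIST framework Q&A pairs", "English",
                  "Compliance", 500, 12, ["nist", "compliance"]),
    DatasetRecord("example/pentest-notes", "Penetration testing write-ups",
                  "English", "Offensive", 90, 3, ["pentest"]),
    DatasetRecord("example/soc-instructions",
                  "Instruction-tuning data for SOC analysts",
                  "Korean", "AI", 150, 8, ["instruction"]),
]

nist_hits = search(SAMPLE, keyword="NIST", category="Compliance")
popular_ai = search(SAMPLE, keyword="instruction", category="AI", min_downloads=100)
```

Leaving a filter at its default disables it, so `search(SAMPLE)` returns every record.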
Dataset Details
- Comprehensive metadata for each dataset
- Statistics (downloads, likes, size, language)
- Complete tag listings
- Direct links to HuggingFace repositories
- Mock preview (shows the dataset structure, not actual samples)
Statistics & Visualizations
Interactive charts powered by Plotly:
- Category Distribution: Pie chart showing dataset distribution across categories
- Language Distribution: Bar chart of top 10 languages
- Top Downloads: Horizontal bar chart of most popular datasets
- Size Distribution: Distribution of dataset sizes
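Each chart is driven by a small aggregation over the metadata. As a rough illustration of the data preparation behind a pie chart and a top-N bar chart, the sketch below uses the per-category counts listed under Dataset Categories in this README. The helper names `category_shares` and `top_n` are hypothetical, and the plotting step itself is left out.

```python
# Per-category counts as listed under "Dataset Categories" in this README.
CATEGORY_COUNTS = {"AI": 27, "Defensive": 28, "Offensive": 10, "Compliance": 5}


def category_shares(counts):
    """Return each category's share of the total as a percentage, rounded to 0.1."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def top_n(counts, n=10):
    """Return the n largest (label, count) pairs, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


shares = category_shares(CATEGORY_COUNTS)
ranking = top_n(CATEGORY_COUNTS, n=2)
```

The same `top_n` helper would apply to language counts or download counts, given a mapping from label to number.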
Export Capabilities
- Export filtered results to CSV format
- Export filtered results to JSON format
- Download data for offline analysis
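For readers who want to reproduce the export outside the Space, here is a minimal sketch of writing filtered rows to CSV and JSON. It is an assumption-laden illustration, not the app's export code: the row keys and the `to_csv` / `to_json` helpers are hypothetical, and it uses the standard library `csv` and `json` modules instead of Pandas.

```python
import csv
import io
import json

# Hypothetical filtered results; the column names are assumptions.
rows = [
    {"name": "example/nist-qa", "category": "Compliance", "downloads": 500},
    {"name": "example/pentest-notes", "category": "Offensive", "downloads": 90},
]


def to_csv(rows):
    """Serialize rows to CSV text with a header row taken from the first row's keys."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows to a JSON array, keeping non-ASCII text readable."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


csv_text = to_csv(rows)
json_text = to_json(rows)
```

`ensure_ascii=False` is chosen here because the catalog covers several languages, so descriptions may contain non-Latin text.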
Dark Theme
A dark theme optimized for readability, with:
- High contrast colors
- Interactive hover effects
- Responsive layout
- Professional visualization styling
Dataset Categories
AI (27 datasets)
Datasets for training and evaluating AI/ML models in cybersecurity:
- Instruction-tuning datasets
- ShareGPT format conversations
- Question-answering pairs
- Synthetic training data
- Fine-tuning datasets
Defensive (28 datasets)
Blue team, security operations, and threat detection:
- Threat intelligence
- Incident response
- Security operations
- Detection rules (SIGMA, YARA, Suricata)
- Honeypot data
- News and threat feeds
Offensive (10 datasets)
Red team, penetration testing, and security research:
- Penetration testing techniques
- Exploit databases
- Attack scenarios
- Vulnerability data
- CVE databases
Compliance (5 datasets)
Regulatory frameworks and standards:
- NIST Cybersecurity Framework
- ISO/IEC 27001
- Taiwan Cybersecurity Law
- Compliance training data
Top Datasets
ethanolivertroy/nist-cybersecurity-training (8,000 downloads)
- Largest open-source NIST cybersecurity training dataset
- 100K-1M samples for LLM fine-tuning
clydeiii/cybersecurity (4,000 downloads)
- APT notes from GitHub
- Threat intelligence focus
vinitvek/cybersecurityattacks (2,300 downloads)
- Cybersecurity attacks dataset
- 10K-100K samples
Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset (786 downloads, 78 likes)
- 53,202 instruction-tuning examples
- Defensive security focus
AlicanKiraz0/Cybersecurity-Dataset-Fenrir-v2.0 (353 downloads)
- 83,920 high-quality training triples
- Defensive cybersecurity
Statistics
- Total Datasets: 80
- Total Downloads: 18,000+
- Languages: 10+ (English, Chinese, Korean, Italian, French, Russian, etc.)
- Size Range: <1K to 10M+ samples
Usage
Search Examples
Find NIST-related datasets:
- Keyword: "NIST"
- Category: Compliance
Find penetration testing datasets:
- Keyword: "penetration" or "pentest"
- Category: Offensive
Find instruction-tuning datasets:
- Keyword: "instruction"
- Category: AI
- Min Downloads: 100
Find threat intelligence datasets:
- Keyword: "threat"
- Category: Defensive
Export Workflow
- Apply desired filters
- Click "Search Datasets"
- Click "Export to CSV" or "Export to JSON"
- Download the file from the interface
Technologies
- Gradio 4.44.1: Interactive web interface
- Pandas 2.1.4: Data manipulation and filtering
- Plotly 5.18.0: Interactive visualizations
- HuggingFace Datasets 2.16.1: Dataset metadata
Data Sources
All datasets are publicly available on HuggingFace Hub. This explorer provides:
- Curated metadata from 80 cybersecurity datasets
- Filtering and search capabilities
- Visual analytics
- Export functionality
To access actual dataset content, click the HuggingFace URL for any dataset.
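Dataset pages on the Hub live under `https://huggingface.co/datasets/` followed by the repository id. A small sketch of building that link from an id is shown below. The `dataset_url` helper is hypothetical, and it assumes ids of the form `owner/name`, which is the form used by every dataset named in this README.

```python
from urllib.parse import quote

HUB_DATASETS_BASE = "https://huggingface.co/datasets"


def dataset_url(repo_id):
    """Build the Hub page URL for a dataset id of the form 'owner/name'."""
    owner, sep, name = repo_id.partition("/")
    if not sep or not owner or not name:
        # Assumption of this sketch: ids without an owner prefix are rejected.
        raise ValueError(f"expected 'owner/name', got {repo_id!r}")
    return f"{HUB_DATASETS_BASE}/{quote(owner)}/{quote(name)}"


url = dataset_url("Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset")
```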
Development
Local Setup
pip install -r requirements.txt
python app.py
File Structure
dataset-explorer/
├── app.py            # Main Gradio application
├── requirements.txt  # Python dependencies
└── README.md         # This file
Future Enhancements
- Live dataset preview (load actual samples)
- Full-text search within dataset content
- Advanced filtering (by date, size range)
- Dataset comparison tool
- API integration for real-time updates
- Custom visualization builder
- Dataset recommendation engine
License
Apache 2.0
Author
AYI-NEDJIMI
Acknowledgments
Special thanks to the HuggingFace community and all dataset creators who make their cybersecurity datasets publicly available.
Note: This is a metadata explorer. To download and use the actual datasets, visit the HuggingFace links provided in the interface.