CTI-Bench Dataset Processing Script

This repository contains the processing script used to convert the original CTI-Bench TSV files into well-structured Hugging Face datasets with comprehensive documentation.

🎯 Overview

The script processes 6 different CTI-Bench task files and uploads them as separate, documented datasets:

  1. cti_bench_mcq - Multiple Choice Questions (2,500 entries)
  2. cti_bench_ate - Attack Technique Extraction (60 entries)
  3. cti_bench_vsp - Vulnerability Severity Prediction (1,000 entries)
  4. cti_bench_taa - Threat Actor Attribution (50 entries)
  5. cti_bench_rcm - Root Cause Mapping (1,000 entries)
  6. cti_bench_rcm_2021 - Root Cause Mapping 2021 (1,000 entries)

📊 Processed Datasets

All processed datasets are available under the Hugging Face account: tuandunghcmut
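
A minimal sketch of loading one processed dataset, assuming each one is published as `<username>/<dataset name>` under the account above. That repository layout and the helper names `repo_id` and `load_task` are illustrative assumptions, not taken from the script; check the Hub for the exact repository and split names.

```python
# Dataset names and entry counts as listed in the overview.
TASK_SIZES = {
    "cti_bench_mcq": 2500,
    "cti_bench_ate": 60,
    "cti_bench_vsp": 1000,
    "cti_bench_taa": 50,
    "cti_bench_rcm": 1000,
    "cti_bench_rcm_2021": 1000,
}


def repo_id(name: str, username: str = "tuandunghcmut") -> str:
    """Build a Hub repo ID. The <username>/<name> layout is an assumption."""
    if name not in TASK_SIZES:
        raise ValueError(f"unknown dataset: {name}")
    return f"{username}/{name}"


def load_task(name: str, username: str = "tuandunghcmut"):
    """Download one processed dataset from the Hub (requires network access)."""
    # Imported lazily so the helpers above also work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id(name, username))
```

Calling `load_task("cti_bench_mcq")` returns whatever `load_dataset` yields for that repository, typically a `DatasetDict` keyed by split name.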

🚀 Usage

Prerequisites

pip install pandas datasets huggingface_hub

Authentication

Make sure you're logged in to Hugging Face:

huggingface-cli login
# or
hf auth login

Running the Script

  1. Clone the original CTI-Bench repository:
git clone https://github.com/xashru/cti-bench.git
  2. Run the processing script:
python process_cti_bench_with_docs.py --username YOUR_HF_USERNAME

Command Line Options

  • --username: Your Hugging Face username (required)
  • --token: Hugging Face token (optional if already logged in)
  • --data-dir: Path to the CTI-Bench data directory (default: cti-bench/data)
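
The options above could be declared with `argparse` roughly as follows. This is a sketch of an equivalent parser, not the script's actual code; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented options with their documented defaults."""
    parser = argparse.ArgumentParser(
        description="Process CTI-Bench TSV files and upload them to the Hugging Face Hub."
    )
    parser.add_argument("--username", required=True, help="Your Hugging Face username")
    parser.add_argument(
        "--token", default=None, help="Hugging Face token (optional if already logged in)"
    )
    parser.add_argument(
        "--data-dir", default="cti-bench/data", help="Path to the CTI-Bench data directory"
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["--username", "YOUR_HF_USERNAME"])
print(args.username, args.token, args.data_dir)
```

Note that `argparse` exposes `--data-dir` as the attribute `args.data_dir`.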

🔧 Features

Data Processing

  • Standardized Schema: All datasets include consistent field naming
  • Task Type Labels: Each entry includes a task_type field for identification
  • Clean Data: Proper handling of missing values and data types
  • Chunk Processing: Handles large files efficiently
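
The processing steps listed above can be sketched as follows. To stay dependency-free, this example uses the standard-library `csv` module rather than pandas, so it illustrates the idea and is not the script's implementation. The column names in the sample data and the helpers `clean_row` and `iter_chunks` are hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, Optional, TextIO


def clean_row(row: Dict[Optional[str], object], task_type: str) -> Dict[str, str]:
    """Normalize column names, replace missing values with "", and add task_type."""
    cleaned: Dict[str, str] = {}
    for key, value in row.items():
        if key is None:
            continue  # surplus cells beyond the header row are dropped
        name = key.strip().lower().replace(" ", "_")
        cleaned[name] = (value or "").strip()
    cleaned["task_type"] = task_type
    return cleaned


def iter_chunks(
    handle: TextIO, task_type: str, chunk_size: int = 1000
) -> Iterator[List[Dict[str, str]]]:
    """Yield cleaned rows from a tab-separated file, chunk_size rows at a time."""
    reader = csv.DictReader(handle, delimiter="\t")
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield [clean_row(row, task_type) for row in chunk]


# Hypothetical two-row TSV; the second row is missing its last cell.
sample = io.StringIO(
    "URL\tQuestion\tOption A\n"
    "https://example.org/a\tFirst?\tyes\n"
    "https://example.org/b\tSecond?\n"
)
for chunk in iter_chunks(sample, "multiple_choice_question", chunk_size=1):
    print(chunk)
```

Reading in fixed-size chunks keeps memory use bounded by `chunk_size` rows regardless of file size.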

Documentation

  • 📚 Comprehensive READMEs: Each dataset gets a detailed README with:
    • Dataset description and statistics
    • Field explanations
    • Usage examples
    • Citation information
    • Task categories
  • 🎯 Task-Specific Info: Tailored documentation for each CTI task type
  • 📖 Code Examples: Ready-to-use Python snippets
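
As an illustration of how such a README might be assembled, here is a small sketch that renders a description, an entry count, and field explanations into Markdown. The function `render_readme` and the layout it produces are assumptions; the READMEs generated by the script contain more sections than this.

```python
from typing import Dict


def render_readme(name: str, description: str, num_rows: int, fields: Dict[str, str]) -> str:
    """Render a minimal dataset README as a Markdown string."""
    lines = [
        f"# {name}",
        "",
        description,
        "",
        f"Number of entries: {num_rows:,}",
        "",
        "## Fields",
        "",
    ]
    lines += [f"- `{field}`: {meaning}" for field, meaning in fields.items()]
    return "\n".join(lines) + "\n"


readme = render_readme(
    "cti_bench_mcq",
    "Multiple Choice Questions from CTI-Bench.",
    2500,
    {"question": "The cybersecurity question", "ground_truth": "Correct answer (A, B, C, or D)"},
)
print(readme)
```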

Upload Features

  • 🚀 Batch Processing: Processes all 6 datasets in one run
  • 📤 Auto-Upload: Automatically uploads to Hugging Face Hub
  • 📝 README Integration: Uploads documentation alongside data
  • Progress Tracking: Detailed logging and progress reports
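
The upload loop could look roughly like the sketch below. It assumes each dataset goes to `<username>/<dataset name>` and that rows are held as a list of dictionaries; `push_to_hub` and `upload_all` are hypothetical helpers, not the script's functions. The Hub calls used are `Dataset.from_list`, `Dataset.push_to_hub`, and `HfApi.upload_file`. The uploader is passed in as a parameter so the loop can be exercised without network access.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("cti_bench_upload")


def push_to_hub(repo_id: str, rows: List[dict], readme: str, token: Optional[str] = None) -> None:
    """Upload rows and a README to a dataset repo (needs network and write access)."""
    # Imported lazily so upload_all can be tested without these packages.
    from datasets import Dataset
    from huggingface_hub import HfApi

    Dataset.from_list(rows).push_to_hub(repo_id, token=token)
    HfApi().upload_file(
        path_or_fileobj=readme.encode("utf-8"),
        path_in_repo="README.md",
        repo_id=repo_id,
        repo_type="dataset",
        token=token,
    )


def upload_all(
    tasks: Dict[str, dict],
    username: str,
    push: Callable[[str, List[dict], str, Optional[str]], None] = push_to_hub,
    token: Optional[str] = None,
) -> Dict[str, bool]:
    """Upload every task, log progress, and report success per dataset."""
    results: Dict[str, bool] = {}
    for index, (name, task) in enumerate(tasks.items(), start=1):
        repo = f"{username}/{name}"
        logger.info("[%d/%d] uploading %s (%d rows)", index, len(tasks), repo, len(task["rows"]))
        try:
            push(repo, task["rows"], task["readme"], token)
            results[name] = True
        except Exception:
            # One failed upload should not abort the remaining datasets.
            logger.exception("upload failed for %s", repo)
            results[name] = False
    return results
```

Passing a stub as `push` gives a dry run: the loop logs and reports results without contacting the Hub.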

📁 Dataset Structure

Each processed dataset has a task-specific schema. Three of them are shown here:

Multiple Choice Questions (MCQ)

{
    'url': str,           # Source MITRE ATT&CK URL
    'question': str,      # The cybersecurity question
    'option_a': str,      # Multiple choice option A
    'option_b': str,      # Multiple choice option B  
    'option_c': str,      # Multiple choice option C
    'option_d': str,      # Multiple choice option D
    'prompt': str,        # Full instruction prompt
    'ground_truth': str,  # Correct answer (A, B, C, or D)
    'task_type': str      # Always "multiple_choice_question"
}
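
One way to score model responses against the `ground_truth` field is sketched below. The answer-extraction rule (take the last standalone capital A, B, C, or D in the response) is a simple heuristic chosen for illustration, not the benchmark's official evaluation; `extract_choice` and `accuracy` are hypothetical helpers.

```python
import re
from typing import Iterable, Mapping, Optional

# A capital A-D that stands alone as a word, e.g. "Answer: B" or "(C)".
_CHOICE = re.compile(r"\b([ABCD])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the last standalone A/B/C/D in a response, or None if absent."""
    matches = _CHOICE.findall(response)
    return matches[-1] if matches else None


def accuracy(records: Iterable[Mapping[str, str]], responses: Iterable[str]) -> float:
    """Fraction of responses whose extracted choice equals ground_truth."""
    pairs = list(zip(records, responses))
    if not pairs:
        return 0.0
    correct = sum(
        extract_choice(response) == record["ground_truth"].strip().upper()
        for record, response in pairs
    )
    return correct / len(pairs)


records = [{"ground_truth": "A"}, {"ground_truth": "D"}]
print(accuracy(records, ["Answer: A", "I think B"]))  # 0.5
```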

Attack Technique Extraction (ATE)

{
    'url': str,          # Source MITRE software URL
    'platform': str,     # Target platform (Enterprise, Mobile, etc.)
    'description': str,  # Malware/attack description
    'prompt': str,       # Full instruction with MITRE reference
    'ground_truth': str, # MITRE technique IDs (e.g., "T1071, T1573")
    'task_type': str     # Always "attack_technique_extraction"
}
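
Because `ground_truth` is a comma-separated string of technique IDs, a prediction can be compared to it as a set. The sketch below computes per-example precision, recall, and F1; this metric choice is an assumption for illustration and may differ from the benchmark's own scoring. `parse_techniques` and `score` are hypothetical helpers.

```python
import re
from typing import Dict, Set

# Main technique IDs such as T1071; a sub-technique like T1071.001 maps to T1071.
_TECHNIQUE = re.compile(r"\bT\d{4}\b")


def parse_techniques(text: str) -> Set[str]:
    """Extract the set of MITRE ATT&CK technique IDs mentioned in text."""
    return set(_TECHNIQUE.findall(text))


def score(predicted: str, ground_truth: str) -> Dict[str, float]:
    """Set-based precision, recall, and F1 for one example."""
    pred, truth = parse_techniques(predicted), parse_techniques(ground_truth)
    hits = len(pred & truth)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


print(score("T1071, T1059", "T1071, T1573"))
```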

Vulnerability Severity Prediction (VSP)

{
    'url': str,          # CVE URL
    'description': str,  # CVE vulnerability description
    'prompt': str,       # CVSS instruction prompt
    'cvss_vector': str,  # CVSS v3.1 vector string
    'task_type': str     # Always "vulnerability_severity_prediction"
}
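
A `cvss_vector` value can be turned into a numeric base score, which is convenient when comparing a predicted vector with the reference one. The sketch below follows the CVSS v3.1 base-score equations and handles base metrics only. It assumes vectors look like `CVSS:3.1/AV:N/...`; how the dataset's own evaluation compares vectors is not stated here, and `parse_vector` and `base_score` are hypothetical helpers.

```python
import math
from typing import Dict

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def parse_vector(vector: str) -> Dict[str, str]:
    """Split 'CVSS:3.1/AV:N/AC:L/...' into {'AV': 'N', 'AC': 'L', ...}."""
    parts = vector.strip().split("/")
    if parts and parts[0].startswith("CVSS:"):
        parts = parts[1:]  # drop the version prefix
    return dict(part.split(":", 1) for part in parts)


def roundup(value: float) -> float:
    """Round up to one decimal place as defined by the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector string."""
    m = parse_vector(vector)
    changed = m["S"] == "C"
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]]) * (1 - WEIGHTS["I"][m["I"]]) * (1 - WEIGHTS["A"][m["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[m["S"]][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

With scores in hand, a predicted and a reference vector can be compared by the absolute difference of their base scores.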

🎓 Original CTI-Bench Paper

This processing script is based on the CTI-Bench dataset from:

CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence
NeurIPS 2024
GitHub | Hugging Face

📄 Citation

If you use these processed datasets or this script, please cite the original paper:

@article{ctibench2024,
  title={CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence},
  author={[Authors]},
  journal={NeurIPS 2024},
  year={2024}
}

🤝 Contributing

Feel free to submit issues or pull requests to improve the processing script or documentation.

📜 License

This script is provided under the same license terms as the original CTI-Bench dataset.


Total processed samples: 5,610 cybersecurity evaluation examples across 6 datasets. 🎯
