See curate_dataset.py in this repository for the full dataset curation pipeline.

The dataset curation script combines:
1. AlicanKiraz0/All-CVE-Records-Training-Dataset (10K samples)
2. m-a-p/Code-Feedback (5K samples)
3. nvidia/OpenCodeReasoning (5K samples)
4. Synthetic cybersecurity examples covering JSON output, ASTs (abstract syntax trees), GDB (the GNU Debugger), and ROP (return-oriented programming)
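
The mixing step described above could look roughly like the following sketch. It is not taken from curate_dataset.py. It assumes each source has already been loaded into a list of dicts in one shared format, and the names SOURCE_BUDGETS, sample_source, and combine_sources are hypothetical. The sketch subsamples each source to its listed budget without replacement, tags every record with its provenance, appends the synthetic examples, and shuffles with a fixed seed so the mix is reproducible.

```python
import random
from typing import Dict, List

# Hypothetical per-source budgets mirroring the counts listed above.
# The real curate_dataset.py may load, filter, and format sources differently.
SOURCE_BUDGETS: Dict[str, int] = {
    "AlicanKiraz0/All-CVE-Records-Training-Dataset": 10_000,
    "m-a-p/Code-Feedback": 5_000,
    "nvidia/OpenCodeReasoning": 5_000,
}


def sample_source(records: List[dict], budget: int, rng: random.Random) -> List[dict]:
    """Draw up to `budget` records without replacement from one source."""
    if len(records) <= budget:
        # Source is smaller than its budget: keep everything.
        return list(records)
    return rng.sample(records, budget)


def combine_sources(
    sources: Dict[str, List[dict]],
    budgets: Dict[str, int],
    synthetic: List[dict],
    seed: int = 42,
) -> List[dict]:
    """Subsample each source to its budget, tag provenance, append the
    synthetic examples, and shuffle everything into one training mix."""
    rng = random.Random(seed)  # fixed seed -> reproducible sampling and order
    mixed: List[dict] = []
    for name, budget in budgets.items():
        for record in sample_source(sources.get(name, []), budget, rng):
            # Copy the record and add a "source" field; the input is not mutated.
            mixed.append({**record, "source": name})
    for record in synthetic:
        mixed.append({**record, "source": "synthetic"})
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Toy stand-ins: each fake source holds twice its budget.
    toy = {
        name: [{"text": f"{name}-{i}"} for i in range(budget * 2)]
        for name, budget in SOURCE_BUDGETS.items()
    }
    mix = combine_sources(toy, SOURCE_BUDGETS, [{"text": "synthetic ROP example"}])
    print(len(mix))  # 10000 + 5000 + 5000 + 1 = 20001
```

When every source holds at least its budget, the mix contains 20,000 sampled records plus however many synthetic examples are appended.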

Run with: python curate_dataset.py