# Credits

This dataset combines three existing datasets, pre-processed with **deduplication** and a **limit of 1024 tokens per example**.

## Included Datasets

1. **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**  
   - Creator: CyberNative  
   - License: Apache 2.0  
   - Description: Code dataset focused on security vulnerabilities.  

2. **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**  
   - Creator: Madras1  
   - License: Apache 2.0  
   - Description: Distilled code dataset emphasizing coding patterns and representations.  

3. **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**  
   - Creator: pedrodev2026  
   - License: BSD 3-Clause  
   - Description: Custom distilled code dataset created and maintained by pedrodev2026.  

## Preprocessing

The combined dataset was prepared by:

- **Deduplicating** all examples to remove redundancy.  
- Limiting examples to **1024 tokens each**.  
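
The two steps above can be sketched roughly as follows. This is a minimal illustration and not the script actually used to build the dataset: it assumes exact-match deduplication on whitespace-trimmed text, uses a whitespace split as a stand-in for the real tokenizer (which is not named here), and assumes over-length examples are dropped rather than truncated. The names `dedupe_and_limit` and `whitespace_token_count` are hypothetical.

```python
import hashlib
from typing import Callable, Iterable, Iterator

MAX_TOKENS = 1024  # per-example limit stated in this document


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated pieces."""
    return len(text.split())


def dedupe_and_limit(
    examples: Iterable[str],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_TOKENS,
) -> Iterator[str]:
    """Yield each distinct example once, skipping those over the token limit.

    Assumptions: duplicates are exact matches after trimming whitespace,
    and over-length examples are dropped (not truncated).
    """
    seen = set()
    for text in examples:
        # Hash the trimmed text so the set stores fixed-size digests.
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of an earlier example
        seen.add(digest)
        if count_tokens(text) <= max_tokens:
            yield text
```

To reproduce the limit more faithfully, pass a `count_tokens` function backed by the tokenizer of the model you intend to train, since token counts differ between tokenizers.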

## License

The final combined dataset is licensed under **BSD 3-Clause**.  
The original licenses of the included datasets still apply when you use or redistribute those datasets in their original, unmodified form.  

- Original licenses:  
  - **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**: Apache 2.0  
  - **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**: Apache 2.0  
  - **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**: BSD 3-Clause