# Credits
This dataset combines three existing datasets. The merged data was **deduplicated**, and each example was limited to **1024 tokens**.
## Included Datasets
1. **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**
   - Creator: CyberNative
   - License: Apache 2.0
   - Description: Code dataset focused on security vulnerabilities.
2. **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**
   - Creator: Madras1
   - License: Apache 2.0
   - Description: Distilled code dataset emphasizing coding patterns and representations.
3. **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**
   - Creator: pedrodev2026
   - License: BSD 3-Clause
   - Description: Custom distilled code dataset created and maintained by pedrodev2026.
## Preprocessing
The combined dataset was prepared by:
- **Deduplicating** all examples to remove redundancy.
- Limiting examples to **1024 tokens each**.
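
The two preprocessing steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script used to build the dataset: the card does not say which tokenizer was used, whether duplicates were matched exactly, or whether over-length examples were dropped or truncated. The sketch assumes exact-match deduplication, a whitespace tokenizer as a stand-in, and dropping of over-length examples. The names `preprocess` and `whitespace_tokenize` are hypothetical.

```python
from typing import Callable, Iterable, List

MAX_TOKENS = 1024  # per-example limit stated in this card


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): the card does not name the real one.
    return text.split()


def preprocess(
    examples: Iterable[str],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Remove exact duplicates, then keep only examples within the token limit."""
    seen = set()
    kept = []
    for text in examples:
        if text in seen:
            continue  # exact duplicate of an earlier example
        seen.add(text)
        if len(tokenize(text)) > max_tokens:
            continue  # assumption: over-length examples are dropped, not truncated
        kept.append(text)
    return kept
```

To approximate the real pipeline more closely, pass the tokenizer of the target model as `tokenize` in place of the whitespace stand-in.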
## License
The combined dataset is licensed under **BSD 3-Clause**.

Users must still comply with the original licenses of the included datasets when using or redistributing those datasets in their original, unmodified form.
- Original licenses:
  - **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**: Apache 2.0
  - **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**: Apache 2.0
  - **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**: BSD 3-Clause