# Credits

This dataset is a combination of three existing datasets, pre-processed with **deduplication** and **token limit of 1024 tokens per example**.

## Included Datasets

1. **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**
   - Creator: CyberNative
   - License: Apache 2.0
   - Description: Code dataset focused on security vulnerabilities.

2. **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**
   - Creator: Madras1
   - License: Apache 2.0
   - Description: Distilled code dataset emphasizing coding patterns and representations.

3. **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**
   - Creator: pedrodev2026
   - License: BSD 3-Clause
   - Description: Custom distilled code dataset created and maintained by pedrodev2026.

## Preprocessing

The combined dataset was prepared by:

- **Deduplicating** all examples to remove redundancy.
- Limiting examples to **1024 tokens each**.

## License

The final combined dataset is licensed under **BSD 3-Clause**. Users must still respect the original licenses of the included datasets when redistributing or using the original unmodified datasets.

- Original licenses:
  - **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**: Apache 2.0
  - **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**: Apache 2.0
  - **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**: BSD 3-Clause
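
For readers who want to apply similar preprocessing to their own data, the two steps named in this card (deduplication and a 1024-token cap) could be sketched roughly as follows. This is an illustration only, not the script used to build the dataset. The function names `dedup_and_limit` and `whitespace_token_count` are hypothetical, the whitespace-based counter is a stand-in because the card does not say which tokenizer was used, and dropping over-limit examples (rather than truncating them) is an assumption.

```python
import hashlib
from typing import Callable, Iterable, Iterator

MAX_TOKENS = 1024  # per-example limit stated in this card


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated pieces."""
    return len(text.split())


def dedup_and_limit(
    examples: Iterable[str],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_TOKENS,
) -> Iterator[str]:
    """Yield each distinct example once, skipping any that exceed max_tokens."""
    seen: set[str] = set()
    for text in examples:
        # Hash the stripped text so exact duplicates collapse to one key.
        key = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        # Assumption: over-limit examples are dropped, not truncated.
        if count_tokens(text) <= max_tokens:
            yield text


# Usage: the duplicate and the over-long example are both removed.
samples = ["print('hi')", "print('hi')", "x " * 2000]
kept = list(dedup_and_limit(samples))
```

To match a real model's token counts, pass that model's tokenizer-based counter as `count_tokens` in place of the whitespace stand-in.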