Credits
This dataset combines three existing datasets, preprocessed with deduplication and a limit of 1024 tokens per example.
Included Datasets
CyberNative/Code_Vulnerability_Security_DPO
- Creator: CyberNative
- License: Apache 2.0
- Description: Code dataset focused on security vulnerabilities.
Madras1/minimax-m2.5-code-distilled-14k
- Creator: Madras1
- License: Apache 2.0
- Description: Distilled code dataset emphasizing coding patterns and representations.
pedrodev2026/pedro-open-distil-dataset
- Creator: pedrodev2026
- License: BSD 3-Clause
- Description: Custom distilled code dataset created and maintained by pedrodev2026.
Preprocessing
The combined dataset was prepared by:
- Deduplicating all examples to remove redundancy.
- Limiting each example to at most 1024 tokens.
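The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the script used to build this dataset. It assumes that deduplication means exact text matches and that over-length examples are dropped rather than truncated. The card does not name the tokenizer, so `whitespace_token_count` is a hypothetical stand-in. `dedup_and_limit` is likewise an illustrative name.

```python
import hashlib
from typing import Callable, Iterable, Iterator

MAX_TOKENS = 1024  # per-example token limit stated in this card


def whitespace_token_count(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated tokens.

    The tokenizer actually used for this dataset is not specified.
    """
    return len(text.split())


def dedup_and_limit(
    examples: Iterable[str],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_TOKENS,
) -> Iterator[str]:
    """Yield each distinct example once, skipping any longer than max_tokens.

    Assumptions: duplicates are exact text matches, and over-length
    examples are dropped rather than truncated.
    """
    seen: set[str] = set()
    for text in examples:
        # Hash the text so memory use does not grow with example length.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier example
        seen.add(digest)
        if count_tokens(text) > max_tokens:
            continue  # exceeds the per-example token limit
        yield text


if __name__ == "__main__":
    sample = ["def f(): pass", "def f(): pass", "x " * 2000, "print('hi')"]
    print(list(dedup_and_limit(sample)))  # → ['def f(): pass', "print('hi')"]
```

To mirror the real pipeline, `count_tokens` would be replaced with the tokenizer the dataset was measured against. Truncating instead of dropping would change only the final `continue`.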
License
The final combined dataset is licensed under BSD 3-Clause.
Users who use or redistribute the original, unmodified datasets must still respect their original licenses.
Original licenses:
- CyberNative/Code_Vulnerability_Security_DPO: Apache 2.0
- Madras1/minimax-m2.5-code-distilled-14k: Apache 2.0
- pedrodev2026/pedro-open-distil-dataset: BSD 3-Clause