| # Credits & Licenses |
|
|
| This dataset is a merged and standardized version of multiple public datasets. |
| All original authors retain their rights under their respective licenses. |
|
|
| --- |
|
|
| ## Sources |
|
|
| ### Vezora/Tested-22k-Python-Alpaca |
|
|
| * Author: Vezora |
| * License: Apache License 2.0 |
| * License URL: https://www.apache.org/licenses/LICENSE-2.0 |
| * Source: https://huggingface.co/datasets/Vezora/Tested-22k-Python-Alpaca |
|
|
| ### CyberNative/Code_Vulnerability_Security_DPO |
|
| * Author: CyberNative |
| * License: Apache License 2.0 |
| * License URL: https://www.apache.org/licenses/LICENSE-2.0 |
| * Source: https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO |
|
|
| ### pedrodev2026/open-code-instruct-75k |
|
|
| * Author: pedrodev2026 |
| * License: BSD 3-Clause License |
| * License URL: https://opensource.org/licenses/BSD-3-Clause |
| * Source: https://huggingface.co/datasets/pedrodev2026/open-code-instruct-75k |
|
|
| #### Original Dataset (upstream of open-code-instruct-75k) |
|
|
| * Name: NVIDIA OpenCodeInstruct |
| * Author: NVIDIA |
| * License: Creative Commons Attribution 4.0 (CC-BY 4.0) |
| * License URL: https://creativecommons.org/licenses/by/4.0/ |
| * Source: https://huggingface.co/datasets/nvidia/OpenCodeInstruct |
|
|
| #### Relationship to the Original Dataset |
|
|
| * **OpenCodeInstruct 75K** is derived from the **NVIDIA OpenCodeInstruct** dataset. |
| * It consists of the **first 75,000 rows** extracted from the original dataset. |
| * The underlying content of those rows was not modified during extraction. |
|
|
| #### Modifications in OpenCodeInstruct 75K |
|
|
| * The dataset was **limited to the first 75,000 rows** of the original dataset. |
| * No additional filtering or semantic modification was performed beyond this row limit. |
| * The subset was redistributed separately under the **BSD 3-Clause License**. |
|
|
| #### Use in This Dataset |
|
|
| * **OpenCodeInstruct 75K** is included as one of the sources used to build this final merged dataset. |
| * OpenCodeInstruct 75K contributes **75,000 rows**; the **final merged dataset is larger** because it also incorporates the other datasets listed in this document. |
|
|
| --- |
|
|
| ## Notes |
|
|
| * All datasets were reformatted to a unified JSONL schema (`instruction`, `response`). |
| * This project performs aggregation, subsetting, and schema standardization. |
| * No claim of original authorship over the underlying data is made. |
| * Attribution to the original datasets is preserved in accordance with their respective licenses. |
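
The aggregation, subsetting, and schema standardization described above can be illustrated with a minimal sketch. This is not the project's actual build script: the helper names (`standardize`, `to_jsonl`), the source field names (`prompt`, `completion`, `question`, `answer`), and the sample rows are all hypothetical, chosen only to show the shape of the transformation into the `instruction` / `response` schema.

```python
import json
from itertools import islice


def standardize(rows, instruction_key, response_key, limit=None):
    """Yield rows in the unified schema.

    `instruction_key` and `response_key` name the source columns (assumed
    to differ per source dataset). `limit` keeps only the first N rows;
    None keeps every row.
    """
    for row in islice(rows, limit):
        yield {
            "instruction": row[instruction_key],
            "response": row[response_key],
        }


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Hypothetical source rows with differing column names.
source_a = [{"prompt": "Reverse a string.", "completion": "s[::-1]"}]
source_b = [
    {"question": "Add two numbers.", "answer": "a + b"},
    {"question": "Row beyond the limit.", "answer": "unused"},
]

# Aggregate both sources, subsetting the second one to its first row.
merged = [
    *standardize(source_a, "prompt", "completion"),
    *standardize(source_b, "question", "answer", limit=1),
]
jsonl_text = to_jsonl(merged)
```

Mapping each source through explicit key arguments keeps the per-dataset differences in one place, and the row limit mirrors a "first N rows" subset without modifying the content of the rows that are kept.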
|
|
| ## License |
|
|
| This dataset is released under the BSD 3-Clause License. |
| License URL: https://opensource.org/licenses/BSD-3-Clause |