# Credits & Licenses

This dataset is a merged and standardized version of multiple public datasets. All original authors retain their rights under their respective licenses.

---

## Sources

### Vezora/Tested-22k-Python-Alpaca

* Author: Vezora
* License: Apache License 2.0
* License URL: https://www.apache.org/licenses/LICENSE-2.0
* Source: https://huggingface.co/datasets/Vezora/Tested-22k-Python-Alpaca

### CyberNative/Code_Vulnerability_Security_DPO

* Author: CyberNative
* License: Apache License 2.0
* License URL: https://www.apache.org/licenses/LICENSE-2.0
* Source: https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO

### pedrodev2026/open-code-instruct-75k

* Author: pedrodev2026
* License: BSD 3-Clause License
* License URL: https://opensource.org/licenses/BSD-3-Clause
* Source: https://huggingface.co/datasets/pedrodev2026/open-code-instruct-75k

#### Original Dataset (of the OpenCodeInstruct 75K dataset above)

* Name: NVIDIA OpenCodeInstruct
* Author: NVIDIA
* License: Creative Commons Attribution 4.0 (CC-BY 4.0)
* License URL: https://creativecommons.org/licenses/by/4.0/
* Source: https://huggingface.co/datasets/nvidia/OpenCodeInstruct

#### Relationship to the Original Dataset

* **OpenCodeInstruct 75K** is derived from the **NVIDIA OpenCodeInstruct** dataset.
* It consists of the **first 75,000 rows** extracted from the original dataset.
* The underlying content of those rows was not modified during extraction.

#### Modifications in OpenCodeInstruct 75K

* The dataset was **limited to the first 75,000 rows** of the original dataset.
* No additional filtering or semantic modification was performed beyond this row limit.
* The subset was redistributed separately under the **BSD 3-Clause License**.

#### Use in This Dataset

* **OpenCodeInstruct 75K** is included as one of the sources used to build this final merged dataset.
* While OpenCodeInstruct 75K itself contains **75,000 rows**, the **final merged dataset contains more rows**, because it also incorporates the other datasets listed in this document.

---

## Notes

* All datasets were reformatted to a unified JSONL schema (`instruction`, `response`).
* This project performs aggregation, subsetting, and schema standardization.
* No claim of original authorship over the underlying data is made.
* Attribution to the original datasets is preserved in accordance with their respective licenses.

## License

This dataset is released under the BSD 3-Clause License.

License URL: https://opensource.org/licenses/BSD-3-Clause
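
To illustrate the subsetting and schema standardization described under Notes, here is a minimal sketch of how source rows might be mapped onto the unified `instruction` / `response` JSONL schema. It is not the script used to build this dataset: the helper names (`take_first`, `to_unified`, `to_jsonl`) and the source field names (`instruction`, `output`) are assumptions chosen for illustration, and the actual source datasets may use different column names.

```python
import json
from itertools import islice


def take_first(rows, limit):
    """Keep only the first `limit` rows of an iterable (row-limit subsetting)."""
    return list(islice(rows, limit))


def to_unified(record, instruction_key="instruction", response_key="output"):
    """Map one source record onto the two-field unified schema.

    The default key names are assumptions, not the real source columns.
    """
    return {
        "instruction": str(record[instruction_key]).strip(),
        "response": str(record[response_key]).strip(),
    }


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical source rows, invented for this example.
source_rows = [
    {"instruction": "Reverse a string in Python.", "output": "s[::-1]"},
    {"instruction": "  Add two numbers. ", "output": "a + b\n"},
    {"instruction": "Ignored by the row limit.", "output": "pass"},
]

subset = take_first(source_rows, 2)
unified = [to_unified(row) for row in subset]
jsonl_text = to_jsonl(unified)
print(jsonl_text)
```

Passing the key names as parameters lets the same function cover sources whose columns are named differently, for example `to_unified(row, "question", "answer")` for a hypothetical source with those fields.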