Update README.md
If you use this model or parts of this work, please consider citing the references below.

## References

* Qwen/Qwen2.5-Coder-3B-Instruct
  [https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)

* OpenAI o3-mini
  [https://platform.openai.com/docs/models](https://platform.openai.com/docs/models)

* OpenAI GPT-4o
  [https://openai.com/index/gpt-4o](https://openai.com/index/gpt-4o)

* Group Relative Policy Optimization (GRPO)
  [https://arxiv.org/abs/2402.03300](https://arxiv.org/abs/2402.03300)

* Unsloth – Fast and memory-efficient fine-tuning via QLoRA

* Hugging Face Transformers
  [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)

## Disclaimer on Use of Proprietary Models

Some of the training data used for this model was generated or labeled using proprietary large language models, including OpenAI o3-mini and GPT-4o. These models were used to synthesize programming tasks, adapt natural language descriptions, and automatically label code solutions for supervised fine-tuning and reinforcement learning.