GiuLeo01 committed
Commit 4152f15 · verified · Parent: 6b29cc5

Update README.md
Files changed (1): README.md (+8 −3)
README.md CHANGED
@@ -153,10 +153,16 @@ If you use this model or parts of this work, please consider citing the referenc
 
 ## References
 
-* Qwen/Qwen2-5-Coder-3B-Instruct
+* Qwen/Qwen2.5-Coder-3B-Instruct
   [https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)
 
-* Group Relative Policy Optimization (GRPO):
+* OpenAI o3-mini
+  [https://platform.openai.com/docs/models](https://platform.openai.com/docs/models)
+
+* OpenAI GPT-4o
+  [https://openai.com/index/gpt-4o](https://openai.com/index/gpt-4o)
+
+* Group Relative Policy Optimization (GRPO)
   [https://arxiv.org/abs/2402.03300](https://arxiv.org/abs/2402.03300)
 
 * Unsloth – Fast and memory-efficient fine-tuning via QLoRA
@@ -166,7 +172,6 @@ If you use this model or parts of this work, please consider citing the referenc
   [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)
 
 
-
 ## Disclaimer on Use of Proprietary Models
 
 Some of the training data used for this model was generated or labeled using proprietary large language models, including OpenAI o3-mini and GPT-4o. These models were used to synthesize programming tasks, adapt natural language descriptions, and automatically label code solutions for supervised fine-tuning and reinforcement learning.
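For readers unfamiliar with the GRPO method cited in the references, its core idea (from the arXiv paper linked above) is to drop the learned value baseline of PPO-style training and instead sample a group of completions per prompt, standardizing each completion's reward within that group. A minimal sketch of the advantage computation:

```latex
% Group-relative advantage in GRPO: G completions are sampled per prompt
% with scalar rewards r_1, ..., r_G, and each completion's advantage is
% its reward standardized within its own group:
A_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}
           {\operatorname{std}(\{r_1, \dots, r_G\})}
```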