A second phase followed, resetting the learning rate to `1e-6` with a linear decay.
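The second-phase schedule described above can be sketched as a plain function. This is a minimal illustration, not the project's training code; the function name and the assumption that the rate decays linearly from `1e-6` toward zero over the remaining steps are ours.

```python
def linear_decay_lr(step: int, total_steps: int, peak_lr: float = 1e-6) -> float:
    """Learning rate for phase two: reset to peak_lr, then decay linearly to 0.

    step        -- current optimizer step within the phase
    total_steps -- total steps in the phase
    """
    if step >= total_steps:
        return 0.0
    return peak_lr * (1.0 - step / total_steps)

# At the start of the phase the rate is back at 1e-6,
# and it falls linearly as training progresses.
print(linear_decay_lr(0, 1000))    # 1e-06
print(linear_decay_lr(500, 1000))  # 5e-07
```

In practice the same shape is what a standard linear scheduler (e.g. one stepped once per optimizer update) produces when initialized at the reset rate.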

## Citation

If you use this model or parts of this work, please consider citing the references below.

## References

* Qwen/Qwen2-5-Coder-3B-Instruct
  [https://huggingface.co/Qwen/Qwen2-5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2-5-Coder-3B-Instruct)

* Group Relative Policy Optimization (GRPO)
  [https://arxiv.org/abs/2205.13636](https://arxiv.org/abs/2205.13636)

* Unsloth – fast and memory-efficient fine-tuning via QLoRA
  [https://github.com/unslothai/unsloth](https://github.com/unslothai/unsloth)

* Hugging Face Transformers
  [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)