GiuLeo01/FortranCodeGen-3B-SynthData

GiuLeo01 committed on May 19

Commit eea4487 (verified) · 1 parent: a7252cc

Update README.md

Files changed (1)

README.md +22 -0

@@ -152,6 +152,28 @@ A second phase followed, resetting the learning rate to `1e-6` with a linear dec
 ![Correct Reward](./imgs/grpo_2_correct_reward.png)
 ![Tot Reward](./imgs/grpo_2_tot_reward.png)

+## Citation
+If you use this model or parts of this work, please consider citing the references below.
+## References
+* Qwen/Qwen2.5-Coder-3B-Instruct
+  [https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)
+* Group Relative Policy Optimization (GRPO):
+  [https://arxiv.org/abs/2402.03300](https://arxiv.org/abs/2402.03300)
+* Unsloth – Fast and memory-efficient fine-tuning via QLoRA
+  [https://github.com/unslothai/unsloth](https://github.com/unslothai/unsloth)
+* Hugging Face Transformers
+  [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)