GiuLeo01 committed
Commit 4152f15 · verified · Parent: 6b29cc5

Update README.md
Files changed (1): README.md (+8 −3)
README.md CHANGED
@@ -153,10 +153,16 @@ If you use this model or parts of this work, please consider citing the referenc
 
 ## References
 
-* Qwen/Qwen2-5-Coder-3B-Instruct
+* Qwen/Qwen2.5-Coder-3B-Instruct
   [https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)
 
-* Group Relative Policy Optimization (GRPO):
+* OpenAI o3-mini
+  [https://platform.openai.com/docs/models](https://platform.openai.com/docs/models)
+
+* OpenAI GPT-4o
+  [https://openai.com/index/gpt-4o](https://openai.com/index/gpt-4o)
+
+* Group Relative Policy Optimization (GRPO)
   [https://arxiv.org/abs/2402.03300](https://arxiv.org/abs/2402.03300)
 
 * Unsloth – Fast and memory-efficient fine-tuning via QLoRA
@@ -166,7 +172,6 @@ If you use this model or parts of this work, please consider citing the referenc
   [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)
 
 
-
 ## Disclaimer on Use of Proprietary Models
 
 Some of the training data used for this model was generated or labeled using proprietary large language models, including OpenAI o3-mini and GPT-4o. These models were used to synthesize programming tasks, adapt natural language descriptions, and automatically label code solutions for supervised fine-tuning and reinforcement learning.
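For readers unfamiliar with the GRPO method cited in the references, its core idea (from the arXiv paper linked above) is to drop the learned value baseline of PPO-style training and instead sample a group of completions per prompt, standardizing each completion's reward within that group. A minimal sketch of the advantage computation:

```latex
% Group-relative advantage in GRPO: G completions are sampled per prompt
% with scalar rewards r_1, ..., r_G, and each completion's advantage is
% its reward standardized within its own group:
A_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}
           {\operatorname{std}(\{r_1, \dots, r_G\})}
```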