Update README.md

README.md CHANGED

````diff
@@ -3,9 +3,9 @@ library_name: transformers
 tags:
 - unsloth
 - grpo
-- reinforcement_learning
 - sft
 - fortran
+- rl
 base_model:
 - Qwen/Qwen2.5-Coder-3B-Instruct
 ---
@@ -54,7 +54,6 @@ output = generator([{"role": "user", "content": question}], max_new_tokens=128,
 print(output["generated_text"])
 ```
 
-[More Information Needed]
 
 ## Training Details
 
@@ -62,6 +61,19 @@ print(output["generated_text"])
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-
+The goal of this experiment was to specialize a model in a complex task, like Fortran code generation, without using manually annotated data, which is particularly hard to find for this programming language.
+
+#### Supervised Data
+1) A subset of the MBPP dataset (~600 examples) was selected.
+2) The task descriptions were automatically adapted to Fortran with precise instructions, using OpenAI o3-mini.
+3) Tasks were filtered using embeddings and manually reviewed to ensure that no examples too similar to HumanEval tasks were included in the training set.
+4) Each task was automatically labeled using three stronger (and bigger) models: OpenAI o3-mini, Qwen 2.5 Coder 32B, and OpenAI GPT-4o.
+5) Labels were automatically validated through unit tests.
+6) Only correct solutions were kept, at most one per task, prioritized in the following order: OpenAI o3-mini > Qwen 2.5 Coder 32B > OpenAI GPT-4o.
+
+This simple process led to the creation of a small, synthetically labeled training set used for supervised fine-tuning.
+
+IMPORTANT: Do not validate this model on MBPP-derived benchmarks due to the data overlap.
+
 ### Training Procedure
 
````
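The decontamination step added in this diff (step 3: dropping training tasks whose embeddings are too close to HumanEval tasks) could be sketched as below. This is a minimal illustration, not the released pipeline: the embedding vectors are assumed to come from some sentence-embedding model, and the `0.9` threshold is a hypothetical value.

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def filter_overlap(train_vecs, bench_vecs, threshold=0.9):
    """Split training items into (kept, flagged) index lists.

    An item is flagged for manual review when its maximum cosine
    similarity to any benchmark (e.g. HumanEval) item reaches the
    threshold; everything else is kept for the training set.
    """
    kept, flagged = [], []
    for i, tv in enumerate(train_vecs):
        if max(cosine(tv, bv) for bv in bench_vecs) >= threshold:
            flagged.append(i)
        else:
            kept.append(i)
    return kept, flagged
```

Flagged items are then reviewed by hand rather than silently discarded, matching the "manually reviewed" wording in the card.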
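Steps 4–6 of the labeling pipeline (multi-teacher labeling, unit-test validation, at most one solution per task in priority order) amount to a short selection loop. The sketch below makes that logic concrete; the model keys and the `run_unit_tests` callback (which would compile and execute the generated Fortran against its tests) are hypothetical placeholders.

```python
# Priority order stated in the card: o3-mini > Qwen 2.5 Coder 32B > GPT-4o.
PRIORITY = ["o3-mini", "qwen2.5-coder-32b", "gpt-4o"]

def select_solution(candidates, run_unit_tests):
    """Pick at most one validated solution for a task.

    candidates: dict mapping teacher-model name -> candidate solution.
    run_unit_tests: callback returning True when a solution passes
    the task's unit tests. Returns the highest-priority passing
    solution, or None when no candidate passes (task is dropped).
    """
    for model in PRIORITY:
        solution = candidates.get(model)
        if solution is not None and run_unit_tests(solution):
            return solution
    return None
```

Because validation runs before selection, a lower-priority teacher's solution is used only when every higher-priority teacher failed the tests for that task.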