GiuLeo01 committed
Commit a8afb5e · verified · 1 Parent(s): e53ae2c

Update README.md

Files changed (1):
  1. README.md +16 -4
README.md CHANGED
@@ -3,9 +3,9 @@ library_name: transformers
 tags:
 - unsloth
 - grpo
-- reinforcement_learning
 - sft
 - fortran
+- rl
 base_model:
 - Qwen/Qwen2.5-Coder-3B-Instruct
 ---
@@ -54,7 +54,6 @@ output = generator([{"role": "user", "content": question}], max_new_tokens=128,
 print(output["generated_text"])
 ```
 
-[More Information Needed]
 
 ## Training Details
 
@@ -62,6 +61,19 @@ print(output["generated_text"])
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-[More Information Needed]
-### Training Procedure
+The goal of this experiment was to specialize a model in a complex task, like Fortran code generation, without using manually annotated data, which is particularly hard to find for this programming language.
+
+#### Supervised Data
+1) A subset of the MBPP dataset (~600 examples) was selected.
+2) The task descriptions were automatically adapted to Fortran with precise instructions, using OpenAI o3-mini.
+3) Tasks were filtered using embeddings and manually reviewed to ensure that no examples too similar to HumanEval tasks were included in the training set.
+4) Each task was automatically labeled using three stronger (and bigger) models: OpenAI o3-mini, Qwen 2.5 Coder 32B, and OpenAI GPT-4o.
+5) Labels were automatically validated through unit tests.
+6) Only correct solutions were kept, at most one per task, prioritized in the following order: OpenAI o3-mini > Qwen 2.5 Coder 32B > OpenAI GPT-4o.
+
+This simple process led to the creation of a small, synthetically labeled training set used for supervised fine-tuning.
+
+IMPORTANT: Do not validate this model on MBPP-derived benchmarks due to the data overlap.
+
+### Training Procedure
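The selection rule described in steps 4–6 of the added Supervised Data section (keep only unit-test-validated solutions, at most one per task, preferring o3-mini, then Qwen 2.5 Coder 32B, then GPT-4o) can be sketched in Python. This is an illustrative reconstruction, not the author's published pipeline code; all names and the candidate-record shape are assumptions.

```python
# Hypothetical sketch of the label-selection step: for each task, keep at most
# one solution that passed its unit tests, ranked by model priority.
PRIORITY = ["o3-mini", "qwen2.5-coder-32b", "gpt-4o"]  # assumed identifiers

def select_labels(candidates):
    """candidates: list of dicts with keys 'task_id', 'model', 'solution',
    and 'passed' (True if the solution passed its unit tests, step 5).
    Returns a mapping {task_id: solution} with at most one entry per task."""
    rank = {model: i for i, model in enumerate(PRIORITY)}
    best = {}
    for cand in candidates:
        if not cand["passed"] or cand["model"] not in rank:
            continue  # discard solutions that failed validation (step 5)
        tid = cand["task_id"]
        # keep the candidate from the highest-priority model (step 6)
        if tid not in best or rank[cand["model"]] < rank[best[tid]["model"]]:
            best[tid] = cand
    return {tid: cand["solution"] for tid, cand in best.items()}
```

For example, if a task has passing solutions from both GPT-4o and o3-mini, the o3-mini one is kept; a task whose only candidate failed its unit tests is dropped entirely.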