Update README.md

README.md CHANGED

````diff
@@ -3,9 +3,9 @@ library_name: transformers
 tags:
 - unsloth
 - grpo
-- reinforcement_learning
 - sft
 - fortran
+- rl
 base_model:
 - Qwen/Qwen2.5-Coder-3B-Instruct
 ---
@@ -54,7 +54,6 @@ output = generator([{"role": "user", "content": question}], max_new_tokens=128,
 print(output["generated_text"])
 ```
 
-[More Information Needed]
 
 ## Training Details
 
@@ -62,6 +61,19 @@ print(output["generated_text"])
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-
+The goal of this experiment was to specialize a model in a complex task, like Fortran code generation, without using manually annotated data, which is particularly hard to find for this programming language.
+
+#### Supervised Data
+1) A subset of the MBPP dataset (~600 examples) was selected.
+2) The task descriptions were automatically adapted to Fortran with precise instructions, using OpenAI o3-mini.
+3) Tasks were filtered using embeddings and manually reviewed to ensure that no examples too similar to HumanEval tasks were included in the training set.
+4) Each task was automatically labeled using three stronger (and bigger) models: OpenAI o3-mini, Qwen 2.5 Coder 32B, and OpenAI GPT-4o.
+5) Labels were automatically validated through unit tests.
+6) Only correct solutions were kept, at most one per task, prioritized in the following order: OpenAI o3-mini > Qwen 2.5 Coder 32B > OpenAI GPT-4o.
+
+This simple process led to the creation of a small, synthetically labeled training set used for supervised fine-tuning.
+
+IMPORTANT: Do not validate this model on MBPP-derived benchmarks due to the data overlap.
+
 ### Training Procedure
 
````
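The decontamination step added in this diff (step 3: dropping training tasks whose embeddings are too close to HumanEval tasks) could be sketched as below. This is a minimal illustration, not the released pipeline: the embedding vectors are assumed to come from some sentence-embedding model, and the `0.9` threshold is a hypothetical value.

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def filter_overlap(train_vecs, bench_vecs, threshold=0.9):
    """Split training items into (kept, flagged) index lists.

    An item is flagged for manual review when its maximum cosine
    similarity to any benchmark (e.g. HumanEval) item reaches the
    threshold; everything else is kept for the training set.
    """
    kept, flagged = [], []
    for i, tv in enumerate(train_vecs):
        if max(cosine(tv, bv) for bv in bench_vecs) >= threshold:
            flagged.append(i)
        else:
            kept.append(i)
    return kept, flagged
```

Flagged items are then reviewed by hand rather than silently discarded, matching the "manually reviewed" wording in the card.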
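Steps 4–6 of the labeling pipeline (multi-teacher labeling, unit-test validation, at most one solution per task in priority order) amount to a short selection loop. The sketch below makes that logic concrete; the model keys and the `run_unit_tests` callback (which would compile and execute the generated Fortran against its tests) are hypothetical placeholders.

```python
# Priority order stated in the card: o3-mini > Qwen 2.5 Coder 32B > GPT-4o.
PRIORITY = ["o3-mini", "qwen2.5-coder-32b", "gpt-4o"]

def select_solution(candidates, run_unit_tests):
    """Pick at most one validated solution for a task.

    candidates: dict mapping teacher-model name -> candidate solution.
    run_unit_tests: callback returning True when a solution passes
    the task's unit tests. Returns the highest-priority passing
    solution, or None when no candidate passes (task is dropped).
    """
    for model in PRIORITY:
        solution = candidates.get(model)
        if solution is not None and run_unit_tests(solution):
            return solution
    return None
```

Because validation runs before selection, a lower-priority teacher's solution is used only when every higher-priority teacher failed the tests for that task.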