GiuLeo01 committed
Commit a7252cc · verified · 1 Parent(s): 1a4e6bf

Update README.md

Files changed (1): README.md (+9 -0)
README.md CHANGED
```diff
@@ -134,6 +134,11 @@ The reward function used throughout this phase was very simple:
 
 The initial training phase was run for 3 epochs with:
 
+
+![Compile Reward](./imgs/grpo_1_compile_reward.png)
+![Correct Reward](./imgs/grpo_1_correct_reward.png)
+![Tot Reward](./imgs/grpo_1_tot_reward.png)
+
 * batch size = 16
 * number of generations = 4
 * learning rate = 1e-5
@@ -143,6 +148,10 @@ The initial training phase was run for 3 epochs with:
 
 A second phase followed, resetting the learning rate to `1e-6` with a linear decay schedule.
 
+![Compile Reward](./imgs/grpo_2_compile_reward.png)
+![Correct Reward](./imgs/grpo_2_correct_reward.png)
+![Tot Reward](./imgs/grpo_2_tot_reward.png)
+
 
 
 
```
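The diff references a two-phase learning-rate setup: a constant `1e-5` during the initial 3-epoch run, then a reset to `1e-6` with a linear decay schedule. The diff does not show the trainer code, so the function below is only an illustrative sketch of what a step-based linear decay could look like; the name `phase2_lr` and the step/total-step parameterization are assumptions, not the author's implementation:

```python
# Sketch of the two-phase learning-rate setup described in the README diff.
# Phase 1 used a constant lr of 1e-5 (batch size 16, 4 generations per
# prompt, 3 epochs); phase 2 resets to 1e-6 and decays linearly.
# All names here are illustrative assumptions -- the trainer code is not
# part of this commit.

def phase2_lr(step: int, total_steps: int, base_lr: float = 1e-6) -> float:
    """Linearly decay the learning rate from base_lr to 0 over total_steps."""
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / total_steps

# Example: halfway through phase 2 the rate is half of 1e-6.
lr_mid = phase2_lr(step=50, total_steps=100)
```

A step-based decay like this is equivalent to the `linear` scheduler option found in common training frameworks; the key point from the diff is only that phase 2 restarts from `1e-6` rather than continuing from phase 1's rate.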