Upload README.md with huggingface_hub
README.md CHANGED

@@ -10,6 +10,7 @@ https://wandb.ai/leon_at_work/soni_ablation_4b/workspace
 | Variant Folder | Short Description | Checkpoint Step |
 |---|---|---|
 | `Baseline 4B` | Baseline configuration used for comparison | 200 |
+| `a2_sapo` | Vanilla SAPO (norm_by_std=true + sapo + eps=0.2 + sequence_mean) | 200 |
 | `a4_length_norm_sqrt` | GRPO with sqrt length normalization (plus normalized reduction) | 200 |
 | `a5_sapo_on_sqrt` | SAPO on top of sqrt length-normalized setting | 200 |
 | `a6_eps_clip_02` | Sqrt length-normalized setting with PPO clip set to 0.2/0.2 | 200 |
@@ -25,6 +26,7 @@ Baseline reference (`Baseline 4B`):
 
 | Variant Folder | Parameter Delta vs `Baseline 4B` |
 |---|---|
+| `a2_sapo` | `advantage_estimator=grpo`; `policy_loss_type=sapo`; `loss_reduction=sequence_mean`; `eps_clip_low/high=0.2/0.2`; `grpo_norm_by_std=true` |
 | `a4_length_norm_sqrt` | `advantage_estimator=grpo_length_norm_sqrt`; `loss_reduction=seq_mean_token_sum_norm` |
 | `a5_sapo_on_sqrt` | same as `a4_length_norm_sqrt`, plus `policy_loss_type=sapo` |
 | `a6_eps_clip_02` | `a4`-style setup, with `eps_clip_low=0.2`, `eps_clip_high=0.2` |
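The `a2_sapo` variant keeps `advantage_estimator=grpo` with `grpo_norm_by_std=true`. As a rough sketch of what that flag is usually taken to mean in GRPO (each reward is centred on its prompt group's mean reward, then divided by the group's standard deviation), the snippet below computes group-relative advantages in plain Python. The function name `grpo_advantages` and the `eps` guard are illustrative assumptions, not the training framework's API.

```python
import math
from typing import List


def grpo_advantages(rewards: List[float], norm_by_std: bool = True,
                    eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for the sampled responses to ONE prompt.

    Hypothetical helper: rewards are centred on the group mean and, when
    norm_by_std is True (assumed meaning of `grpo_norm_by_std=true`),
    divided by the group standard deviation.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    centred = [r - mean for r in rewards]
    if not norm_by_std:
        return centred
    # Population std (divide by n); a real framework may use the n - 1 form.
    std = math.sqrt(sum(c * c for c in centred) / n)
    # eps avoids division by zero when every response got the same reward.
    return [c / (std + eps) for c in centred]


if __name__ == "__main__":
    group = [1.0, 0.0, 0.0, 1.0]  # e.g. binary correctness rewards for 4 samples
    print(grpo_advantages(group))                     # approximately [1.0, -1.0, -1.0, 1.0]
    print(grpo_advantages(group, norm_by_std=False))  # [0.5, -0.5, -0.5, 0.5]
```

A group where every response receives the same reward yields all-zero advantages, so such prompts contribute no policy gradient under this formulation.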
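The `a4_length_norm_sqrt` variant switches to `advantage_estimator=grpo_length_norm_sqrt`. The diff does not define this estimator, so what follows is only one hypothetical reading: compute the usual group-normalized GRPO advantage, then scale it by `1 / sqrt(L)` for a response of `L` tokens. Under that assumption, and with a token-sum style reduction, a response's total weight grows like `sqrt(L)` (L tokens times `A / sqrt(L)`) rather than like `L`, which sits between pure token-sum and per-sequence averaging. The function `length_norm_sqrt_advantages` is a hypothetical name.

```python
import math
from typing import List


def length_norm_sqrt_advantages(rewards: List[float], lengths: List[int],
                                eps: float = 1e-6) -> List[float]:
    """HYPOTHETICAL interpretation of `grpo_length_norm_sqrt`.

    Step 1: standard GRPO advantage (centre on group mean, divide by group std).
    Step 2: divide each response's advantage by sqrt(its token length).
    This is an illustrative assumption, not the confirmed estimator.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [
        (r - mean) / (std + eps) / math.sqrt(max(length, 1))  # guard length 0
        for r, length in zip(rewards, lengths)
    ]


if __name__ == "__main__":
    # Two responses to one prompt: a correct 4-token one and a wrong 16-token one.
    print(length_norm_sqrt_advantages([1.0, 0.0], [4, 16]))
```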
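The loss reduction differs between `a2_sapo` (`loss_reduction=sequence_mean`) and `a4_length_norm_sqrt` (`loss_reduction=seq_mean_token_sum_norm`). Assuming the names follow their common meaning, `sequence_mean` averages token losses within each response and then across responses, while `seq_mean_token_sum_norm` sums token losses within each response, divides by a fixed maximum length, and then averages across responses (a Dr. GRPO-style normalization). The sketch below uses ragged lists of real (unpadded) token losses; `max_len` is a hypothetical stand-in for whatever constant the framework actually uses.

```python
from typing import List


def sequence_mean(token_losses: List[List[float]]) -> float:
    """Mean over tokens within each sequence, then mean over sequences.

    Every response gets equal weight regardless of its length.
    Assumes no sequence is empty.
    """
    per_seq = [sum(seq) / len(seq) for seq in token_losses]
    return sum(per_seq) / len(per_seq)


def seq_mean_token_sum_norm(token_losses: List[List[float]], max_len: int) -> float:
    """Sum over tokens within each sequence, divided by a constant max_len,
    then mean over sequences (assumed meaning of the config value).

    Longer responses contribute proportionally more tokens' worth of loss.
    """
    per_seq = [sum(seq) / max_len for seq in token_losses]
    return sum(per_seq) / len(per_seq)


if __name__ == "__main__":
    losses = [[1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]  # a 2-token and a 4-token response
    print(sequence_mean(losses))                       # 1.0
    print(seq_mean_token_sum_norm(losses, max_len=4))  # 0.75
```

With these inputs the 4-token response carries twice the weight of the 2-token one under `seq_mean_token_sum_norm`, whereas `sequence_mean` weighs them equally.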
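The `a6_eps_clip_02` variant sets `eps_clip_low=0.2` and `eps_clip_high=0.2`. In a PPO-style clipped surrogate, these two values bound the importance ratio to `[1 - eps_clip_low, 1 + eps_clip_high]`, and the per-token objective takes the pessimistic minimum of the clipped and unclipped terms. A minimal per-token sketch assuming that standard formulation (`ppo_clip_token_loss` is a hypothetical name, not the framework's function):

```python
def ppo_clip_token_loss(ratio: float, advantage: float,
                        eps_clip_low: float = 0.2,
                        eps_clip_high: float = 0.2) -> float:
    """PPO clipped surrogate for one token, returned as a loss (negated).

    ratio = pi_new(token) / pi_old(token). The lower and upper clip widths
    are separate parameters; 0.2/0.2 gives the classic symmetric clip.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_clip_low), 1.0 + eps_clip_high)
    # Pessimistic bound: take the smaller of the two objectives, then negate.
    return -min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(ppo_clip_token_loss(1.5, 1.0))   # ratio clipped to 1.2 for a positive advantage
    print(ppo_clip_token_loss(1.5, -1.0))  # unclipped term kept: ratio moved the wrong way
```

Separate low/high widths allow an asymmetric clip; setting both to 0.2 recovers the original symmetric PPO range.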
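Both `a2_sapo` and `a5_sapo_on_sqrt` set `policy_loss_type=sapo`. Assuming this refers to Soft Adaptive Policy Optimization, the hard PPO clip is replaced by a smooth sigmoid gate on the importance ratio, `f(r) = sigmoid(tau * (r - 1)) * 4 / tau`, with one temperature for positive advantages and another for negative ones. The sketch below is illustrative only: `sapo_token_loss` is a hypothetical name, and the `tau_pos`/`tau_neg` defaults are placeholders, not values taken from these runs.

```python
import math


def sapo_token_loss(ratio: float, advantage: float,
                    tau_pos: float = 1.0, tau_neg: float = 1.05) -> float:
    """Per-token loss with a sigmoid soft gate instead of hard clipping.

    Assumed form: f(r) = sigmoid(tau * (r - 1)) * 4 / tau, loss = -f(r) * A,
    where tau depends on the sign of the advantage. The tau defaults here are
    illustrative placeholders (hypothetical).
    """
    tau = tau_pos if advantage > 0 else tau_neg
    sig = 1.0 / (1.0 + math.exp(-tau * (ratio - 1.0)))
    gate = sig * 4.0 / tau
    # d(gate)/d(ratio) = 4 * sig * (1 - sig), which equals 1 at ratio == 1.
    return -gate * advantage


if __name__ == "__main__":
    h = 1e-5
    for r in (1.0, 2.0, 5.0):
        # Finite-difference gradient w.r.t. the ratio, for advantage = +1.
        grad = (sapo_token_loss(r + h, 1.0) - sapo_token_loss(r - h, 1.0)) / (2 * h)
        print(r, grad)
```

Under this form the gradient with respect to the ratio is exactly -A at `r = 1`, matching the unclipped surrogate, and it decays smoothly as the ratio drifts instead of dropping to zero at a clip boundary.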