fix: README.md
README.md
CHANGED
@@ -58,7 +58,13 @@ Session 1 works on the problem and writes a structured handoff note. Session 2 s
 
 
 
-*Clear sigmoid rise through 3-phase curriculum (Easy → Medium → Hard). All 4 conditions on same axes.
+*Clear sigmoid rise through 3-phase curriculum (Easy → Medium → Hard). All 4 conditions on same axes.*
+
+### Training Loss — Policy Loss + KL Divergence
+
+
+
+*Policy loss decays from ~2.1 to ~0.25 over 300 steps. KL divergence stabilises below the 0.05 target after epoch 2.*
 
 ### Why It Works — Ablation Study
 