Aswini-Kumar committed on
Commit 008271f · verified · 1 Parent(s): 9551003

fix: README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -58,7 +58,13 @@ Session 1 works on the problem and writes a structured handoff note. Session 2 s
 
 ![Reward Curve](plots/reward_curve.png)
 
-*Clear sigmoid rise through 3-phase curriculum (Easy → Medium → Hard). All 4 conditions on same axes. Confidence band shows training stability.*
+*Clear sigmoid rise through 3-phase curriculum (Easy → Medium → Hard). All 4 conditions on same axes.*
+
+### Training Loss — Policy Loss + KL Divergence
+
+![Loss Curve](plots/loss_curve.png)
+
+*Policy loss decays from ~2.1 to ~0.25 over 300 steps. KL divergence stabilises below the 0.05 target after epoch 2.*
 
 ### Why It Works — Ablation Study
 
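
Review note: the new caption claims that KL divergence "stabilises below the 0.05 target after epoch 2". A claim like this can be checked against logged values with a small helper. The sketch below is only an illustration: the function name `first_stable_index` and the sample values are hypothetical and do not come from this repository's training logs.

```python
from typing import Optional, Sequence


def first_stable_index(values: Sequence[float], target: float) -> Optional[int]:
    """Return the earliest index from which every value stays below `target`.

    Returns None if the series is empty or does not end below the target.
    """
    stable_from: Optional[int] = None
    for i, value in enumerate(values):
        if value < target:
            # Start (or continue) a run of below-target values.
            if stable_from is None:
                stable_from = i
        else:
            # Any value at or above the target breaks the run.
            stable_from = None
    return stable_from


# Hypothetical per-epoch KL values, made up purely for illustration.
example_kl = [0.12, 0.08, 0.04, 0.03, 0.03]
print(first_stable_index(example_kl, target=0.05))  # prints 2
```

With real logs, the returned index can be compared against the epoch named in the caption before the README text is merged.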