Commit 32a21e5 by phanerozoic (verified) · 1 parent: 0ef7d0d

Stage 4C: ship ep10 checkpoint (peak F1 0.734 vs ep15 0.729)
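The commit message applies a simple rule: among the checkpoints that were actually written to disk, ship the one with the highest validation F1. Below is a minimal sketch of that rule. It assumes a checkpoint exists only at epochs divisible by `save_every`. The function name `best_saved_epoch` is hypothetical. The F1 values are the three reported in this commit (epochs 8, 10 and 15); no other epochs are filled in.

```python
def best_saved_epoch(f1_by_epoch: dict[int, float], save_every: int) -> tuple[int, float]:
    """Return (epoch, F1) for the best epoch that was actually checkpointed.

    Hypothetical helper: assumes a checkpoint exists only when
    epoch % save_every == 0.
    """
    saved = {e: f1 for e, f1 in f1_by_epoch.items() if e % save_every == 0}
    if not saved:
        raise ValueError("no evaluated epoch falls on the save schedule")
    epoch = max(saved, key=saved.get)  # highest F1 among saved epochs
    return epoch, saved[epoch]


# Validation F1 values reported in this commit; other epochs are not reported.
reported = {8: 0.740, 10: 0.734, 15: 0.729}

print(best_saved_epoch(reported, save_every=5))  # (10, 0.734)
print(max(reported, key=reported.get))           # 8: overall peak, never saved
```

One option, not used in this run, is to also save whenever validation F1 sets a new best. That would have kept the 0.740 epoch 8 snapshot.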

Files changed (1)
  1. stage_4c/README.md +7 -7
stage_4c/README.md CHANGED
@@ -13,16 +13,16 @@ The student is optimized to match the teacher's scalar classifier output, not th
  ## Result

  ```
- Stage  Student params   Loss                          F1     Threshold
- 4      3.27 M           MSE on 40-D per-dim           0.710  26.3
- 4B     15.67 M          cosine on 768-D               0.723  168.0 (scale drifted)
- 4C     3.27 M           MSE on scalar sum-difference  0.729  25.8 (matches teacher 25.3)
- 0      85.64 M (ViT-B)  baseline                      0.889  25.3
+ Stage  Student params   Loss                          F1     Threshold  checkpoint
+ 4      3.27 M           MSE on 40-D per-dim           0.717  26.4       ep3
+ 4B     15.67 M          cosine on 768-D               0.726  168.3      ep10 (scale drifted)
+ 4C     3.27 M           MSE on scalar sum-difference  0.734  25.0       ep10 (matches teacher 25.3)
+ 0      85.64 M (ViT-B)  baseline                      0.889  25.3
  ```

- Threshold converged to 25.84, almost exactly the teacher's 25.28. The scale calibration works as designed.
+ Shipped as `student_final.safetensors`, the epoch 10 checkpoint. Its threshold of 25.04 lands 0.24 below the teacher's 25.28, the cleanest scale calibration across the three student variants. Epoch 15 drifts down to F1 0.729 with threshold 25.84. An epoch 8 evaluation actually hit 0.740 (precision 0.627, recall 0.904), but it was never saved because checkpoints were written only every 5 epochs.

- F1 improved by only +0.006 over Stage 4B. All three student experiments plateau around 0.72-0.73 with high recall (≥0.95) and precision ~0.58. The student converges on an "over-fire" operating point that no amount of loss-shape tuning fixes.
+ F1 improved by +0.008 over Stage 4B. All three student experiments plateau around 0.72-0.73 with high recall (≥0.95) and precision ~0.58 through most of training. The student converges on an "over-fire" operating point that no amount of loss-shape tuning fully fixes.

  ## What this says

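The precision and recall figures in the README match the reported F1 values, because F1 is the harmonic mean of precision and recall. The quick check below makes this concrete. It also shows why the over-fire regime caps F1 near 0.72: with recall around 0.95, precision around 0.58 dominates the harmonic mean. The helper name `f1_score` is illustrative only and is not taken from the project's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unsaved epoch 8 snapshot of Stage 4C: precision 0.627, recall 0.904.
print(round(f1_score(0.627, 0.904), 3))  # 0.74
# Typical "over-fire" plateau: precision ~0.58, recall ~0.95.
print(round(f1_score(0.58, 0.95), 3))    # 0.72
```

The epoch 8 snapshot traded about 0.05 of recall for about 0.05 of precision, and that trade is what lifted F1 from roughly 0.72 to 0.740.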