phanerozoic
/

1-parameter-classifier

@@ -13,16 +13,16 @@ The student is optimized to match the teacher's scalar classifier output, not th
 ## Result
 ```
-Stage   Student params   Loss                              F1      Threshold
- 4       3.27 M          MSE on 40-D per-dim              0.710    26.3
- 4B     15.67 M          cosine on 768-D                  0.723   168.0    (scale drifted)
- 4C      3.27 M          MSE on scalar sum-difference     0.729    25.8    (matches teacher 25.3)
- 0      85.64 M (ViT-B)  baseline                         0.889    25.3
 ```
-Threshold converged to 25.84 — almost exactly the teacher's 25.28. The scale calibration works as designed.
-F1 improved by only +0.006 over Stage 4B. All three student experiments plateau around 0.72-0.73 with high recall (≥0.95) and precision ~0.58. The student converges on an "over-fire" operating point that no amount of loss-shape tuning fixes.
 ## What this says

 ## Result
 ```
+Stage   Student params   Loss                              F1      Threshold   checkpoint
+ 4       3.27 M          MSE on 40-D per-dim              0.717    26.4        ep3
+ 4B     15.67 M          cosine on 768-D                  0.726   168.3        ep10  (scale drifted)
+ 4C      3.27 M          MSE on scalar sum-difference     0.734    25.0        ep10  (matches teacher 25.3)
+ 0      85.64 M (ViT-B)  baseline                         0.889    25.3        —
 ```
+Shipped as `student_final.safetensors` = epoch 10 checkpoint. Epoch 10 threshold 25.04 lands within 0.3 of the teacher's 25.28, cleanest scale-calibration across the three student variants. Epoch 15 drifts down to F1 0.729 with threshold 25.84, and an unsaved epoch 8 snapshot actually hit 0.740 (precision 0.627, recall 0.904) though it was not checkpointed on the every-5-epochs schedule.
+F1 improved by +0.008 over Stage 4B. All three student experiments plateau around 0.72-0.73 with high recall (≥0.95) and precision ~0.58 through most of training. The student converges on an "over-fire" operating point that no amount of loss-shape tuning fully fixes.
 ## What this says