Stage 4C: ship ep10 checkpoint (peak F1 0.734 vs ep15 0.729)
- stage_4c/README.md +7 -7
stage_4c/README.md
CHANGED
@@ -13,16 +13,16 @@ The student is optimized to match the teacher's scalar classifier output, not th
 ## Result

 ```
-Stage  Student params   Loss                          F1     Threshold
-4      3.27 M           MSE on 40-D per-dim           0.
-4B     15.67 M          cosine on 768-D               0.
-4C     3.27 M           MSE on scalar sum-difference  0.
-0      85.64 M (ViT-B)  baseline                      0.889  25.3
+Stage  Student params   Loss                          F1     Threshold  checkpoint
+4      3.27 M           MSE on 40-D per-dim           0.717  26.4       ep3
+4B     15.67 M          cosine on 768-D               0.726  168.3      ep10 (scale drifted)
+4C     3.27 M           MSE on scalar sum-difference  0.734  25.0       ep10 (matches teacher 25.3)
+0      85.64 M (ViT-B)  baseline                      0.889  25.3       —
 ```

-
+Shipped as `student_final.safetensors` = epoch 10 checkpoint. Epoch 10 threshold 25.04 lands within 0.3 of the teacher's 25.28, cleanest scale-calibration across the three student variants. Epoch 15 drifts down to F1 0.729 with threshold 25.84, and an unsaved epoch 8 snapshot actually hit 0.740 (precision 0.627, recall 0.904) though it was not checkpointed on the every-5-epochs schedule.

-F1 improved by
+F1 improved by +0.008 over Stage 4B. All three student experiments plateau around 0.72-0.73 with high recall (≥0.95) and precision ~0.58 through most of training. The student converges on an "over-fire" operating point that no amount of loss-shape tuning fully fixes.

 ## What this says

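The F1 and threshold figures quoted in the added README text can be cross-checked with a few lines of arithmetic. The sketch below is not part of the repository: the helper `f1_score` and all variable names are illustrative, and the plateau case plugs in the approximate values quoted in the text (precision about 0.58, recall 0.95), so its result is only indicative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures copied from the README text; names are hypothetical.
TEACHER_THRESHOLD = 25.28
epoch_thresholds = {10: 25.04, 15: 25.84}

# Unsaved epoch 8 snapshot: precision 0.627, recall 0.904.
ep8_f1 = f1_score(0.627, 0.904)

# Typical "over-fire" operating point, using the approximate quoted values.
plateau_f1 = f1_score(0.58, 0.95)

# Absolute distance of each student threshold from the teacher's threshold.
threshold_gap = {
    epoch: round(abs(threshold - TEACHER_THRESHOLD), 2)
    for epoch, threshold in epoch_thresholds.items()
}

print(f"epoch 8 F1: {ep8_f1:.3f}")
print(f"plateau F1: {plateau_f1:.3f}")
print(f"threshold gap to teacher: {threshold_gap}")
```

Under these inputs the epoch 8 precision and recall reproduce the quoted F1 of 0.740, the plateau values land at about 0.72, and the epoch 10 threshold sits 0.24 from the teacher's while epoch 15 sits 0.56 away, consistent with the "within 0.3" claim.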