Stage 4B: ship ep10 checkpoint (peak F1 0.726 vs ep15 0.723)
stage_4b/README.md (+5 -5)
@@ -13,13 +13,13 @@ Tried the natural next knobs on Stage 4's specialist student: 5× bigger model,
 ## Result
 
 ```
-Stage  Student params   Loss             F1
-4      3.27 M           MSE on 40-D      0.
-4B     15.67 M          cosine on 768-D  0.
-0      85.64 M (ViT-B)  baseline         0.889
+Stage  Student params   Loss             F1     checkpoint
+4      3.27 M           MSE on 40-D      0.717  ep3
+4B     15.67 M          cosine on 768-D  0.726  ep10 (shipped)
+0      85.64 M (ViT-B)  baseline         0.889  —
 ```
 
-Cosine loss converged in epoch 1 (0.072 → 0.061) and stayed flat through epoch 15. F1
+Cosine loss converged in epoch 1 (0.072 → 0.061) and stayed flat through epoch 15. F1 peaked at 0.726 at epoch 10; epoch 15 drifted down to 0.723. The shipped `student_final.safetensors` is the epoch 10 checkpoint.
 
 ## What this says
 
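Note on the checkpoint choice: this change ships the epoch with peak F1 rather than the last epoch. A minimal sketch of that selection rule follows, assuming per-epoch F1 is available as a dict. The name `best_checkpoint` and the dict layout are hypothetical and not taken from the repo, and only the two F1 values stated in this change are used.

```python
# Hypothetical per-epoch F1 log. Only the two values stated in this
# change (ep10 and ep15) are included; other epochs are not known here.
f1_by_epoch = {10: 0.726, 15: 0.723}


def best_checkpoint(f1_by_epoch):
    """Return (epoch, f1) for the highest F1; ties go to the earlier epoch."""
    # Sorting by epoch first means max() keeps the earliest epoch on a tie,
    # because max() returns the first maximal item it encounters.
    return max(sorted(f1_by_epoch.items()), key=lambda kv: kv[1])


epoch, f1 = best_checkpoint(f1_by_epoch)
print(f"ship ep{epoch} (F1 {f1:.3f})")  # prints "ship ep10 (F1 0.726)"
```

Since the cosine loss is reported as flat after epoch 1, the loss alone would not distinguish these checkpoints. Selecting on the evaluation metric is what picks ep10 over ep15.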