# Stage 4C: Direct Classifier-Score Supervision

Same 3.27 M student as Stage 4. Same 40-D output. Different loss:

```python
# Scalar classifier score: sum of the positive dims minus sum of the negative dims.
student_score = student_out[pos_dims].sum() - student_out[neg_dims].sum()
teacher_score = teacher_target[pos_dims].sum() - teacher_target[neg_dims].sum()
# Squared error between the student's and the teacher's scalar scores.
loss = (student_score - teacher_score) ** 2
```

The student is optimized to match the teacher's scalar classifier output, not the 768-D feature vector (Stage 4B) or the 40 individual dims (Stage 4).

## Result

```
Stage  Student params   Loss                          F1     Threshold  Checkpoint
4      3.27 M           MSE on 40-D per-dim           0.717  26.4       ep3
4B     15.67 M          cosine on 768-D               0.726  165.9      ep10 (scale drifted)
4C     3.27 M           MSE on scalar sum-difference  0.734  25.0       ep10 (matches teacher 25.3)
0      85.64 M (ViT-B)  baseline                      0.889  25.3       -
```

Shipped as `student_final.safetensors`, which is the epoch 10 checkpoint. The epoch 10 threshold of 25.04 lands within 0.3 of the teacher's 25.28, the cleanest scale calibration across the three student variants. Epoch 15 drifts down to F1 0.729 with threshold 25.84. An epoch 8 snapshot hit F1 0.740 (precision 0.627, recall 0.904), but it was not saved because checkpoints are written only every 5 epochs.

F1 improved by 0.008 over Stage 4B (0.726 to 0.734). All three student experiments plateau around 0.72-0.73 with high recall (≥0.95) and precision ~0.58 through most of training. The student converges on an "over-fire" operating point, flagging too many positives, that no amount of loss-shape tuning fully fixes.

## What this says

The bottleneck is not loss choice or target geometry but the student's ability to learn the underlying scene-level signal at this scale. Closing the F1 gap to the baseline's 0.889 at the 3 M parameter tier probably requires:

- Stronger image augmentation (mosaic, color jitter, rand-augment)
- Warm-starting from a pre-trained backbone (EUPE-ViT-T already distilled) rather than from scratch
- More training data beyond COCO-only (117 K images is tight for a specialist from scratch)

Parameter scaling alone doesn't help; loss reshape alone doesn't help.
Data and initialization are the remaining knobs.

## Files

- `train.py`: training loop (direct scalar MSE)
- `student_ep{5,10,15}.safetensors`: intermediate checkpoints
- `student_final.safetensors`: final weights
- `training_log.json`: per-epoch loss + F1

Uses the same `student.py` as Stage 4.
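
To make the difference between the Stage 4C loss and the per-dimension MSE of Stage 4 concrete, here is a minimal pure-Python sketch on toy vectors. It is not the code in `train.py`. The index sets `POS_DIMS` and `NEG_DIMS`, the helper names, and the 5-D toy vectors are all assumptions made for illustration; the real output is 40-D and the real positive/negative dims are not listed in this README.

```python
# Hypothetical index sets: which output dims count as positive or negative
# evidence is not given in this README, so these are placeholders.
POS_DIMS = [0, 1, 2]
NEG_DIMS = [3, 4]


def classifier_score(out, pos_dims=POS_DIMS, neg_dims=NEG_DIMS):
    """Sum of the positive dims minus sum of the negative dims."""
    return sum(out[i] for i in pos_dims) - sum(out[i] for i in neg_dims)


def scalar_score_loss(student_out, teacher_target):
    """Stage 4C style loss: squared error between the two scalar scores."""
    diff = classifier_score(student_out) - classifier_score(teacher_target)
    return diff ** 2


def per_dim_mse(student_out, teacher_target):
    """Stage 4 style loss: mean squared error over every output dim."""
    n = len(student_out)
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_target)) / n


# Toy 5-D vectors standing in for the 40-D output.
teacher = [1.0, 2.0, 3.0, 0.5, 0.5]  # score = 6.0 - 1.0 = 5.0
student = [2.0, 2.0, 2.0, 1.0, 0.0]  # score = 6.0 - 1.0 = 5.0

scalar_loss = scalar_score_loss(student, teacher)
dim_loss = per_dim_mse(student, teacher)
print(scalar_loss, dim_loss)
```

In this toy case the scalar loss is zero while the per-dimension error is not: the Stage 4C objective constrains only the summed score that the threshold is applied to, and leaves the individual dims free to differ from the teacher's.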