# Stage 2: Attention-Head Pruning

Ablated each of the 144 (block, head) pairs in EUPE-ViT-B individually and measured F1 on 1000 COCO val images with the Stage 0 classifier. Ranked heads by individual F1 drop (smallest drop = most prunable), then swept the cumulative pruning curve by ablating the top-K heads of that ranking together.

## Headline result

Pruning the 10 most prunable heads *improves* F1 from 0.894 to 0.916, which suggests those heads were injecting noise that hurt the person task. Pruning up to K=20 stays marginally above baseline (+0.001 to +0.003). At K=30 the classifier collapses as important heads start to be removed.

## Pruning curve

```
K pruned   F1       ΔF1 vs baseline
1          0.9037   +0.010
5          0.9086   +0.015
10         0.9159   +0.022   <- peak
15         0.8949   +0.001
20         0.8971   +0.003
30         0.3267   -0.567   (cliff)
40         0.2186   -0.675
50         0.5075   -0.386
60         0.0037   -0.890
144        0.0000   -0.894
```

Baseline F1 = 0.8939, measured on the 1000-image calibration pool. This is why it differs slightly from the 5000-image verification F1 of 0.8886 reported in Stage 1.

## What this stage ships

- `head_ablation.py`: the sweep script
- `head_importance.json`: per-(block, head) F1 + L2 deviation
- `pruning_curve.json`: cumulative F1 at K=1, 5, 10, ..., 144
- `head_mask.json`: decision (prune top-10) + rationale
- `apply_mask.py`: loader that patches Argus in place by zeroing the `proj` columns of the 10 pruned heads

## Parameter accounting

Each attention head accounts for ~196K params (~147K in `qkv` + ~49K in `proj`). At K=10, ~1.97M params are effectively zeroed (2.3% of the 85.6M-param backbone). The checkpoint file size is unchanged; only the set of nonzero weights changes. For a true structural reduction that shrinks the tensor shapes, see Stage 3 (depth reduction) and Stage 4 (specialist backbone), which restructure the backbone end-to-end.

## Notable individual findings

Heads with the largest individual F1 drops (the most important for person classification) are concentrated in middle-to-late blocks. Heads with negative drops (where ablation *improved* F1) are scattered, but skew toward early blocks, plus a few noise-injecting heads in late blocks.
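The ranking and cumulative sweep described at the top of this stage can be sketched as follows. This is a hedged illustration, not `head_ablation.py`: it assumes per-head results are available as a dict from `(block, head)` to F1 drop (baseline F1 minus ablated F1), and `evaluate` is a hypothetical callback that returns F1 with a given set of heads ablated. The helper names are likewise hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Head = Tuple[int, int]  # (block, head)


def rank_most_prunable_first(f1_drop: Dict[Head, float]) -> List[Head]:
    """Order heads by individual F1 drop, smallest first.

    Heads whose ablation raised F1 (negative drop) sort ahead of heads that
    cost a little. Ties break on (block, head) so the ordering is deterministic.
    """
    return sorted(f1_drop, key=lambda h: (f1_drop[h], h))


def sweep(
    ranked: List[Head],
    ks: Iterable[int],
    evaluate: Callable[[FrozenSet[Head]], float],
) -> Dict[int, float]:
    """Cumulative curve: for each K, ablate the first K ranked heads together."""
    return {k: evaluate(frozenset(ranked[:k])) for k in ks}


def best_k(curve: Dict[int, float]) -> int:
    """K with the highest cumulative F1; ties go to the smaller K."""
    return max(curve, key=lambda k: (curve[k], -k))


if __name__ == "__main__":
    # Toy per-head drops, not real measurements.
    drops = {(0, 3): -0.004, (5, 7): 0.021, (11, 2): -0.001, (1, 0): 0.0}
    print(rank_most_prunable_first(drops))  # [(0, 3), (11, 2), (1, 0), (5, 7)]

    # Cumulative F1 values copied from the pruning curve above.
    curve = {1: 0.9037, 5: 0.9086, 10: 0.9159, 15: 0.8949, 20: 0.8971,
             30: 0.3267, 40: 0.2186, 50: 0.5075, 60: 0.0037, 144: 0.0}
    print(best_k(curve))  # 10
```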
The top-10 prunable list in `head_importance.json` under `ranked_most_prunable_first` encodes the ordering used by `apply_mask.py`.
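To make the masking concrete, here is a minimal NumPy sketch of what zeroing a head's `proj` columns amounts to. It is not `apply_mask.py` itself. It assumes a timm-style ViT-B block (width 768, 12 heads of 64 dims) whose `proj` weight is stored as `[out_dim, in_dim]`, with head outputs concatenated head-major before `proj`. The helper names are hypothetical. It also reproduces the per-head parameter arithmetic from the accounting section under that assumed geometry.

```python
from typing import Dict, Iterable, List, Tuple

import numpy as np

# Assumed ViT-B geometry: width 768, 12 heads of 64 dims. This matches the
# ~147K qkv / ~49K proj per-head figures but is an assumption of this sketch.
DIM, NUM_HEADS = 768, 12


def head_slice(head: int, dim: int = DIM, num_heads: int = NUM_HEADS) -> slice:
    """Input-column range of `proj` owned by one head (head-major layout assumed)."""
    head_dim = dim // num_heads
    return slice(head * head_dim, (head + 1) * head_dim)


def zero_pruned_heads(
    proj_weights: List[np.ndarray], pruned: Iterable[Tuple[int, int]]
) -> List[np.ndarray]:
    """Copy per-block proj weights ([out_dim, in_dim]) and zero the input
    columns of every pruned (block, head). The originals are left untouched."""
    masked = [w.copy() for w in proj_weights]
    for block, head in pruned:
        masked[block][:, head_slice(head)] = 0.0
    return masked


def per_head_params(dim: int = DIM, num_heads: int = NUM_HEADS) -> Dict[str, int]:
    """Parameters attributable to one head under the assumed geometry."""
    head_dim = dim // num_heads
    qkv = 3 * head_dim * dim + 3 * head_dim  # q, k, v weight rows plus their biases
    proj = dim * head_dim  # the proj input columns that the mask zeroes
    return {"qkv": qkv, "proj": proj, "total": qkv + proj}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((DIM, DIM)) for _ in range(12)]
    pruned = [(2, 5), (11, 0)]  # hypothetical (block, head) pairs
    masked = zero_pruned_heads(weights, pruned)

    # Zeroing a head's proj columns is equivalent to silencing that head's output.
    x = rng.standard_normal((4, DIM))  # 4 tokens of concatenated head outputs
    x_silenced = x.copy()
    x_silenced[:, head_slice(5)] = 0.0
    assert np.allclose(x @ masked[2].T, x_silenced @ weights[2].T)

    counts = per_head_params()
    print(counts, 10 * counts["total"])  # 10 heads -> 1968000 (~1.97M)
```

Because the head's `qkv` outputs only reach the residual stream through those `proj` columns, zeroing the columns is enough to remove the head's contribution, which is why the checkpoint shape stays the same.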