# Stage 2: Attention-Head Pruning

Ablated each of the 144 (block, head) pairs in EUPE-ViT-B (12 blocks × 12 heads) individually and measured F1 on 1000 COCO val images with the Stage 0 classifier. Ranked heads by individual F1 drop (smallest drop = most prunable), then swept the cumulative pruning curve by ablating the top-K heads together and re-measuring F1.
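A minimal sketch of the rank-then-sweep procedure, assuming a hypothetical `evaluate_f1` callable that runs the classifier over the calibration images with a given set of (block, head) pairs masked and returns F1. This is not the code in `head_ablation.py`; `toy_f1` is an invented stand-in that only exercises the ranking and sweep logic.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Head = Tuple[int, int]  # (block index, head index)
EvalFn = Callable[[FrozenSet[Head]], float]  # hypothetical: F1 with these heads masked


def rank_heads(heads: List[Head], evaluate_f1: EvalFn) -> Tuple[float, List[Head], Dict[Head, float]]:
    """Ablate each head on its own and rank by F1 drop, smallest drop first."""
    baseline = evaluate_f1(frozenset())
    drops = {h: baseline - evaluate_f1(frozenset([h])) for h in heads}
    # Heads whose ablation *raised* F1 have negative drops and sort to the front.
    ranked = sorted(heads, key=lambda h: drops[h])
    return baseline, ranked, drops


def pruning_curve(ranked: List[Head], ks: List[int], evaluate_f1: EvalFn) -> Dict[int, float]:
    """Mask the K most prunable heads together and re-measure F1 for each K."""
    return {k: evaluate_f1(frozenset(ranked[:k])) for k in ks}


# Toy usage. toy_f1 is a hypothetical stand-in for running the classifier:
# it pretends block-0 heads add noise and every other head helps a little.
def toy_f1(pruned: FrozenSet[Head]) -> float:
    score = 0.9 + sum(0.001 if block == 0 else -0.01 for block, _ in pruned)
    return max(0.0, min(1.0, score))


heads = [(b, h) for b in range(12) for h in range(12)]  # 144 (block, head) pairs
baseline, ranked, drops = rank_heads(heads, toy_f1)
curve = pruning_curve(ranked, [1, 5, 10, 144], toy_f1)
```

Ranking costs one baseline evaluation plus one per head (145 passes over the pool); the cumulative sweep then costs one pass per K.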

## Headline result

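On the calibration pool, pruning the 10 most prunable heads *improves* F1 from 0.894 to 0.916, which suggests those heads were injecting noise that hurt the person task. Pruning up to K=20 stays marginally ahead of baseline (+0.001 at K=15, +0.003 at K=20). At K=30 the classifier collapses (F1 0.327) as important heads start being removed, and past the cliff the curve is erratic rather than monotone (K=50 scores higher than K=40).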

## Pruning curve

```
K pruned   F1       ΔF1 vs baseline
       1   0.9037   +0.010
       5   0.9086   +0.015
      10   0.9159   +0.022   <- peak
      15   0.8949   +0.001
      20   0.8971   +0.003
      30   0.3267   -0.567   (cliff)
      40   0.2186   -0.675
      50   0.5075   -0.386
      60   0.0037   -0.890
     144   0.0000   -0.894
```

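Baseline F1 = 0.8939, measured on the same 1000-image calibration pool as the sweep. It sits slightly above the 5000-image verification F1 of 0.8886 from Stage 1 because the two image sets differ. Since the heads were also ranked on this pool, the small-K gains are in-sample and may shrink on held-out images.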
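The ΔF1 column is F1 at each K minus the baseline. A quick sanity check, using only the numbers copied from the table above, recomputes the deltas and reads off the peak K and the K values that stay above baseline:

```python
# F1 values copied from the pruning curve above (1000-image calibration pool).
BASELINE_F1 = 0.8939
curve = {1: 0.9037, 5: 0.9086, 10: 0.9159, 15: 0.8949, 20: 0.8971,
         30: 0.3267, 40: 0.2186, 50: 0.5075, 60: 0.0037, 144: 0.0000}

# ΔF1 vs baseline, rounded to the table's three decimals.
deltas = {k: round(f1 - BASELINE_F1, 3) for k, f1 in curve.items()}
peak_k = max(curve, key=curve.get)
above_baseline = [k for k, f1 in curve.items() if f1 > BASELINE_F1]

print(peak_k, deltas[peak_k])  # 10 0.022
print(above_baseline)          # [1, 5, 10, 15, 20]
```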

## What this stage ships

- `head_ablation.py` — the sweep script
- `head_importance.json` — per-(block, head) F1 + L2 deviation
- `pruning_curve.json` — cumulative F1 at K=1, 5, 10, ..., 144
- `head_mask.json` — decision (prune top-10) + rationale
- `apply_mask.py` — loader that patches Argus in place by zeroing the proj input columns of the 10 pruned heads (64 columns per head)
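A minimal sketch of the masking step for a single block, assuming the block's output projection stores its weight as (out_features, in_features) like `torch.nn.Linear` (as in a timm-style `attn.proj`), and that head h's slice of the concatenated attention output occupies input columns `h*head_dim` to `(h+1)*head_dim`. It uses plain lists so it runs without PyTorch; in PyTorch the same edit would be `proj.weight[:, start:stop] = 0` inside a `torch.no_grad()` block. Since the 10 pruned heads sit in different blocks, the function would be called once per block with that block's heads. This is an illustration, not the actual `apply_mask.py`.

```python
from typing import Iterable, List

Matrix = List[List[float]]


def zero_head_columns(proj_weight: Matrix, heads: Iterable[int], head_dim: int) -> None:
    """Zero, in place, the proj input columns that read from the given heads.

    proj_weight is laid out (out_features, in_features); head h's output
    occupies input columns [h * head_dim, (h + 1) * head_dim).
    """
    for h in heads:
        start, stop = h * head_dim, (h + 1) * head_dim
        for row in proj_weight:
            for c in range(start, stop):
                row[c] = 0.0


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """y = W x for a row-major (out, in) matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


# Toy sizes (ViT-B would be embed_dim=768, 12 heads, head_dim=64).
num_heads, head_dim = 3, 2
dim = num_heads * head_dim
proj = [[1.0] * dim for _ in range(dim)]

# Concatenated attention output where only head 1 (entries 2..3) is nonzero.
head1_only = [0.0, 0.0, 5.0, -3.0, 0.0, 0.0]
zero_head_columns(proj, heads=[1], head_dim=head_dim)
print(matvec(proj, head1_only))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Zeroing the proj columns is enough to silence a head's contribution to the residual stream; its q, k, v weights still run but their output is discarded.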

## Parameter accounting

Each attention head accounts for ~196K params: 147K in qkv (3 × 768 × 64) and 49K in proj (768 × 64). At K=10, 1.97M params (2.3% of the 85.6M backbone) no longer affect the output. Only the proj columns are literally zeroed (~0.49M weights); the matching qkv slices stay nonzero but are dead, because their output is multiplied by zero. The checkpoint file size is unchanged; what changes is the set of nonzero weights. For a true structural reduction that shrinks the tensor shapes, see Stage 3 (depth reduction) and Stage 4 (specialist backbone), which restructure the backbone end-to-end.
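The per-head figures follow from the ViT-B width of 768 split across 12 heads (head_dim 64). A small worked check of that arithmetic, counting weights only (qkv and proj biases are ignored here, which is an assumption of this sketch) and assuming only proj columns are written by the mask:

```python
EMBED_DIM = 768                    # ViT-B width
NUM_HEADS = 12
HEAD_DIM = EMBED_DIM // NUM_HEADS  # 64
BACKBONE_PARAMS = 85.6e6           # backbone size quoted above

# Weights only; biases are ignored in this count.
qkv_per_head = 3 * EMBED_DIM * HEAD_DIM   # q, k, v slices: 147,456
proj_per_head = EMBED_DIM * HEAD_DIM      # proj input columns: 49,152
per_head = qkv_per_head + proj_per_head   # 196,608

k = 10
effective = k * per_head                  # params that stop mattering: 1,966,080
literally_zeroed = k * proj_per_head      # proj columns actually set to 0: 491,520

print(f"{effective:,} params, {effective / BACKBONE_PARAMS:.1%} of backbone")
# 1,966,080 params, 2.3% of backbone
```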

## Notable individual findings

Heads with the largest individual F1 drops (most important for person classification) are concentrated in middle-to-late blocks. Heads with negative drops (where ablation *improved* F1) are scattered, but skew toward early blocks and toward late-block noise-injectors. The ordering that `apply_mask.py` follows is stored in `head_importance.json` under `ranked_most_prunable_first`; the pruned heads are the first 10 entries of that list.
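One way to check where important and noise-injecting heads sit is to aggregate the individual drops per block. The sketch below assumes the drops are available as a dict keyed by (block, head); the real layout of `head_importance.json` is not reproduced here, and the `toy` numbers are invented purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Head = Tuple[int, int]  # (block index, head index)


def summarize_by_block(drops: Dict[Head, float]) -> Dict[int, Tuple[float, int]]:
    """Return {block: (mean F1 drop, number of heads whose ablation improved F1)}."""
    per_block: Dict[int, List[float]] = defaultdict(list)
    for (block, _head), drop in drops.items():
        per_block[block].append(drop)
    return {
        b: (sum(v) / len(v), sum(1 for d in v if d < 0))
        for b, v in sorted(per_block.items())
    }


# Invented drops, not measured values.
toy = {(0, 0): -0.004, (0, 1): 0.001,
       (6, 0): 0.030, (6, 1): 0.012,
       (11, 0): -0.002, (11, 1): 0.020}
summary = summarize_by_block(toy)
for block, (mean_drop, n_negative) in summary.items():
    print(f"block {block:2d}: mean drop {mean_drop:+.4f}, negative-drop heads {n_negative}")
```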