1-parameter-classifier/stage_2/head_mask.json
Stage 2: attention-head pruning results + mask + apply_mask.py
a7e09b2 verified
{
  "rationale": "From head_ablation.py sweep on 1000 COCO val images: 10 heads produced the largest individual F1 drops when ablated. Cumulative pruning of those 10 yields peak F1=0.9159 (vs baseline 0.8939); K=20 cumulative still ahead at F1=0.8971; K=30+ cliff-drops below 0.35. Peak pruning point is K=10.",
  "peak_K": 10,
  "baseline_F1": 0.8939,
  "pruned_K10_F1": 0.9159,
  "pruned_K20_F1": 0.8971,
  "heads_pruned_K10": "top 10 of ranked_most_prunable_first from head_importance.json",
  "how_to_apply": "At load time, for each (block, head) in the pruned list, zero block.attn.proj.weight[:, head*64:(head+1)*64]. See apply_mask.py."
}
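
The `how_to_apply` rule above can be sketched in a few lines. This is only an illustration of the column-zeroing step, not the contents of `apply_mask.py`, which is not shown here. The helper names `head_columns` and `zero_head_columns` are hypothetical, and a plain list of rows stands in for `block.attn.proj.weight` so the sketch runs without any dependencies. The head width of 64 comes from the slice in `how_to_apply`.

```python
HEAD_DIM = 64  # per-head width, taken from the "head*64:(head+1)*64" slice in how_to_apply


def head_columns(head, head_dim=HEAD_DIM):
    """Return the column range [start, stop) that one head occupies in proj.weight."""
    return head * head_dim, (head + 1) * head_dim


def zero_head_columns(weight, heads, head_dim=HEAD_DIM):
    """Zero, in place, the projection input columns belonging to the given heads.

    `weight` is a list of rows (out_features x in_features); it is a stand-in
    for block.attn.proj.weight, not the real tensor (assumption for this sketch).
    """
    for head in heads:
        start, stop = head_columns(head, head_dim)
        for row in weight:
            if stop > len(row):
                raise ValueError(f"head {head} is out of range for row width {len(row)}")
            row[start:stop] = [0.0] * (stop - start)
    return weight


# Toy example: 3 heads, 4 output rows, all-ones weights; prune head 1 only.
toy = [[1.0] * (3 * HEAD_DIM) for _ in range(4)]
zero_head_columns(toy, heads=[1])
```

With a real PyTorch module, the same step would be a slice assignment on the weight tensor inside `torch.no_grad()`, repeated for each (block, head) pair in the pruned list. Zeroing these columns removes the head's contribution to the projection output while leaving the tensor shapes unchanged.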