Stage 3: depth reduction results (only 1 block cleanly prunable)

Files changed (4)

stage_3/README.md +50 -2
stage_3/block_ablation.py +171 -0
stage_3/block_importance.json +139 -0
stage_3/block_pruning_curve.json +137 -0

stage_3/README.md CHANGED

@@ -1,5 +1,53 @@
 # Stage 3: Depth Reduction
-Reserved. See repo root README for plan.
-Scope: drop transformer blocks that do not route signal to the 100 Stage 0 dims.

+Attempted block-level pruning analogous to Stage 2, but at block granularity. For each of EUPE-ViT-B's 12 transformer blocks, the sweep zeroes both `block.attn.proj.weight` and `block.mlp.fc2.weight`. Because both branches feed a residual sum, this reduces the block to a pass-through identity (exactly so only if those layers have no bias, since the script leaves biases untouched). F1 is measured with the Stage 0 classifier on the first 1000 COCO val2017 images, sorted by image ID.
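The identity argument above can be checked on a toy example. The following is a minimal dependency-free sketch, not the repo's code: `toy_block`, `linear` and `zeros_like` are hypothetical helpers, the attention and MLP internals are replaced by a bare output projection, and the biases are drawn at random purely for illustration. It shows that zeroing only the weights leaves a constant bias offset, while zeroing weights and biases gives an exact pass-through.

```python
import random


def linear(x, weight, bias):
    """Compute y = W x + b for one vector, using plain lists."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def toy_block(x, proj, fc2):
    """Two residual updates; proj and fc2 are (weight, bias) pairs.

    The inner attention/MLP computation is omitted, so only the output
    projections shape the result. This is a simplification for illustration.
    """
    x = [a + b for a, b in zip(x, linear(x, *proj))]
    x = [a + b for a, b in zip(x, linear(x, *fc2))]
    return x


def zeros_like(weight):
    return [[0.0] * len(row) for row in weight]


random.seed(0)
dim = 4
proj_w = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
fc2_w = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
proj_b = [random.uniform(0.1, 1.0) for _ in range(dim)]  # illustrative nonzero biases
fc2_b = [random.uniform(0.1, 1.0) for _ in range(dim)]
x = [random.uniform(-1, 1) for _ in range(dim)]
zero_b = [0.0] * dim

# Case 1: weights zeroed, biases kept (what zeroing only `.weight` does).
weights_only = toy_block(x, (zeros_like(proj_w), proj_b), (zeros_like(fc2_w), fc2_b))
# Case 2: weights and biases zeroed.
fully_zeroed = toy_block(x, (zeros_like(proj_w), zero_b), (zeros_like(fc2_w), zero_b))

bias_offset = max(abs(a - b) for a, b in zip(weights_only, x))
print("exact identity with biases zeroed:", fully_zeroed == x)
print("max offset with biases kept:", round(bias_offset, 3))
```

Whether this matters for the real backbone depends on whether its `attn.proj` and `mlp.fc2` layers carry biases, which the README does not state.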
+## Headline result
+Only one block is cleanly prunable. Block 11 (the final block) can be removed with F1 dropping from 0.894 to 0.876. Block 6 is borderline when ablated alone (drop 0.030). Blocks 0, 1, 2 and 7 are critical: ablating any one of them collapses F1 to 0.15 (block 7) or to about 0.01 or lower (blocks 0, 1, 2). The remaining blocks (3, 4, 5, 8, 9, 10) cost between 0.11 and 0.46 F1 each. Cumulative pruning past K=1 degrades fast: K=2 loses 12 F1 points, K=3 destroys the classifier.
+## Per-block importance
+```
+Block  F1        F1 drop vs baseline
+ 0     0.000    +0.89   (critical)
+ 1     0.011    +0.88   (critical)
+ 2     0.000    +0.89   (critical)
+ 3     0.783    +0.11
+ 4     0.765    +0.13
+ 5     0.599    +0.29   (important)
+ 6     0.864    +0.03   (borderline)
+ 7     0.152    +0.74   (critical)
+ 8     0.430    +0.46
+ 9     0.674    +0.22
+10     0.743    +0.15
+11     0.876    +0.02   (most prunable)
+```
+Baseline F1 = 0.8939 (1000-image calibration pool).
+## Cumulative pruning
+```
+K pruned  F1
+  1       0.876   [block 11]
+  2       0.770   [11, 6]
+  3       0.000   [11, 6, 3]
+  4+      0.000
+```
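The two tables can be cross-checked with a little arithmetic. The sketch below uses only the rounded numbers reported above; the variable names are illustrative. It compares the sum of the single-block drops (what one would expect if the damage were additive) with the measured cumulative drop.

```python
# Rounded values from the per-block and cumulative tables.
baseline = 0.8939
single_drop = {11: 0.0182, 6: 0.0295, 3: 0.1106}  # F1 drop when ablated alone
cumulative_f1 = {1: 0.8758, 2: 0.7702, 3: 0.0}    # F1 with the first K blocks ablated
order = [11, 6, 3]                                # most prunable first

rows = []
for k in range(1, len(order) + 1):
    additive = sum(single_drop[b] for b in order[:k])  # naive additive estimate
    measured = baseline - cumulative_f1[k]             # drop actually observed
    rows.append((k, order[:k], additive, measured))

for k, blocks, additive, measured in rows:
    print(f"K={k}  blocks={blocks}  additive={additive:.3f}  measured={measured:.3f}")
```

At K=2 and K=3 the measured drop is well above the additive estimate, so single-block ablation scores are an optimistic guide to what can be removed together.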
+## Interpretation
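+Transformer blocks cascade information through residual updates. Unlike individual attention heads (which can be redundant within a single block), blocks build the representation incrementally, so removing one perturbs the input of every later block. The damage is uneven: blocks 0, 1, 2 and 7 are indispensable, blocks 3, 4, 5, 8, 9 and 10 cost 0.11 to 0.46 F1 each, and block 6 is nearly free only when ablated alone. The losses also compound rather than add, as the K=2 and K=3 rows of the cumulative table show. Block 11 is post-hoc refinement that the classifier can survive without. No other block can be removed without a drop of roughly 0.03 or more.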
+The takeaway for backbone compression: **naive block skipping on a frozen pretrained ViT-B reaches a hard ceiling at one block**. To get a shallower model, we need Stage 4: train a new shallower student that learns a compact representation directly, rather than trying to strip layers from the existing one.
+## What this stage ships
+- `block_ablation.py`: the sweep script
+- `block_importance.json`: per-block F1, precision, recall and F1 drop, plus the ranking from most to least prunable
+- `block_pruning_curve.json`: cumulative F1, precision and recall at K = 1, 2, 3, 4, 5, 6, 8, 10, 12
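A minimal sketch of how the per-block output might be consumed, assuming the `baseline_F1` / `per_block` / `F1_drop` layout that the script writes. The inline sample is a rounded three-block subset rather than the full file, and `rank_blocks`, `prunable_within` and the 0.02 tolerance are hypothetical choices for illustration.

```python
import json

# Inline sample in the shape written by the sweep script (rounded subset).
SAMPLE = """
{"baseline_F1": 0.8939,
 "per_block": [
   {"block": 3,  "F1": 0.7833, "F1_drop": 0.1106},
   {"block": 6,  "F1": 0.8645, "F1_drop": 0.0295},
   {"block": 11, "F1": 0.8758, "F1_drop": 0.0182}
 ]}
"""


def rank_blocks(report):
    """Return per-block rows sorted from smallest to largest F1 drop."""
    return sorted(report["per_block"], key=lambda row: row["F1_drop"])


def prunable_within(report, max_drop):
    """Return block indices whose single-block F1 drop is at most max_drop."""
    return [row["block"] for row in rank_blocks(report)
            if row["F1_drop"] <= max_drop]


report = json.loads(SAMPLE)
order = [row["block"] for row in rank_blocks(report)]
cheap = prunable_within(report, max_drop=0.02)  # hypothetical tolerance
print("most prunable first:", order)
print("within tolerance:", cheap)
```

To run this against the real file, replace `json.loads(SAMPLE)` with `json.load` on an open handle to `block_importance.json`.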
+## Parameter accounting
+Each block is ~7.08M params (1.77M qkv + 589K proj + 4.72M MLP + LN + LayerScale). At K=1, ~7.1M params are rendered inert (8.3% of the 85.6M backbone). At K=2, ~14.2M (16.6%) are inert, but the 0.12 F1 drop (0.894 to 0.770) is generally not worth it for a person detector whose baseline is 0.894. Further compression should come from Stages 2 + 4 + 5 combined, not depth alone.
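The per-block figure can be reproduced from the layer shapes. This is a sketch under stated assumptions: `D = 768` comes from the script, while the MLP ratio of 4, one LayerScale vector per branch, and the bias handling are assumptions about a standard ViT-B block, not facts confirmed by this repo.

```python
D = 768                    # embedding width (the script's D constant)
MLP_RATIO = 4              # assumption: MLP hidden width is 4 * D
BACKBONE_PARAMS = 85.6e6   # backbone size quoted in the README


def block_params(d, mlp_ratio, bias):
    """Parameter count per component of one pre-norm transformer block."""
    hidden = mlp_ratio * d
    b = 1 if bias else 0
    return {
        "qkv": d * 3 * d + b * 3 * d,
        "proj": d * d + b * d,
        "mlp": 2 * d * hidden + b * (hidden + d),  # fc1 + fc2
        "layernorm": 2 * 2 * d,                    # two norms, weight + bias each
        "layerscale": 2 * d,                       # assumption: one vector per branch
    }


weights_only = block_params(D, MLP_RATIO, bias=False)
with_bias = block_params(D, MLP_RATIO, bias=True)
total_weights_only = sum(weights_only.values())
total_with_bias = sum(with_bias.values())
share_one_block = 100 * total_weights_only / BACKBONE_PARAMS

for name, count in weights_only.items():
    print(f"{name:>10}: {count:>9,}")
print(f"total (no linear biases):   {total_weights_only:,}")
print(f"total (with linear biases): {total_with_bias:,}")
print(f"one block as share of backbone: {share_one_block:.1f}%")
```

Under these assumptions the count without linear biases rounds to the ~7.08M quoted above, and adding the biases changes it by only a few thousand parameters.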

stage_3/block_ablation.py ADDED

	@@ -0,0 +1,171 @@

+"""Stage 3: depth reduction via block ablation.
+For each of the 12 transformer blocks, zero the weights of both the attention
+proj and the MLP fc2 output projections. Each block applies two residual
+updates, x = x + attn(x) followed by x = x + mlp(x), so this reduces the block
+to an identity (residual pass-through). Biases are not touched, so the identity
+is exact only if proj and fc2 carry no bias. Measure F1 on the Stage 0
+classifier. Rank blocks by smallest F1 drop, sweep cumulative skipping, and
+identify how many blocks can be dropped without collapsing.
+Output:
+  block_importance.json
+  block_pruning_curve.json
+"""
+import json
+import os
+import time
+import torch
+import torch.nn.functional as F
+import numpy as np
+from PIL import Image
+from pycocotools.coco import COCO
+from transformers import AutoModel
+COCO_ROOT = '/home/zootest/datasets/coco'
+STAGE0_CLASSIFIER = '/mnt/d/_tmp/1pc_repo/stage_0/classifier.json'
+N_CALIBRATION = 1000
+N_BLOCKS = 12
+RES = 768
+D = 768
+OUT_DIR = '/mnt/d/_tmp/1pc_repo/stage_3'
+DEVICE = 'cuda'
+def load_classifier():
+    with open(STAGE0_CLASSIFIER) as f:
+        c = json.load(f)
+    pos = torch.tensor(c['pos_dims'], dtype=torch.long, device=DEVICE)
+    neg = torch.tensor(c['neg_dims'], dtype=torch.long, device=DEVICE)
+    return pos, neg, float(c['threshold'])
+@torch.inference_mode()
+def score_images(argus, imgs, pos, neg):
+    scores = []
+    for x in imgs:
+        with torch.autocast('cuda', dtype=torch.bfloat16):
+            out = argus.backbone.forward_features(x)
+        patches = out['x_norm_patchtokens'].float().squeeze(0)
+        ln = F.layer_norm(patches, [D])
+        pooled = ln.max(dim=0).values
+        scores.append((pooled[pos].sum() - pooled[neg].sum()).item())
+    return torch.tensor(scores, device=DEVICE)
+def f1_of(scores, labels, thr):
+    pred = scores > thr
+    tp = (pred & labels).sum().float()
+    fp = (pred & ~labels).sum().float()
+    fn = (~pred & labels).sum().float()
+    prec = tp / (tp + fp).clamp(min=1)
+    rec = tp / (tp + fn).clamp(min=1)
+    f1 = 2 * prec * rec / (prec + rec).clamp(min=1e-9)
+    return float(f1), float(prec), float(rec)
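+def ablate_block(model, block_idx, zero=True):
+    """Snapshot attn.proj and mlp.fc2 weights of the given block and, if
+    zero=True, zero them so the block degenerates to an identity via the
+    residual path (biases are left as-is). With zero=False the model is not
+    modified. Returns (orig_proj, orig_fc2) for restoring."""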
+    block = model.backbone.blocks[block_idx]
+    orig_proj = block.attn.proj.weight.detach().clone()
+    orig_fc2 = block.mlp.fc2.weight.detach().clone()
+    if zero:
+        with torch.no_grad():
+            block.attn.proj.weight.data.zero_()
+            block.mlp.fc2.weight.data.zero_()
+    return orig_proj, orig_fc2
+def restore_block(model, block_idx, orig_proj, orig_fc2):
+    block = model.backbone.blocks[block_idx]
+    block.attn.proj.weight.data.copy_(orig_proj)
+    block.mlp.fc2.weight.data.copy_(orig_fc2)
+def load_calibration(coco, n, MEAN, STD):
+    img_ids = sorted(coco.getImgIds())[:n]
+    tensors, labels = [], []
+    for img_id in img_ids:
+        info = coco.loadImgs(img_id)[0]
+        path = f"{COCO_ROOT}/val2017/{info['file_name']}"
+        img = Image.open(path).convert('RGB').resize((RES, RES), Image.BILINEAR)
+        arr = np.asarray(img, dtype=np.uint8).copy()
+        x = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0).to(DEVICE).float() / 255.0
+        tensors.append((x - MEAN) / STD)
+        labels.append(any(a['category_id'] == 1
+                           for a in coco.loadAnns(coco.getAnnIds(imgIds=img_id, iscrowd=False))))
+    return tensors, torch.tensor(labels, dtype=torch.bool, device=DEVICE)
+def main():
+    os.makedirs(OUT_DIR, exist_ok=True)
+    print('[init] loading Argus', flush=True)
+    model = AutoModel.from_pretrained('/mnt/d/Argus', trust_remote_code=True).to(DEVICE).eval()
+    pos, neg, thr = load_classifier()
+    MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1).to(DEVICE)
+    STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1).to(DEVICE)
+    print(f'[calib] loading {N_CALIBRATION} COCO val images', flush=True)
+    coco = COCO(f'{COCO_ROOT}/annotations/instances_val2017.json')
+    imgs, labels = load_calibration(coco, N_CALIBRATION, MEAN, STD)
+    print('[baseline]', flush=True)
+    base_scores = score_images(model, imgs, pos, neg)
+    base_f1, base_p, base_r = f1_of(base_scores, labels, thr)
+    print(f'  baseline F1={base_f1:.4f}  P={base_p:.4f}  R={base_r:.4f}', flush=True)
+    # Per-block individual ablation
+    per_block = []
+    t0 = time.time()
+    for b in range(N_BLOCKS):
+        op, of = ablate_block(model, b)
+        scores = score_images(model, imgs, pos, neg)
+        restore_block(model, b, op, of)
+        f1, p, r = f1_of(scores, labels, thr)
+        drop = base_f1 - f1
+        per_block.append({'block': b, 'F1': f1, 'precision': p, 'recall': r,
+                          'F1_drop': drop})
+        print(f'  block {b:>2}  F1={f1:.4f}  drop={drop:+.4f}  '
+              f'{(time.time()-t0):.1f}s', flush=True)
+    ranked = sorted(per_block, key=lambda x: x['F1_drop'])
+    # Cumulative ablation curve
+    print('[curve] cumulative block ablation', flush=True)
+    curve = []
+    # Snapshot the original weights of every block without modifying the model
+    backups = {b: ablate_block(model, b, zero=False) for b in range(N_BLOCKS)}
+    for K in [1, 2, 3, 4, 5, 6, 8, 10, 12]:
+        # Restore all
+        for b in range(N_BLOCKS):
+            op, of = backups[b]
+            restore_block(model, b, op, of)
+        # Ablate top-K most-prunable
+        for r in ranked[:K]:
+            b = r['block']
+            with torch.no_grad():
+                model.backbone.blocks[b].attn.proj.weight.data.zero_()
+                model.backbone.blocks[b].mlp.fc2.weight.data.zero_()
+        scores = score_images(model, imgs, pos, neg)
+        f1, p, rr = f1_of(scores, labels, thr)
+        curve.append({'blocks_pruned': K, 'F1': f1, 'F1_drop': base_f1 - f1,
+                      'precision': p, 'recall': rr,
+                      'pruned_list': [r['block'] for r in ranked[:K]]})
+        print(f'  K={K:>2}  F1={f1:.4f}  drop={base_f1-f1:+.4f}  '
+              f'blocks pruned={[r["block"] for r in ranked[:K]]}', flush=True)
+    # Restore
+    for b in range(N_BLOCKS):
+        op, of = backups[b]
+        restore_block(model, b, op, of)
+    with open(f'{OUT_DIR}/block_importance.json', 'w') as f:
+        json.dump({'baseline_F1': base_f1, 'per_block': per_block,
+                   'ranked_most_prunable_first': [(r['block'], r['F1_drop'])
+                                                    for r in ranked]},
+                  f, indent=2)
+    with open(f'{OUT_DIR}/block_pruning_curve.json', 'w') as f:
+        json.dump({'baseline_F1': base_f1, 'curve': curve}, f, indent=2)
+    print(f'[done] -> {OUT_DIR}', flush=True)
+if __name__ == '__main__':
+    main()

stage_3/block_importance.json ADDED

	@@ -0,0 +1,139 @@

+{
+  "baseline_F1": 0.8939393758773804,
+  "per_block": [
+    {
+      "block": 0,
+      "F1": 0.0,
+      "precision": 0.0,
+      "recall": 0.0,
+      "F1_drop": 0.8939393758773804
+    },
+    {
+      "block": 1,
+      "F1": 0.010928962379693985,
+      "precision": 1.0,
+      "recall": 0.005494505632668734,
+      "F1_drop": 0.8830104134976864
+    },
+    {
+      "block": 2,
+      "F1": 0.0,
+      "precision": 0.0,
+      "recall": 0.0,
+      "F1_drop": 0.8939393758773804
+    },
+    {
+      "block": 3,
+      "F1": 0.7833163142204285,
+      "precision": 0.8810068368911743,
+      "recall": 0.7051281929016113,
+      "F1_drop": 0.1106230616569519
+    },
+    {
+      "block": 4,
+      "F1": 0.76458340883255,
+      "precision": 0.8864734172821045,
+      "recall": 0.6721611618995667,
+      "F1_drop": 0.12935596704483032
+    },
+    {
+      "block": 5,
+      "F1": 0.5989974737167358,
+      "precision": 0.9484127163887024,
+      "recall": 0.4377289414405823,
+      "F1_drop": 0.29494190216064453
+    },
+    {
+      "block": 6,
+      "F1": 0.864454984664917,
+      "precision": 0.8958742618560791,
+      "recall": 0.8351648449897766,
+      "F1_drop": 0.02948439121246338
+    },
+    {
+      "block": 7,
+      "F1": 0.15228426456451416,
+      "precision": 1.0,
+      "recall": 0.08241758495569229,
+      "F1_drop": 0.7416551113128662
+    },
+    {
+      "block": 8,
+      "F1": 0.430379718542099,
+      "precision": 0.9272727370262146,
+      "recall": 0.28021979331970215,
+      "F1_drop": 0.46355965733528137
+    },
+    {
+      "block": 9,
+      "F1": 0.674500584602356,
+      "precision": 0.9409835934638977,
+      "recall": 0.5256410241127014,
+      "F1_drop": 0.21943879127502441
+    },
+    {
+      "block": 10,
+      "F1": 0.7431092262268066,
+      "precision": 0.9335179924964905,
+      "recall": 0.6172161102294922,
+      "F1_drop": 0.15083014965057373
+    },
+    {
+      "block": 11,
+      "F1": 0.8757709264755249,
+      "precision": 0.8438030481338501,
+      "recall": 0.9102563858032227,
+      "F1_drop": 0.01816844940185547
+    }
+  ],
+  "ranked_most_prunable_first": [
+    [
+      11,
+      0.01816844940185547
+    ],
+    [
+      6,
+      0.02948439121246338
+    ],
+    [
+      3,
+      0.1106230616569519
+    ],
+    [
+      4,
+      0.12935596704483032
+    ],
+    [
+      10,
+      0.15083014965057373
+    ],
+    [
+      9,
+      0.21943879127502441
+    ],
+    [
+      5,
+      0.29494190216064453
+    ],
+    [
+      8,
+      0.46355965733528137
+    ],
+    [
+      7,
+      0.7416551113128662
+    ],
+    [
+      1,
+      0.8830104134976864
+    ],
+    [
+      0,
+      0.8939393758773804
+    ],
+    [
+      2,
+      0.8939393758773804
+    ]
+  ]
+}

stage_3/block_pruning_curve.json ADDED

	@@ -0,0 +1,137 @@

+{
+  "baseline_F1": 0.8939393758773804,
+  "curve": [
+    {
+      "blocks_pruned": 1,
+      "F1": 0.8757709264755249,
+      "F1_drop": 0.01816844940185547,
+      "precision": 0.8438030481338501,
+      "recall": 0.9102563858032227,
+      "pruned_list": [
+        11
+      ]
+    },
+    {
+      "blocks_pruned": 2,
+      "F1": 0.7702127695083618,
+      "F1_drop": 0.12372660636901855,
+      "precision": 0.9187816977500916,
+      "recall": 0.66300368309021,
+      "pruned_list": [
+        11,
+        6
+      ]
+    },
+    {
+      "blocks_pruned": 3,
+      "F1": 0.0,
+      "F1_drop": 0.8939393758773804,
+      "precision": 0.0,
+      "recall": 0.0,
+      "pruned_list": [
+        11,
+        6,
+        3
+      ]
+    },
+    {
+      "blocks_pruned": 4,
+      "F1": 0.0,
+      "F1_drop": 0.8939393758773804,
+      "precision": 0.0,
+      "recall": 0.0,
+      "pruned_list": [
+        11,
+        6,
+        3,
+        4
+      ]
+    },
+    {
+      "blocks_pruned": 5,
+      "F1": 0.0,
+      "F1_drop": 0.8939393758773804,
+      "precision": 0.0,
+      "recall": 0.0,
+      "pruned_list": [
+        11,
+        6,
+        3,
+        4,
+        10
+      ]
+    },
+    {
+      "blocks_pruned": 6,
+      "F1": 0.0,
+      "F1_drop": 0.8939393758773804,
+      "precision": 0.0,
+      "recall": 0.0,
+      "pruned_list": [
+        11,
+        6,
+        3,
+        4,
+        10,
+        9
+      ]
+    },
+    {
+      "blocks_pruned": 8,
+      "F1": 0.0,
+      "F1_drop": 0.8939393758773804,
+      "precision": 0.0,
+      "recall": 0.0,
+      "pruned_list": [
+        11,
+        6,
+        3,
+        4,
+        10,
+        9,
+        5,
+        8
+      ]
+    },
+    {
+      "blocks_pruned": 10,
+      "F1": 0.0,
+      "F1_drop": 0.8939393758773804,
+      "precision": 0.0,
+      "recall": 0.0,
+      "pruned_list": [
+        11,
+        6,
+        3,
+        4,
+        10,
+        9,
+        5,
+        8,
+        7,
+        1
+      ]
+    },
+    {
+      "blocks_pruned": 12,
+      "F1": 0.0,
+      "F1_drop": 0.8939393758773804,
+      "precision": 0.0,
+      "recall": 0.0,
+      "pruned_list": [
+        11,
+        6,
+        3,
+        4,
+        10,
+        9,
+        5,
+        8,
+        7,
+        1,
+        0,
+        2
+      ]
+    }
+  ]
+}