verify two reviewer-probe claims: (1) measured lesion spectra REFUTE 'low internal rank' (RankMe 339>307) -> correct attribution to RARITY across papers #1/#2/NEGATIVE_RESULT; (2) verified MedDINOv3/DINOv3=RoPE vs DINOv2=learned-absolute, paper #3 §3 stated precisely

Files changed (6)

gate_reports/NEGATIVE_RESULT.md +12 -7
jobs/lesion_spectrum_job.py +129 -0
paper/paper2_rank_objectives_draft.md +24 -16
paper/paper3_midlayer_draft.md +11 -4
paper/working_draft.md +13 -6
research_v4/lesion_spectrum.json +24 -0

gate_reports/NEGATIVE_RESULT.md CHANGED

@@ -9,15 +9,20 @@ them as the pruning objective is worse than simply ranking tokens by lesion-subs
 ## Mechanism (the transferable part)
 A rank-based coverage functional `C(S) = effrank(P_L Z_S)` is maximized by a retained set that
-**diversely spans** the lesion subspace's directions. But a small lesion is the opposite
-geometry: a **few** tokens with **high** membership pointing in a **similar** subspace direction
-(low diversity). Maximizing rank/coverage therefore prefers a spread of moderate-membership
-tokens over the concentrated lesion cluster — and drops the lesion. Concentration, not spanning,
 is what rare-pathology retention needs.
-Formally: rank coverage rewards the *entropy of the retained singular spectrum*; lesion retention
-rewards *mass on the top membership tokens*. These objectives diverge precisely when the signal
-is rare and low-rank — i.e., exactly the clinically important small lesions.
 ## Three independent lines of evidence (same verdict)

 ## Mechanism (the transferable part)
 A rank-based coverage functional `C(S) = effrank(P_L Z_S)` is maximized by a retained set that
+**diversely spans** the lesion subspace's directions. The decisive property of a small lesion is
+that it is **rare** — a **few** high-membership tokens out of ~196. A *set*-level rank/coverage
+objective is insensitive to such a cluster: a handful of tokens cannot materially raise the retained
+set's effective rank, so the objective spends budget on abundant background directions and drops the
+lesion. This is a **rarity** mechanism, not a low-internal-rank one: measured at the operating layer
+(block 3, MedDINOv3), lesion tokens are *not* low-rank relative to background (pooled effective rank
+339 vs 307; participation ratio 18.9 vs 13.9; `research_v4/lesion_spectrum.json`). Concentration, not spanning,
 is what rare-pathology retention needs.
+Formally: rank coverage rewards the *entropy of the retained set's singular spectrum*; lesion
+retention rewards *mass on the top membership tokens*. These objectives diverge whenever the
+critical signal is a **rare** cluster — of any internal rank. (The synthetic closed-form law of the
+companion paper isolates a second, distinct route — a genuinely low-rank signal, gap `(m-r)/m`; real
+lesions fail via rarity, not low rank.)
 ## Three independent lines of evidence (same verdict)
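
Reviewer note: the rarity argument in the revised Mechanism text can be checked on synthetic data. The sketch below is a minimal illustration, not the project's pipeline: it assumes isotropic Gaussian tokens, a hypothetical embedding width of 64, and omits the projection `P_L`. It compares the effective rank of a 196-token retained set with and without a 3-token, internally diverse cluster at a fixed budget.

```python
import numpy as np

def effective_rank(tokens: np.ndarray) -> float:
    """Entropy-based effective rank (RankMe-style) of a mean-centered token matrix."""
    centered = tokens - tokens.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / (s.sum() + 1e-12)
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

rng = np.random.default_rng(0)
dim = 64                                      # hypothetical embedding width (assumption)
background = rng.normal(size=(196, dim))      # abundant background tokens
lesion = rng.normal(size=(3, dim))            # rare cluster, internally diverse

# Same budget of 196 tokens: all background vs. background with 3 lesion tokens swapped in.
rank_all_background = effective_rank(background)
rank_with_lesion = effective_rank(np.vstack([background[:193], lesion]))
relative_change = abs(rank_with_lesion - rank_all_background) / rank_all_background

print(f"effective rank, background only: {rank_all_background:.2f}")
print(f"effective rank, lesion included: {rank_with_lesion:.2f}")
print(f"relative change:                 {relative_change:.4f}")
```

Under these assumptions the set-level statistic barely moves when the cluster is swapped in, so a selector that maximizes it has little reason to spend budget on the cluster, whatever the cluster's internal rank.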

jobs/lesion_spectrum_job.py ADDED

	@@ -0,0 +1,129 @@

+# /// script
+# requires-python = ">=3.10"
+# dependencies = [
+#   "torch", "torchvision", "numpy", "pillow", "scipy",
+#   "huggingface_hub>=0.34", "dinov3 @ git+https://github.com/facebookresearch/dinov3",
+# ]
+# ///
+"""Measure the REAL lesion-token spectrum to verify the paper-2 rank attribution. HF Job (GPU).
+Paper #2 attributes the retention gap to lesion signal being LOW-RANK relative to the high-
+dimensional background. That attribution was written because the law requires it -- here we CHECK it
+against measured spectra. At the operating layer (block 3, MedDINOv3), using masks ANALYSIS-ONLY
+(no subspace construction), we compute effective rank (RankMe) and participation ratio for:
+  - POOLED lesion tokens vs an equal-count random background pool  (is the lesion SUBSPACE low-rank
+    relative to background?)
+  - WITHIN-IMAGE (lesions with m>=4 tokens): effrank(lesion)/m vs effrank(random m bg)/m  (are
+    lesion tokens CONCENTRATED within a lesion, i.e. internal rank < m?)
+Verdict CONFIRMS the attribution iff pooled lesion effective rank is below 0.85x the background
+value (the threshold applied in main()). If not, the attribution must change and the paper should
+say so. Emits LES_SPEC_RESULT.
+"""
+from __future__ import annotations
+import json, os, sys, time
+from pathlib import Path
+import numpy as np, torch
+from PIL import Image
+from huggingface_hub import hf_hub_download
+sys.path.insert(0,"/mnt/processed/covtoken_code")
+from dinov3.models.vision_transformer import vit_base          # noqa: E402
+BACKBONE_REPO="ricklisz123/MedDINOv3-ViTB-16-CT-3M"; MNT=Path("/mnt")
+MASK_ROOT=MNT/"processed"/"lidc_v2"; OUT=MNT/"processed"/"covtoken"
+N_PATCH,CLS_OFF=196,5; LAYER=int(os.environ.get("LAYER","2"))         # LAYER is 0-indexed (default 2 = block 3); CLS_OFF=5 assumes CLS + 4 storage tokens precede the patches
+EVAL_SLICES=int(os.environ.get("EVAL_SLICES","1200"))
+CT_MEAN=np.array([0.485,0.456,0.406],np.float32); CT_STD=np.array([0.229,0.224,0.225],np.float32)
+def log(m): print(f"[lesspec] {m}", flush=True)
+def load_backbone(device):
+    ck=hf_hub_download(BACKBONE_REPO,"model.pth",token=os.environ.get("HF_TOKEN"))
+    m=vit_base(drop_path_rate=0.0,layerscale_init=1e-5,n_storage_tokens=4,qkv_bias=False,mask_k_bias=True)
+    raw=torch.load(ck,map_location="cpu"); sd=raw.get("teacher",raw)
+    sd={(k[9:] if k.startswith("backbone.") else k):v for k,v in sd.items()}
+    m.load_state_dict(sd,strict=False); m.eval().to(device)
+    for p in m.parameters(): p.requires_grad_(False)
+    feats={}
+    def h(_m,_i,out):
+        while isinstance(out,(list,tuple)): out=out[0]
+        feats[0]=out.detach()
+    m.blocks[LAYER].register_forward_hook(h)
+    return m,feats
+def to_t(p):
+    im=Image.open(p).convert("RGB").resize((224,224),Image.BILINEAR)
+    return torch.from_numpy(((np.asarray(im,np.float32)/255.0-CT_MEAN)/CT_STD)).permute(2,0,1)
+@torch.inference_mode()
+def tok(model,feats,imgs,device):
+    model.forward_features(imgs.to(device,torch.float32))
+    return feats[0][:,CLS_OFF:CLS_OFF+N_PATCH,:].float().cpu().numpy()
+def svals(X):
+    Xc=X-X.mean(0,keepdims=True)
+    return np.linalg.svd(Xc,compute_uv=False)
+def rankme(s):
+    p=s/(s.sum()+1e-12); return float(np.exp(-(p*np.log(p+1e-12)).sum()))
+def part_ratio(s):
+    l=s**2; return float((l.sum()**2)/((l**2).sum()+1e-12))
+def main():
+    t0=time.time(); device=torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    model,feats=load_backbone(device); rng=np.random.default_rng(0)
+    ev=[]
+    for cd in sorted((MASK_ROOT/"test").iterdir()):
+        npz=cd/"patch_masks.npz"
+        if cd.is_dir() and npz.exists():
+            pm=np.load(npz)["patch_masks"]
+            for idx in range(len(pm)):
+                if pm[idx].sum()>0: ev.append((cd/f"slice_{idx:04d}.png", pm[idx].reshape(-1)))
+    ev=[ev[i] for i in rng.choice(len(ev),min(EVAL_SLICES,len(ev)),replace=False)]
+    log(f"device={device.type}; layer block {LAYER+1}; lesion-positive eval slices={len(ev)}")
+    les_pool=[]; bg_pool=[]; within=[]   # within: (effrank_les/m, effrank_bg/m, effrank_les, effrank_bg, m) for m>=4
+    for i in range(0,len(ev),48):
+        ch=ev[i:i+48]; T=tok(model,feats,torch.stack([to_t(p) for p,_ in ch]),device)
+        for b,(_,m) in enumerate(ch):
+            li=np.where(m>0)[0]; bi=np.where(m==0)[0]
+            les_pool.append(T[b,li]);
+            bg_pool.append(T[b, rng.choice(bi, min(len(li),len(bi)), replace=False)])  # matched-count bg
+            if len(li)>=4 and len(bi)>=len(li):
+                rl=rankme(svals(T[b,li])); rb=rankme(svals(T[b, rng.choice(bi,len(li),replace=False)]))
+                within.append((rl/len(li), rb/len(li), rl, rb, len(li)))
+    L=np.concatenate(les_pool); B=np.concatenate(bg_pool)
+    n=min(len(L),len(B), 20000)
+    L=L[rng.choice(len(L),n,replace=False)]; B=B[rng.choice(len(B),n,replace=False)]
+    sL=svals(L); sB=svals(B)
+    res={"backbone":"MedDINOv3","layer_block":LAYER+1,"n_lesion_tokens_total":int(sum(len(x) for x in les_pool)),
+         "pooled":{"n_per_pool":int(n),"ambient_dim":768,
+            "lesion_rankme":round(rankme(sL),2),"background_rankme":round(rankme(sB),2),
+            "lesion_participation_ratio":round(part_ratio(sL),2),"background_participation_ratio":round(part_ratio(sB),2),
+            "lesion_top10_sv_frac":round(float(sL[:10].sum()/sL.sum()),3),"background_top10_sv_frac":round(float(sB[:10].sum()/sB.sum()),3),
+            "rankme_ratio_lesion_over_bg":round(rankme(sL)/max(rankme(sB),1e-9),3)}}
+    if within:
+        w=np.array(within);
+        res["within_image_m_ge_4"]={"n_images":len(within),"mean_effrank_lesion_over_m":round(float(w[:,0].mean()),3),
+            "mean_effrank_random_bg_over_m":round(float(w[:,1].mean()),3),"mean_m":round(float(w[:,4].mean()),2)}
+    pr=res["pooled"]
+    confirmed = pr["lesion_rankme"] < pr["background_rankme"]*0.85
+    res["verdict"]={"attribution":"low internal rank relative to background",
+        "CONFIRMED": bool(confirmed),
+        "statement": (f"Lesion-token pooled effective rank {pr['lesion_rankme']} vs background {pr['background_rankme']} "
+            f"(ratio {pr['rankme_ratio_lesion_over_bg']}); lesion top-10 SVs hold {pr['lesion_top10_sv_frac']} of singular-value mass "
+            f"vs background {pr['background_top10_sv_frac']}. "
+            + ("ATTRIBUTION CONFIRMED: lesion subspace is materially lower-rank than background."
+               if confirmed else
+               "ATTRIBUTION NOT CONFIRMED at the chosen threshold: rewrite the paper-2 scope accordingly."))}
+    res["elapsed_s"]=round(time.time()-t0,1)
+    OUT.mkdir(parents=True,exist_ok=True); (OUT/"lesion_spectrum.json").write_text(json.dumps(res,indent=2))
+    print("LES_SPEC_RESULT "+json.dumps(res),flush=True)
+if __name__=="__main__": main()
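
Reviewer note: a quick sanity check of the two spectrum metrics used in this job. The sketch below copies the `rankme` and `part_ratio` definitions from the file and applies them to three hand-built spectra (a flat one, a single spike, and a geometric decay). The test spectra are illustrative assumptions, not measured data.

```python
import numpy as np

def rankme(s: np.ndarray) -> float:
    # Exponential of the Shannon entropy of the normalized singular values.
    p = s / (s.sum() + 1e-12)
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

def part_ratio(s: np.ndarray) -> float:
    # Participation ratio of the eigenvalues (squared singular values).
    lam = s ** 2
    return float((lam.sum() ** 2) / ((lam ** 2).sum() + 1e-12))

flat = np.ones(8)                      # all directions equally used
spike = np.array([1.0] + [0.0] * 7)    # a single dominant direction
decay = 0.5 ** np.arange(8)            # geometrically decaying spectrum

for name, s in [("flat", flat), ("spike", spike), ("decay", decay)]:
    print(f"{name:5s} rankme={rankme(s):.3f} part_ratio={part_ratio(s):.3f}")
```

Both metrics agree at the extremes (8 for the flat spectrum, 1 for the spike). On the decaying spectrum the participation ratio comes out smaller because it weights squared singular values, which is one plausible reason the reported RankMe (339 vs 307) and participation ratio (18.9 vs 13.9) sit on very different scales while pointing the same way.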

paper/paper2_rank_objectives_draft.md CHANGED

@@ -11,15 +11,18 @@ venue_targets: [NeurIPS/ICML/TMLR, MIDL negative-results, ML4H]
 Effective-rank / coding-rate objectives — RankMe, MCR2, coding rate, and the variance terms in
 VICReg-style methods — are a popular proxy for representation "quality" and an increasingly common
 regularizer in self-supervised learning, including medical SSL. We show, with a mechanism and a
-closed-form law, that **these objectives are structurally mismatched to rare, low-rank,
-concentrated signal under token/feature SELECTION** — the regime of small-lesion detection,
-anomaly detection, and thin-structure retention. A rank/spanning objective rewards a retained set
-that *diversely spans* a subspace; a rare signal is the opposite geometry — a few high-membership
-tokens pointing in a similar direction. We prove (synthetic, closed form) that a spanning
-objective retains a rank-r, m-token signal in proportion `min(r,m)/m`, while a concentration
-objective retains all of it, so the retention gap is `(m-r)/m` and the crossover is at `r*=m`. We
-confirm the law's qualitative content on real medical images (small lesions: concentration 0.81 vs
-spanning 0.46 retention) and map the alignment functional `A(rank, SNR)`. A controlled probe
 isolates the failure to rank as a SELECTION objective: rank as a representation *scaling*
 (whitening) leaves localizability nearly unchanged. The practical consequence: for rare-pathology
 and rare-event tasks, prefer concentration (energy/membership) objectives over rank/spanning ones.
@@ -59,13 +62,18 @@ token/feature SELECTION** is mismatched to rare signal — not all rank pressure
 ## 4. Real-data validation and scope
-On real medical images (small lesions, frozen SSL backbone), the qualitative law holds robustly:
-concentration retains 0.81 of small-lesion mass vs spanning 0.46 (gap ~0.35, CI excludes 0), and a
-constrained coverage-floor pruner built on the rank functional retains 0.22 vs 0.82 for a
-membership rule — it actively hurts. The exact `(m-r)/m` closed form is a clean-background
-idealization: real lesions are few tokens, near-full-rank among themselves but low-rank relative to
-the high-dimensional background, so the operative quantity is lesion rank *relative to background*.
-This bounds the theory honestly without weakening its direction.
 ## 5. Rigor

 Effective-rank / coding-rate objectives — RankMe, MCR2, coding rate, and the variance terms in
 VICReg-style methods — are a popular proxy for representation "quality" and an increasingly common
 regularizer in self-supervised learning, including medical SSL. We show, with a mechanism and a
+closed-form law, that **these objectives are structurally mismatched to retaining a rare critical
+cluster under token/feature SELECTION** — the regime of small-lesion detection, anomaly detection,
+and thin-structure retention. A rank/spanning objective optimizes the *retained set's* spectrum,
+which is insensitive to a rare cluster by either of two routes: the cluster is internally low-rank,
+*or* it is simply too few tokens to move set-level coverage. We prove (synthetic, closed form) the
+low-rank route: a spanning objective retains a rank-r, m-token signal in proportion `min(r,m)/m`, a
+concentration objective retains all of it, gap `(m-r)/m`, crossover `r*=m`. On real medical images
+the failure is large (small lesions: concentration 0.81 vs spanning 0.46 retention) but — and we
+**measured** this — driven by the **rarity** route, not the low-rank one: lesion tokens are *not*
+low-rank relative to background (pooled effective rank 339 vs 307), yet a few of them cannot raise
+the retained set's rank, so coverage drops them. We map the alignment functional `A(rank, SNR)`. A
+controlled probe
 isolates the failure to rank as a SELECTION objective: rank as a representation *scaling*
 (whitening) leaves localizability nearly unchanged. The practical consequence: for rare-pathology
 and rare-event tasks, prefer concentration (energy/membership) objectives over rank/spanning ones.
 ## 4. Real-data validation and scope
+On real medical images (small lesions, frozen SSL backbone), the predicted failure is large and
+robust: concentration retains 0.81 of small-lesion mass vs spanning 0.46 (gap ~0.35, CI excludes 0),
+and a constrained coverage-floor pruner built on the rank functional retains 0.22 vs 0.82 for a
+membership rule — it actively hurts. **But we measured the mechanism, and it is rarity, not low
+rank.** At the operating layer, lesion tokens are *not* low-rank relative to background: pooled
+effective rank (RankMe) **339 vs 307**, participation ratio 18.9 vs 13.9, within-image internal-
+rank/m ≈ equal (0.737 vs 0.715). The exact `(m-r)/m` closed form is therefore a clean-background
+*idealization* of one route to the failure; real lesions take the other — they are too **few** to
+move a set-level coverage statistic (aggregate coverage is identical on lesion-positive vs -negative
+slices, 250.4 vs 247.2), regardless of their internal rank. Both routes are the same principle: a
+*set*-coverage/rank objective is insensitive to a rare critical cluster. We report the measured
+spectra explicitly so the theory is not over-attributed. [`research_v4/lesion_spectrum.json`]
 ## 5. Rigor
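
Reviewer note: the synthetic closed-form law quoted in the abstract (spanning retains `min(r,m)/m`, concentration retains everything, gap `(m-r)/m`, crossover `r*=m`) can be tabulated directly. The function names below are hypothetical helpers for illustration, and the sketch encodes only the stated formula, not the proof or the real-data behavior.

```python
def spanning_retention(r: int, m: int) -> float:
    """Fraction of an m-token, rank-r signal kept by a spanning objective (stated law)."""
    return min(r, m) / m

def retention_gap(r: int, m: int) -> float:
    """Gap to a concentration objective, which is stated to retain all of the signal."""
    return 1.0 - spanning_retention(r, m)

m = 8  # illustrative signal size in tokens (assumption)
for r in (1, 2, 4, 8, 16):
    print(f"r={r:2d} m={m} spanning={spanning_retention(r, m):.3f} gap={retention_gap(r, m):.3f}")
```

The gap equals `(m-r)/m` for `r <= m` and vanishes from the crossover `r = m` onward, which is why a near-full-rank cluster would show no gap under this route and a different explanation is needed for the real-data failure.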

paper/paper3_midlayer_draft.md CHANGED

@@ -43,10 +43,17 @@ annotation.
 ## 3. The mechanism (why mid-layer)
 We disentangle two candidate causes per layer: spatial information (position-probe accuracy) and
-globalization (flip-invariance). **Localizability anti-correlates with flip-invariance (ρ=−0.94),
-not with spatial information** — position is near-perfectly decodable at *every* layer (RoPE), so
-the loss with depth is not positional. As features become invariant to augmentation (the
-self-distillation goal), they trade away the fine local discrimination small lesions need.
 ## 4. Cross-objective: the mechanism is causal-by-comparison

 ## 3. The mechanism (why mid-layer)
 We disentangle two candidate causes per layer: spatial information (position-probe accuracy) and
+globalization (flip-invariance), measured on MedDINOv3. **Localizability anti-correlates with
+flip-invariance (ρ=−0.94), not with spatial information** — patch position is near-perfectly
+decodable at *every* block, so the loss with depth is not positional. This is architecturally
+expected for this backbone: MedDINOv3/DINOv3 encode position with **axial RoPE applied to the
+queries/keys of every attention block** (patch tokens only; CLS and register/storage tokens
+excluded), with *no* learned absolute position embedding — so positional information is re-injected
+at all depths by construction (verified against the DINOv3 source). DINOv2, our ultrasound backbone,
+differs: it uses a **learned absolute position embedding added once at the input** (not RoPE); there
+we confirm the invariance–localizability coupling empirically (§4, ρ=−0.93) but do not import the
+RoPE-based positional control. As features become invariant to augmentation (the self-distillation
+goal), they trade away the fine local discrimination small lesions need.
 ## 4. Cross-objective: the mechanism is causal-by-comparison
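
Reviewer note: the positional contrast drawn in §3 can be illustrated with a simplified sketch. The code below implements a generic 1D rotary embedding on a single query/key pair and a random stand-in for a learned absolute table. It is not the DINOv3 or DINOv2 code: the axial 2D layout, the base of 10000, the feature width, and the random table are all assumptions made for illustration.

```python
import numpy as np

def rope_rotate(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive feature pairs of x by position-dependent angles (1D RoPE)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # one rotation frequency per feature pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# RoPE: the attention logit depends only on the position offset (here 2 in both cases).
rope_a = float(rope_rotate(q, 3) @ rope_rotate(k, 1))
rope_b = float(rope_rotate(q, 7) @ rope_rotate(k, 5))

# Absolute embedding added once at the input: logits depend on the absolute positions.
pos_table = rng.normal(size=(16, 8))            # stand-in for a learned table (assumption)
abs_a = float((q + pos_table[3]) @ (k + pos_table[1]))
abs_b = float((q + pos_table[7]) @ (k + pos_table[5]))

print(f"RoPE logits:     {rope_a:.4f} vs {rope_b:.4f}")
print(f"absolute logits: {abs_a:.4f} vs {abs_b:.4f}")
```

Because the rotation is applied to queries and keys inside attention, a backbone that repeats it in every block receives position at every depth, whereas an embedding added once at the input has to survive through the residual stream.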

paper/working_draft.md CHANGED

@@ -178,12 +178,19 @@ subspace-only (−0.60 / −0.52, CI excludes 0). The constraint does not add va
 ### 6.2 Mechanism (transferable)
 `C(S)=effrank(P_L Z_S)` is maximized by a retained set that **diversely spans** the subspace's
-directions. A small lesion is the opposite geometry: a few tokens with high membership pointing in
-a similar direction (low diversity). Maximizing rank therefore prefers a spread of moderate tokens
-over the concentrated lesion cluster, and drops the lesion. Rank coverage rewards the entropy of
-the retained spectrum; lesion retention rewards mass on the top membership tokens; these diverge
-precisely when the signal is rare and low-rank. **For rare-pathology tasks, prefer concentration
-objectives (energy / membership mass) over rank/spanning objectives (RankMe, coding rate, MCR2).**
 ### 6.3 Convergent evidence
 Three independent lines reach the same verdict: (a) the ablation above; (b) principled Gate-2

 ### 6.2 Mechanism (transferable)
 `C(S)=effrank(P_L Z_S)` is maximized by a retained set that **diversely spans** the subspace's
+directions. The decisive property of a lesion is that it is **rare** — a handful of tokens out of
+~196. A set-level rank/coverage objective is therefore *insensitive* to it: a few tokens cannot
+materially raise the retained set's effective rank, so the objective spends the budget on abundant
+background directions and drops the lesion. This is a **rarity** mechanism, not an internal-geometry
+one — and we checked: measured at the operating layer, lesion tokens are *not* low-rank relative to
+background (pooled effective rank 339 vs 307; participation ratio 18.9 vs 13.9; within-image
+internal-rank/m ≈ equal). Lesion tokens are in fact diverse; the set-coverage objective is blind to
+them anyway because they are few. (The synthetic law of the companion paper reaches the same failure
+via a genuinely low-rank signal; real lesions reach it via rarity — two routes to one principle.)
+Rank coverage rewards the entropy of the retained *set* spectrum; lesion retention rewards mass on
+the top membership tokens; these diverge whenever the critical signal is a **rare** cluster, of any
+internal rank. **For rare-pathology tasks, prefer concentration objectives (energy / membership
+mass) over rank/spanning objectives (RankMe, coding rate, MCR2).**
 ### 6.3 Convergent evidence
 Three independent lines reach the same verdict: (a) the ablation above; (b) principled Gate-2

research_v4/lesion_spectrum.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "study": "Lesion-token spectrum: tests (and REFUTES) the paper-2 'low internal rank' attribution",
+  "backbone": "MedDINOv3", "layer_block": 3, "dataset": "LIDC", "n_lesion_positive_slices": 1200,
+  "pooled": {
+    "n_per_pool": 2269, "ambient_dim": 768,
+    "lesion_rankme": 338.97, "background_rankme": 307.00, "rankme_ratio_lesion_over_bg": 1.104,
+    "lesion_participation_ratio": 18.91, "background_participation_ratio": 13.89,
+    "lesion_top10_sv_frac": 0.176, "background_top10_sv_frac": 0.203
+  },
+  "within_image_m_ge_4": {"n_images": 145, "mean_m": 4.3,
+    "mean_effrank_lesion_over_m": 0.737, "mean_effrank_random_bg_over_m": 0.715},
+  "verdict": {
+    "attribution_tested": "lesion signal is low internal rank relative to background",
+    "CONFIRMED": false,
+    "finding": "REFUTED. By every measure at the operating layer, lesion tokens are NOT low-rank relative to background: pooled effective rank 339 vs 307 (1.10x HIGHER), participation ratio 18.9 vs 13.9, top-10 SVs hold a SMALLER share of singular-value mass (0.176 vs 0.203 = more spread), within-image internal-rank/m ~equal (0.737 vs 0.715). The 'few tokens in a similar direction / low diversity' framing is false on real data."
+  },
+  "corrected_mechanism": {
+    "real_failure_driver": "RARITY, not low internal rank",
+    "statement": "The coverage/rank objective fails on real lesions because they are FEW (a handful of tokens cannot materially raise the RETAINED SET's effective rank), so a set-coverage-maximizing selection is indifferent to them regardless of their internal diversity -- and they are in fact diverse. The synthetic S1 law produces the SAME failure via the low-rank knob (gap=(m-r)/m); real lesions reach it via the rarity knob. Both are consequences of optimizing the retained SET's spectrum, which is blind to a rare critical cluster.",
+    "convergent_confirmation": "Aggregate coverage is identical on lesion-positive vs -negative slices (250.4 vs 247.2) -- a 1-3 token cluster cannot move a set-level statistic over ~196 tokens. (paper #1 evidence (c))",
+    "impact": "S1 closed-form law unchanged (it is a valid synthetic existence proof for the low-rank route). The REAL-DATA scope in paper #2 (and the mechanism prose in paper #1) is corrected from 'lesions are low-rank relative to background' to 'lesions are rare; set-coverage is insensitive to a rare cluster of any internal rank'."
+  },
+  "human_signoff": null
+}
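
Reviewer note: the recorded verdict can be cross-checked against the decision rule in `jobs/lesion_spectrum_job.py`. The sketch below hard-codes the pooled values from this JSON rather than reading the file, and reapplies the 0.85 threshold from the job script. The variable names are illustrative.

```python
# Pooled values copied from research_v4/lesion_spectrum.json in this commit.
pooled = {
    "lesion_rankme": 338.97,
    "background_rankme": 307.00,
    "lesion_participation_ratio": 18.91,
    "background_participation_ratio": 13.89,
    "lesion_top10_sv_frac": 0.176,
    "background_top10_sv_frac": 0.203,
}

# Same rule as the job script: confirmed only if lesion rank < 0.85 x background rank.
ratio = round(pooled["lesion_rankme"] / pooled["background_rankme"], 3)
confirmed = pooled["lesion_rankme"] < pooled["background_rankme"] * 0.85

# All three recorded measures point the same way (lesion at least as spread as background).
lesion_more_spread = (
    pooled["lesion_participation_ratio"] > pooled["background_participation_ratio"]
    and pooled["lesion_top10_sv_frac"] < pooled["background_top10_sv_frac"]
)

print(f"rankme ratio lesion/background: {ratio}")
print(f"low-rank attribution confirmed: {confirmed}")
print(f"lesion spectrum more spread:    {lesion_more_spread}")
```

The recomputed ratio matches the stored `rankme_ratio_lesion_over_bg` and the rule yields `CONFIRMED: false`, consistent with the verdict block above.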