+# Data & Code Availability
+This repository (`Chucks90/covtoken` on the Hugging Face Hub) holds the **code and experiment
+artifacts** for the covtoken study (label-free mid-layer lesion subspaces for token-economical
+medical imaging). All compute was run as Hugging Face Jobs; every reported number is reproducible
+from the scripts here against the public backbones and datasets listed below.
+## What is in this repository
+- **Code** — `jobs/` (PEP-723 `uv` job scripts, one per experiment), `subspace/`, `coverage/`,
+  `gate/`, `arch/`, `eval/`, `data/`, `backbone/`, and `tests/` (incl. the label-leak guard).
+- **Decision records** — `gate_reports/` (per-gate JSON with metric, comparator, threshold, and
+  statistical test, plus `NEGATIVE_RESULT.md` and `SUMMARY.md`).
+- **Research-program results** — `research_v2/` (S1–S5), `research_v3/` (F1–F4), `research_v4/`
+  (G1/spectra/rarity-route) as JSON + summaries.
+- **Manuscripts & figures** — `paper/` (three drafts, `make_figures.py`, `figures/`).
+- **Specs** — `research_specs/`, `configs/thresholds.lock.json`.
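A gate decision of the kind recorded in `gate_reports/` reduces to comparing a metric against a locked threshold under a named comparator. The sketch below is illustrative only: the field names (`metric_value`, `comparator`, `threshold`), the comparator strings, and the example values are assumptions, not the actual schema or results stored in this repository.

```python
import json
import operator

# Hypothetical comparator names; the real gate reports may encode these differently.
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def gate_passes(report: dict) -> bool:
    """Return True if the reported metric satisfies its threshold."""
    compare = COMPARATORS[report["comparator"]]
    return compare(report["metric_value"], report["threshold"])


# Made-up example record, for illustration only (not a result from the study).
example = json.loads(
    '{"gate": "example", "metric_value": 0.42, "comparator": ">=", "threshold": 0.50}'
)
print(gate_passes(example))  # False, since 0.42 < 0.50
```

Keeping the comparator in the record, instead of hard-coding it in the checker, lets one function handle both "higher is better" and "lower is better" gates.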
+## What is NOT in this repository (and why)
+Raw token banks, model weights, and materialized image/mask tensors are **not** included: they are
+large, and the imaging data are governed by their original third-party licenses. The scripts in
+`jobs/` download or regenerate them deterministically from the public sources below, so reported
+metrics depend only on those public sources and the scripts here.
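One way to spot-check that a regenerated artifact is identical across two runs is to compare content hashes. The helper below is a generic sketch using only the standard library; `sha256_of` and `same_artifact` are hypothetical names, and nothing here is taken from the repository's scripts.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(first: Path, second: Path) -> bool:
    """True when two regenerated files have identical bytes."""
    return sha256_of(first) == sha256_of(second)
```

Byte equality is a strict criterion; floating-point outputs from GPU runs may need a tolerance-based comparison instead.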
+## Backbones (public, frozen — no fine-tuning)
+- **MedDINOv3 ViT-B/16 (CT-3M)** — `ricklisz123/MedDINOv3-ViTB-16-CT-3M` (Hugging Face)
+- **DINOv2-base** — `facebook/dinov2-base`
+- Cross-objective controls: `google/vit-base-patch16-224` (supervised), `facebook/vit-mae-base` (MAE)
+## Imaging datasets (public, third-party — used eval-only; labels never touch subspace construction)
+- **LIDC-IDRI** (lung CT) — The Cancer Imaging Archive: https://www.cancerimagingarchive.net/collection/lidc-idri/
+- **KiTS23** (kidney CT) — https://kits-challenge.org/kits23/
+- **Medical Segmentation Decathlon** — Task03 Liver, Task07 Pancreas (CT) — http://medicaldecathlon.com/
+- **BUSI** (breast ultrasound) — Al-Dhabyani et al., *Data in Brief* 2020 (Dataset of breast ultrasound images)
+Each dataset retains its original license/terms; obtain it from the source above.
+## Reproducing a result
+Every experiment is a self-contained job script. Example:
+```bash
+hf jobs uv run --flavor t4-medium --timeout 2h --secrets HF_TOKEN \
+  -v hf://buckets/<your-bucket>:/mnt --detach jobs/<experiment>_job.py
+```
+Each script declares its inline dependencies (PEP 723), reads inputs from the mounted bucket,
+writes a result JSON, and prints a `*_RESULT` line. The mapping from claims to scripts/artifacts is
+in each `gate_reports/*.json` and the `research_v*/SUMMARY.md` files.
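The job-script shape described above can be sketched as follows. This is a minimal illustration, not one of the repository's scripts: the file names (`inputs.json`, `example_result.json`), the `EXAMPLE_RESULT` prefix, the `run` helper, and the placeholder metric are all hypothetical; only the PEP 723 header syntax is standard.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Hypothetical skeleton of a self-contained job script (not from the repo)."""
import json
import sys
from pathlib import Path


def run(mount: Path) -> dict:
    # Read an input from the mounted bucket (file name is an assumption).
    values = json.loads((mount / "inputs.json").read_text())["values"]
    # Placeholder computation standing in for the real experiment.
    result = {"experiment": "example", "mean": sum(values) / len(values)}
    # Write the result JSON back to the mount.
    (mount / "example_result.json").write_text(json.dumps(result, indent=2))
    # Emit a single greppable line for the job logs.
    print("EXAMPLE_RESULT " + json.dumps(result, sort_keys=True))
    return result


if __name__ == "__main__":
    # Defaults to /mnt, the mount point used in the command above.
    mount = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/mnt")
    if (mount / "inputs.json").exists():
        run(mount)
    else:
        print(f"no inputs.json under {mount}; nothing to do")
```

Printing the result on one prefixed line makes it easy to recover from job logs even if the written JSON is not fetched.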
+## Citing this repository
+A DOI for the archival snapshot can be generated from the repository's **Settings → Generate DOI** on the
+Hugging Face Hub; cite that DOI in the manuscript's Data Availability Statement.