# Cofiber Detection Circuit
A depth-3 threshold gate network for multi-scale object detection on frozen vision transformer features. 61,520 INT8 learned parameters. 2,184,000 gates. The multi-scale decomposition is analytic (zero parameters); only the classification layer is learned.
The reference backbone is EUPE-ViT-B (86M parameters, frozen), the same encoder used by Argus. The circuit consumes the backbone's stride-16 patch features and produces per-class detections without modifying any backbone weights.
## Circuit
Input: feature grid [768, 40, 40] from frozen ViT at stride 16
Layer 0: Pool (fixed, 0 learned params)
pool(x)_{i,j} = 0.25 * (x_{2i,2j} + x_{2i+1,2j} + x_{2i,2j+1} + x_{2i+1,2j+1})
Layer 1: Cofiber (fixed, 0 learned params)
cofib(x)_{i,j} = x_{i,j} - upsample(pool(x))_{i,j}
Layer 2: Classify (61,520 INT8 params)
detect(i,j,c) = H( Σ_d w_{c,d} · cofib(x)_{i,j,d} + b_c )
Output: per-location, per-class binary detection decisions
The cofiber x - upsample(pool(x)) isolates information present at a given spatial scale but absent from the next coarser scale. Applied iteratively, it decomposes the feature grid into three scale bands (strides 16, 32, 64) with zero learned parameters. The classification layer operates on each band independently.
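The decomposition above can be sketched in a few lines of numpy. This is an illustrative reimplementation, not the repository's model.py; it uses nearest-neighbor upsampling for brevity where the circuit uses bilinear.

```python
# Illustrative sketch of the analytic multi-scale decomposition, assuming a
# [C, H, W] feature grid with H, W divisible by 4. Nearest-neighbor upsample
# stands in for the circuit's bilinear upsample.
import numpy as np

def pool2x(x):
    """2x average pool: mean of each non-overlapping 2x2 block."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbor 2x upsample."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def cofiber_decompose(x, levels=2):
    """Split x into `levels` high-frequency bands plus a coarse residual."""
    bands = []
    for _ in range(levels):
        p = pool2x(x)
        bands.append(x - upsample2x(p))  # cofiber: detail absent at next scale
        x = p
    bands.append(x)  # coarsest band (stride 64 for a stride-16 input)
    return bands

feats = np.random.randn(768, 40, 40).astype(np.float32)
bands = cofiber_decompose(feats)
print([b.shape for b in bands])  # [(768, 40, 40), (768, 20, 20), (768, 10, 10)]
```

The decomposition is lossless: summing each band back up through the upsampler reconstructs the input exactly under this nearest-neighbor sketch.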
## Equations
The decomposition satisfies three properties, proven in CofiberDecomposition.v:
Block diagonal: The low-frequency block of any morphism between decomposed features equals the functorial low-frequency component. Classification on cofibers is equivalent to multi-scale classification on the original features.
Cross-term vanishing (high→low): Low-frequency input produces zero high-frequency output. A large object detected at stride 32 creates no signal in the stride-16 cofiber.
Cross-term vanishing (low→high): High-frequency input produces zero low-frequency output. Scale bands do not interfere.
These properties follow from the adjunction (bilinear upsample ⊣ average pool) in a semi-additive category: the counit upsample ∘ pool → id has as its cofiber exactly x - upsample(pool(x)), and the resulting exact sequence guarantees lossless scale separation.
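Both vanishing properties can be spot-checked numerically. The sketch below is a toy, using nearest-neighbor upsample (for which pool ∘ upsample = id holds exactly); with bilinear upsample the identities hold only up to interpolation error, consistent with the reconstruction error figure given in the Proof section.

```python
# Numerical spot-check of the two cross-term vanishing claims on a small
# random feature grid. Names and sizes here are illustrative only.
import numpy as np

def pool2x(x):
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def up2x(x):
    return x.repeat(2, axis=1).repeat(2, axis=2)

x = np.random.randn(8, 16, 16)

# Purely low-frequency input has zero cofiber (no high-frequency output).
low = up2x(pool2x(x))                          # constant on every 2x2 block
print(np.abs(low - up2x(pool2x(low))).max())   # ~0

# The cofiber has zero low-frequency content (pooling annihilates it).
cofib = x - up2x(pool2x(x))
print(np.abs(pool2x(cofib)).max())             # ~0
```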
## Parameters
| Layer | Operation | Weights | Learned |
|---|---|---|---|
| 0 | Average pool 2× | {0.25, 0.25, 0.25, 0.25} | No |
| 1 | Subtract: x - upsample(pool(x)) | {1, -1} | No |
| 2 | Classify: H(w · cofib + b) | 80 × 768 + 80 | Yes (INT8) |
| Total | | 61,520 | |
## Gates
| Scale | Stride | Spatial | Pool gates | Subtract gates | Classify gates |
|---|---|---|---|---|---|
| 0 | 16 | 40 × 40 | 307,200 | 1,228,800 | 128,000 |
| 1 | 32 | 20 × 20 | 76,800 | 307,200 | 32,000 |
| 2 | 64 | 10 × 10 | – | – | 8,000 |
| Total | | | | | 2,184,000 |
All layer 0 and layer 1 gates use integer weights from {-1, 0, 1}. Layer 2 gates use INT8 quantized weights. INT8 quantization produces 99.7% detection agreement with FP32.
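A minimal sketch of one way the INT8 step and the agreement check could work. This is an assumption (symmetric per-tensor quantization), not the repository's quantizer; all names are illustrative.

```python
# Symmetric per-tensor INT8 weight quantization and an FP32-vs-INT8
# detection-agreement check on random data (illustrative only).
import numpy as np

def quantize_int8(w):
    """Scale so the max |w| maps to 127, round, and clip to int8 range."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(80, 768)).astype(np.float32)   # classify weights
x = rng.normal(size=768).astype(np.float32)          # one cofiber vector

q, s = quantize_int8(w)
fp32_decisions = (w @ x) > 0                         # H(w·x + b) with b = 0
int8_decisions = (dequantize(q, s) @ x) > 0
agreement = (fp32_decisions == int8_decisions).mean()
print(f"detection agreement: {agreement:.1%}")
```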
## COCO val2017 Results
Variants were trained on COCO 2017 train (117,266 images) for 8 epochs with a frozen EUPE-ViT-B backbone (per-variant settings in the Training section), and evaluated with pycocotools on the full 5000-image val set.
| Variant | Architecture | Params | Nonzero | mAP@[0.5:0.95] | mAP@0.50 | mAP@0.75 |
|---|---|---|---|---|---|---|
| linear_70k | 768→4 box regression | 69,976 | 69,976 | 4.0 | 15.8 | 0.8 |
| box32_92k | 768→32→4 box regression | 91,640 | 91,640 | 5.7 | 20.6 | 1.3 |
| box32 pruned R1 | 768→32→4, 15K weights zeroed | 91,640 | ~76,640 | 5.7 | 20.7 | 1.3 |
| box32 pruned R2 | 768→32→4, 30K weights zeroed | 91,640 | ~62,000 | 5.9 | 20.4 | 1.5 |
| box32 pruned R3 | 768→32→4, 45K weights zeroed | 91,640 | ~47,000 | 5.1 | 17.1 | 1.4 |
| dim20 | 768→20→80 bottleneck, SVD-init | 22,076 | 22,076 | 3.9 | 14.8 | 0.9 |
| dim20 R1 (project 25.7% sparse) | dim20 with 3,955 project weights zeroed | 22,076 | 18,121 | 3.9 | 14.6 | 0.8 |
| dim20 R2 (project 26.6% sparse) | dim20 with 4,088 project weights zeroed | 22,076 | 17,988 | 3.8 | 14.5 | 0.7 |
| dim20 cls_weight pruned (37%) | 596 of 1,600 cls weights zeroed | 22,076 | 21,480 | 3.8 | 14.4 | 0.7 |
| dim20 reg_hidden pruned (17%) | 55 of 320 reg_hidden weights zeroed | 22,076 | 22,021 | 3.8 | 14.5 | 0.7 |
| dim20 reg_out pruned (12%) | 8 of 64 reg_out weights zeroed | 22,076 | 22,068 | 3.8 | 14.5 | 0.7 |
| dim20 ctr_weight pruned (90%) | 18 of 20 centerness weights zeroed | 22,076 | 22,058 | 3.7 | 14.2 | 0.7 |
| dim20 R1 + cls greedy | project 25.7% + cls_weight 45% sparse | 22,076 | 17,406 | 3.5 | 13.4 | 0.6 |
| dim20 joint (from R1) | whole-head magnitude pruning from R1 | 22,076 | 17,129 | 3.6 | 13.7 | 0.6 |
| dim15 | 768→15→80 bottleneck, SVD-init | 17,751 | 17,751 | 3.0 | 11.5 | 0.7 |
| dim10 | 768→10→80 bottleneck, SVD-init | 13,426 | 13,426 | 1.5 | 5.6 | 0.4 |
| dim5 | 768→5→80 bottleneck, SVD-init | 9,101 | 9,101 | 0.3 | 1.3 | 0.1 |
Pruning improved mAP from 5.7 to 5.9 by removing noisy prototype weights (box32 R2); further pruning degraded performance (box32 R3). The dim20/dim15/dim10/dim5 variants project features to 20, 15, 10, and 5 dimensions before classifying, with each projection initialized from the top-K right singular vectors of the pruned box32 R2 prototype matrix. Dim20 retains 72% of the SVD energy and produces 3.9 mAP; dim15 retains 67% and produces 3.0; dim10 retains 61% and produces 1.5, the smallest 80-class COCO detector to clear the 1.0 mAP threshold; dim5 retains 53% and drops to 0.3. The mAP scaling across dim20 → dim15 → dim10 is roughly geometric (3.9 → 3.0 → 1.5), but the curve falls off a cliff between 10 and 5 dimensions: five directions sit below the intrinsic capacity needed for 80-class separation, so the floor lies between 5 and 10 bottleneck dimensions.
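The SVD initialization described above can be sketched directly. This is a hypothetical reconstruction, not the repository's init code; the random matrix stands in for the pruned R2 prototype matrix.

```python
# Seed a [k, 768] bottleneck projection with the top-k right singular
# vectors of the 80x768 prototype matrix, and report the retained energy.
import numpy as np

def svd_init_projection(prototypes, k):
    """Return a [k, 768] projection and the fraction of SVD energy it keeps."""
    _, s, vt = np.linalg.svd(prototypes, full_matrices=False)
    energy = (s[:k] ** 2).sum() / (s ** 2).sum()
    return vt[:k], energy

rng = np.random.default_rng(0)
W = rng.normal(size=(80, 768))          # stand-in for the pruned R2 prototypes
proj, energy = svd_init_projection(W, k=20)
print(proj.shape, f"{energy:.0%} of SVD energy retained")
```

The rows of `proj` are orthonormal, so the bottleneck starts as an isometry onto the prototypes' dominant subspace.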
The dim20 head was then itself pruned. The mAP-driven pruner bisects over the magnitude-sorted weight list of a target parameter, uses full pycocotools mAP@[0.5:0.95] as the retention metric (1000 val images), and rolls back any pass that fails the 95% retention floor on full verification. It was run separately on each learned parameter of dim20 plus a joint-magnitude variant that ranks every weight in the head against every other.
The leading point is R1 (project layer 25.7% sparse, 18,121 nonzero, 3.9 mAP): the same mAP as unpruned dim20 with 18% fewer effective parameters, and the highest mAP-per-10K-parameter ratio in the table at 2.15. R2 pushes to 26.6% project sparsity (17,988 nonzero) at a small mAP cost (3.8). Per-parameter slack measurements ran independently against the unpruned dim20 baseline: project 26.6%, cls_weight 37%, reg_hidden 17%, reg_out 12%, ctr_weight 90% (only 2 of 20 centerness weights load-bear). Greedy stacking of cls_weight pruning on top of R1 reaches 17,406 nonzero but drops to 3.5 mAP. This is an interaction between parameters: the cls_weight slack measured on unpruned dim20 partly compensates for the surviving project subspace, so removing it after pruning project costs more mAP than the per-parameter measurement suggested. Joint magnitude pruning across all 22K head weights (starting from R1) finds 17,129 nonzero at 3.6 mAP, the smallest dim20 found, but it does not Pareto-dominate R1: the bisection's 1000-image mAP proxy was systematically optimistic relative to the full 5000-image eval, so the 95% retention floor measured during pruning accepted more aggressive cuts than the full eval would have. R1 remains the leading point of the dim20 pruning Pareto front.
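The bisection step can be sketched as follows. This is an illustrative toy, not the repository's prune.py: the energy metric stands in for the 1000-image mAP proxy, and all names are assumptions.

```python
# Binary-search the number of smallest-magnitude weights that can be zeroed
# while a proxy metric stays above a retention floor relative to baseline.
import numpy as np

def bisect_prune(weights, metric, retention=0.95):
    """Zero the k smallest-|w| entries for the largest k the floor allows."""
    baseline = metric(weights)
    order = np.argsort(np.abs(weights.ravel()))   # smallest magnitude first
    lo, hi = 0, weights.size                      # invariant: lo always passes
    while lo < hi:
        mid = (lo + hi + 1) // 2
        trial = weights.copy().ravel()
        trial[order[:mid]] = 0.0
        if metric(trial.reshape(weights.shape)) >= retention * baseline:
            lo = mid                              # cut passes: try deeper
        else:
            hi = mid - 1                          # cut fails: back off
    pruned = weights.copy().ravel()
    pruned[order[:lo]] = 0.0
    return pruned.reshape(weights.shape), lo

# Toy stand-in for the 1000-image mAP proxy: energy kept by the weights.
w = np.random.default_rng(0).normal(size=(20, 768))
proxy = lambda m: float((m ** 2).sum())
pruned, n_zeroed = bisect_prune(w, proxy, retention=0.95)
print(f"zeroed {n_zeroed} of {w.size} weights")
```

A real run would re-verify the final point on the full eval, which is exactly where the proxy's optimism shows up.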
To the author's knowledge, these are the smallest detection heads to report standard COCO mAP numbers on the 80-class benchmark.
## Training
| Variant | Epochs | Batch | Optimizer | LR | Schedule | Initialization |
|---|---|---|---|---|---|---|
| linear_70k | 8 | 64 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | random |
| box32_92k | 8 | 64 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | random |
| box32 pruned R1/R2/R3 | – | – | – | – | – | from box32_92k checkpoint |
| dim20 | 8 | 64 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | SVD of pruned R2 prototypes |
| dim15 | 8 | 128 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | SVD of pruned R2 prototypes + analytical least-squares cls init |
| dim10 | 8 | 128 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | SVD of pruned R2 prototypes + analytical least-squares cls init |
| dim5 | 8 | 128 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | SVD of pruned R2 prototypes + analytical least-squares cls init |
| dim20 pruned R1/R2 | – | – | – | – | – | from dim20 checkpoint, mAP-driven bisection on project layer |
All trained variants use the same FCOS-style loss: focal classification (alpha 0.25, gamma 2.0), GIoU box regression, and BCE centerness, summed over three cofiber scales at strides 16, 32, 64. The backbone is frozen throughout; gradients flow only through the head. Gradient clipping is set to 5.0.
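For reference, the focal classification term with these hyperparameters looks like the following. This is a minimal numpy sketch of the standard sigmoid focal loss, not the repository's training code; the GIoU and centerness terms are omitted.

```python
# Sigmoid focal loss (alpha 0.25, gamma 2.0) over per-(location, class)
# binary targets, as used in FCOS-style detection heads.
import numpy as np

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss summed over all elements; targets are {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-logits))
    pt = np.where(targets == 1, p, 1 - p)           # prob of the true label
    alpha_t = np.where(targets == 1, alpha, 1 - alpha)
    ce = -np.log(np.clip(pt, 1e-8, 1.0))            # binary cross-entropy
    return (alpha_t * (1 - pt) ** gamma * ce).sum()

logits = np.array([[2.0, -1.0], [-3.0, 0.5]])
targets = np.array([[1, 0], [0, 1]])
loss = sigmoid_focal_loss(logits, targets)
print(f"focal loss: {loss:.4f}")
```

The `(1 - pt) ** gamma` factor down-weights easy, well-classified locations, which matters when almost every grid cell is background.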
Pruning is iterative magnitude reduction with a 95% TP-retention threshold on 1000 COCO val images. Each pass tests up to 5000 weights for zeroing (or halving as a fallback) and verifies on the full 1000-image set. R1, R2, R3 are successive passes from the same starting checkpoint.
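One pass of this zero-or-halve procedure can be sketched as below. This is a hypothetical reconstruction, not the repository's prune.py: `verify` stands in for the 95% TP-retention check on 1000 val images.

```python
# One magnitude-reduction pass: for each candidate weight, try zeroing it,
# fall back to halving, and keep the change only if verification still
# clears the retention floor.
import numpy as np

def reduction_pass(weights, verify, floor=0.95, max_tested=5000):
    w = weights.ravel().copy()
    baseline = verify(w)
    candidates = np.argsort(np.abs(w))[:max_tested]  # smallest magnitudes first
    for i in candidates:
        old = w[i]
        for trial in (0.0, old / 2.0):               # try zeroing, then halving
            w[i] = trial
            if verify(w) >= floor * baseline:
                break                                # keep this reduction
            w[i] = old                               # roll back, try next option
    return w.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(20, 50))
energy = lambda v: float((v ** 2).sum())  # toy stand-in for TP retention
pruned = reduction_pass(w, energy)
print(f"{(pruned == 0).sum()} of {pruned.size} weights zeroed")
```

Running the pass repeatedly from the same checkpoint yields successively sparser points, matching the R1/R2/R3 sequence described above.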
The training and pruning scripts that produced these checkpoints live in the phanerozoic/detection-heads repository under `heads/cofiber_threshold/<variant>/train.py` and `prune.py`.
## Usage

```python
from model import CofiberDetector

detector = CofiberDetector.from_safetensors("model.safetensors")

# features: [768, 40, 40] numpy array from any frozen ViT at stride 16
detections = detector.detect(features, score_thresh=0.3)
for d in detections:
    print(f"class {d['label']} at {d['box']} score {d['score']:.3f} scale {d['scale']}")
```
## Proof
CofiberDecomposition.v contains a machine-checked proof (Coq/HoTT) of the three cross-term vanishing theorems. The proof establishes that the block structure of the decomposition is exact in any semi-additive category with a suspension-loop adjunction. The concrete instantiation (average pool, bilinear upsample, float32 tensors) satisfies the hypotheses up to machine precision (reconstruction error < 3e-7).
## Files

```
threshold-cofiber-detection/
├── model.safetensors                                          # 241 KB trained INT8 circuit (linear_70k)
├── model_untrained.safetensors                                # 241 KB untrained INT8 circuit
├── model.py                                                   # standalone circuit inference
├── model_box32.py                                             # box32 variant architecture
├── cofiber_threshold_coco_8ep_70k.pth                         # trained PyTorch weights (linear_70k, 4.0 mAP)
├── cofiber_threshold_untrained_70k.pth                        # untrained PyTorch weights
├── cofiber_threshold_coco_8ep_70k_eval.json                   # pycocotools eval (linear_70k)
├── cofiber_threshold_box32_coco_8ep_92k.pth                   # trained (box32, 5.7 mAP)
├── cofiber_threshold_box32_coco_8ep_92k_eval.json             # pycocotools eval (box32)
├── cofiber_threshold_box32_coco_8ep_92k_pruned76k.pth         # pruned R1 (5.7 mAP, ~76K nonzero)
├── cofiber_threshold_box32_coco_8ep_92k_pruned76k_eval.json   # pycocotools eval (pruned R1)
├── cofiber_threshold_box32_coco_8ep_pruned_62k.pth            # pruned R2 (5.9 mAP, ~62K nonzero, best)
├── cofiber_threshold_box32_coco_8ep_pruned_62k_eval.json      # pycocotools eval (pruned R2)
├── cofiber_threshold_box32_pruned_46k.pth                     # pruned R3 (5.1 mAP, ~46K nonzero)
├── cofiber_threshold_box32_pruned_46k_eval.json               # pycocotools eval (pruned R3)
├── cofiber_threshold_dim20_coco_8ep_22k.pth                   # dim20 (3.9 mAP, 22K params)
├── cofiber_threshold_dim20_coco_8ep_22k_eval.json             # pycocotools eval (dim20)
├── cofiber_threshold_dim20_pruned_R1_18k.pth                  # dim20 pruned R1 (3.9 mAP, 18K nonzero)
├── cofiber_threshold_dim20_pruned_R1_18k_eval.json            # pycocotools eval (dim20 R1)
├── cofiber_threshold_dim20_pruned_R2_18k.pth                  # dim20 pruned R2 (3.8 mAP, 18K nonzero)
├── cofiber_threshold_dim20_pruned_R2_18k_eval.json            # pycocotools eval (dim20 R2)
├── cofiber_threshold_dim20_pruning_pareto.json                # full Pareto front + pruner config
├── cofiber_threshold_dim15_coco_8ep_17k.pth                   # dim15 (3.0 mAP, 17K params)
├── cofiber_threshold_dim15_coco_8ep_17k_eval.json             # pycocotools eval (dim15)
├── cofiber_threshold_dim10_coco_8ep_13k.pth                   # dim10 (1.5 mAP, 13K params)
├── cofiber_threshold_dim10_coco_8ep_13k_eval.json             # pycocotools eval (dim10)
├── cofiber_threshold_dim5_coco_8ep_9k.pth                     # dim5 (0.3 mAP, 9K params)
├── cofiber_threshold_dim5_coco_8ep_9k_eval.json               # pycocotools eval (dim5)
├── config.json                                                # architecture metadata
├── CofiberDecomposition.v                                     # machine-checked proof
├── TODO.md                                                    # research directions
└── README.md
```
## License
Apache 2.0
Base model: facebook/EUPE-ViT-B