# nms-verified
FCOS-ResNet50 trained so that the one-peak invariant proved in
CharlesCNorton/nms-verified
holds on its dense output. Under that invariant, greedy NMS is
provably equivalent to a sort + threshold filter on the score axis.
The deployment graph can drop NMS at zero AP cost.
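The equivalence can be illustrated with a toy sketch in plain Python (this is an illustration, not the Rocq development; the `iou`, `greedy_nms`, and `threshold_filter` helpers are hypothetical names): when every above-threshold detection overlaps no other above-threshold detection past the IoU threshold — one peak per object — greedy NMS keeps exactly the boxes a plain score-threshold filter keeps.

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if inter > 0 else 0.0

def greedy_nms(dets, iou_thresh):
    # dets: list of (score, box); keep a box iff it clears all kept boxes.
    kept = []
    for score, box in sorted(dets, key=lambda d: -d[0]):
        if all(iou(box, kb) <= iou_thresh for _, kb in kept):
            kept.append((score, box))
    return kept

def threshold_filter(dets, score_thresh):
    # Sort + threshold: no pairwise suppression at all.
    return sorted((d for d in dets if d[0] > score_thresh), key=lambda d: -d[0])

# One-peak example: two objects, one box above theta each; the duplicate
# boxes that NMS would suppress already sit below theta.
dets = [
    (0.9, (0, 0, 10, 10)),
    (0.2, (1, 1, 11, 11)),    # duplicate of object 1, below theta
    (0.8, (50, 50, 60, 60)),
    (0.1, (51, 51, 61, 61)),  # duplicate of object 2, below theta
]
theta, tau = 0.3, 0.5
nms_out = [d for d in greedy_nms(dets, tau) if d[0] > theta]
thr_out = threshold_filter(dets, theta)
assert nms_out == thr_out
```

On inputs violating one-peak (two overlapping boxes both above theta), the two paths diverge; the invariant is exactly what rules that case out.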
## Theorems
The companion Rocq library is structured into four modules
(`Core.v`, `Bridge.v`, `Probability.v`, `Pipeline.v`). The keystone
collapse theorem in `Core.v` is

```coq
Theorem nms_collapse_onepeak :
  forall D,
    NoDup D -> sorted_desc D -> one_peak D -> no_tie_clash D ->
    filter_above (nms_sorted D) = filter_above D.
```

instantiated for box-NMS, soft-NMS, mask-NMS, heatmap local-NMS, and DETR-style filtering.
The bridge from a Lipschitz score head to `Separated` is in `Bridge.v`
(`lipschitz_bridge_substantive`, `SepRespectingHead`). Architectural
discharges of the bridge precondition appear in `Pipeline.v`:

```coq
Theorem detr_equilibrium_yields_separated    (* DETR / set-prediction *)
Theorem anchor_stride_yields_separated       (* FCOS / RetinaNet / ATSS *)
```
The training-side chain from SGD to the `Separated` locus is closed end
to end in `Pipeline.v`:

```coq
Theorem squared_hinge_sgd_pl_convergence
Theorem L_separated_sq_zero_iff_separated_general
Theorem real_training_to_collapse
```

PAC generalization from a finite training sample is in `Probability.v`
via Hoeffding's lemma (`hoeffding_lemma_symmetric`,
`hoeffding_iid_finite_class_one_sided`).
All proofs are closed in Rocq 9, with no admits and no axioms beyond
Stdlib's standard `R` foundations.
## COCO val2017 mAP
| model | standard AP | NMS-free AP |
|---|---|---|
| pretrained FCOS_ResNet50_FPN_Weights.COCO_V1 | 0.391 | 0.174 |
| this checkpoint | 0.265 | 0.264 |
Without the invariant, NMS carries ~22 AP points (0.391 vs 0.174). With it, the deployment graph can drop NMS at a 0.001 AP cost.
## Recipe
Self-distillation from the pretrained teacher on COCO train2017: frozen backbone, image size 640, no spectral-norm parametrisation, and a loss combining IoU selectivity, a ratio penalty, and a dense-output mimic term.
```shell
python experiments/train_with_spectral_norm.py \
  --iters 120000 \
  --batch-size 1 --image-size 640 --lr 1e-4 \
  --peak-lambda 0.0 \
  --iou-selectivity-lambda 200.0 \
  --iou-selectivity-tau 0.5 --iou-selectivity-theta 0.3 \
  --iou-selectivity-topk 2000 \
  --iou-ratio-lambda 200.0 --iou-ratio-target 0.3 \
  --center-sampling-radius 0.5 \
  --no-spectral-norm \
  --self-distill --teacher-theta 0.5 \
  --mimic-lambda 1.0 \
  --coco-image-dir /path/to/coco/train2017 --coco-image-limit 0
```
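As a rough illustration of what a dense-output mimic term can look like (an assumption-laden sketch, not the repository's actual loss; the real terms live in `experiments/train_with_spectral_norm.py`, and the `mimic_loss` name and gating scheme here are hypothetical):

```python
import torch

def mimic_loss(student_logits, teacher_logits, teacher_theta=0.5):
    # Match student dense scores to the teacher's wherever the teacher
    # is confident; teacher_theta gates which locations contribute.
    t = teacher_logits.sigmoid().detach()  # teacher is frozen
    s = student_logits.sigmoid()
    mask = (t > teacher_theta).float()
    denom = mask.sum().clamp(min=1.0)      # avoid division by zero
    return ((s - t) ** 2 * mask).sum() / denom
```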
## Load and run
```python
import types

import torch
from safetensors.torch import load_file
from torchvision.models.detection import (
    fcos_resnet50_fpn,
    FCOS_ResNet50_FPN_Weights,
)
from torchvision.ops import boxes as box_ops

model = fcos_resnet50_fpn(weights=FCOS_ResNet50_FPN_Weights.COCO_V1)
model.load_state_dict(load_file("fcos_certified.safetensors"))
model.eval().to("cuda")

# Standard inference: torchvision's stock postprocess includes NMS.
detections = model([image_tensor])


# NMS-free inference: replace postprocess_detections with the same code
# minus the batched_nms call. Output equals the standard path under the
# one-peak invariant (Theorem nms_collapse_onepeak in the Rocq library).
def nms_free_postproc(self, head_outputs, anchors, image_shapes):
    # Per-level dense outputs: cls_logits [N, HWA, C],
    # bbox_regression [N, HWA, 4], bbox_ctrness [N, HWA, 1].
    cl = head_outputs["cls_logits"]
    br = head_outputs["bbox_regression"]
    bc = head_outputs["bbox_ctrness"]
    out = []
    for i, shape in enumerate(image_shapes):
        boxes_i, scores_i, labels_i = [], [], []
        # anchors[i] is the per-level anchor list for image i.
        for br_l, cl_l, bc_l, anc_l in zip(br, cl, bc, anchors[i]):
            num_classes = cl_l.shape[-1]
            # Geometric mean of classification and centerness scores.
            scores = torch.sqrt(cl_l[i].sigmoid() * bc_l[i].sigmoid()).flatten()
            keep = scores > self.score_thresh
            scores = scores[keep]
            topk = torch.where(keep)[0]
            k = min(topk.numel(), self.topk_candidates)
            scores, order = scores.topk(k)
            topk = topk[order]
            # Recover anchor index and class label from the flat index.
            anc_idx = torch.div(topk, num_classes, rounding_mode="floor")
            labels = topk % num_classes
            boxes = self.box_coder.decode(br_l[i][anc_idx], anc_l[anc_idx])
            boxes_i.append(box_ops.clip_boxes_to_image(boxes, shape))
            scores_i.append(scores)
            labels_i.append(labels)
        boxes_i = torch.cat(boxes_i)
        scores_i = torch.cat(scores_i)
        labels_i = torch.cat(labels_i)
        # Sort + truncate replaces batched_nms under the one-peak invariant.
        order = scores_i.argsort(descending=True)[: self.detections_per_img]
        out.append(
            {"boxes": boxes_i[order], "scores": scores_i[order], "labels": labels_i[order]}
        )
    return out


model.postprocess_detections = types.MethodType(nms_free_postproc, model)
detections_nms_free = model([image_tensor])
```
## Reproducing the eval
```shell
python experiments/coco_eval.py \
  --ckpt fcos_certified.safetensors \
  --theta 0.05 --top-k 100
```
Reports both standard and NMS-free AP on COCO val2017.
## Certification
`nms_cert.ml` is the OCaml extraction of the decidable certifier
(`Separated_check`, `Separated_dec`, `sep_certify_finite`) from the
Rocq library. `nms_cert_main.ml` is a small CLI driver: detections in,
`CERTIFIED <slack>` or `REJECTED` out. Build:

```shell
ocamlfind ocamlopt nms_cert.ml nms_cert_main.ml -o nms_cert
```

Usage:

```shell
./nms_cert TAU THETA SLACK < detections.txt
```
Each input line is whitespace-separated `score x1 y1 x2 y2` with
integer scores and integer bounding-box pixels. `CERTIFIED` means
the corresponding `Separated` predicate holds on the input list at
that slack; by `nms_collapse_onepeak`, NMS is then equivalent to a
threshold filter on the same list.
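A minimal way to produce `detections.txt` in the expected line format from Python (the scores and boxes below are made up, and the TAU/THETA/SLACK values in the commented invocation are placeholders, not recommended settings):

```python
import subprocess

# Hypothetical detections: integer score then integer x1 y1 x2 y2 per line.
dets = [
    (900, 0, 0, 10, 10),
    (800, 50, 50, 60, 60),
]
with open("detections.txt", "w") as f:
    for d in dets:
        f.write(" ".join(str(v) for v in d) + "\n")

# Feed the file to the extracted certifier (placeholder arguments):
# result = subprocess.run(
#     ["./nms_cert", "500", "300", "10"],
#     stdin=open("detections.txt"), capture_output=True, text=True,
# )
# result.stdout is "CERTIFIED <slack>" or "REJECTED"
```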
## Files
- `fcos_certified.safetensors`: model weights (FCOS-ResNet50-FPN, 319 tensors).
- `nms_cert.ml`: extracted certifier (`Separated_check`, `Separated_dec`, `sep_certify_finite`).
- `nms_cert_main.ml`: CLI driver around the extracted certifier.
- `README.md`: this card.
## Citation
```bibtex
@misc{nms_verified,
  author       = {Norton, Charles C.},
  title        = {nms-verified: a Rocq formalization of NMS collapse for dense detection},
  howpublished = {\url{https://github.com/CharlesCNorton/nms-verified}},
  note         = {Model checkpoint at \url{https://huggingface.co/phanerozoic/nms-verified}},
}
```
## License
MIT.
## Evaluation results

- AP (standard NMS) on the COCO val2017 validation set (self-reported): 0.265
- AP (NMS-free deployment) on the COCO val2017 validation set (self-reported): 0.264