DeepTFUS: variant B (soft-argmax cranked, anchor dropped)

A reproduction attempt of DeepTFUS, proposed by Srivastav et al. (arXiv:2505.12998).

This is a fine-tune of masonwang025/deeptfus-base that scales up variant A's soft-argmax recipe (5× the weight, a sharper temperature) and drops the paper's gradient-L1 anchor. It tests two hypotheses: (a) that more aggressive position pressure breaks A's plateau, and (b) that the anchor was diluting the focal signal.

⭐ Strongest soft-argmax-only improvement in focal_position_error_mm (−19% median) at negligible relative_l2 cost (+0.005 mean, within budget), but it degrades max_pressure_error and produces more secondary hot-spots than the anchored variants.

Modification (vs base)

loss.focal_weight       0    to 5e-5     (adds the soft-argmax term)
loss.focal_temperature  n/a  to 0.03     (sharper than A's 0.05)
loss.grad_weight        0.1  to 0        (paper's gradient-L1 anchor dropped)
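To make the two focal knobs concrete, here is a minimal NumPy sketch of a soft-argmax position term. It is an illustration, not the repository's implementation: the function names, the max-normalisation of the field, the squared-distance penalty, and the 1 mm voxel spacing are all assumptions. It shows how a lower temperature concentrates the softmax on the main peak, so a weaker secondary hot-spot pulls the position estimate less.

```python
import numpy as np

def soft_argmax_3d(field, temperature, spacing_mm=1.0):
    """Softmax-weighted mean voxel position of a 3D field, in mm (assumed form)."""
    field = np.asarray(field, dtype=np.float64)
    # Assumption: normalise by the peak so the temperature is scale-free.
    scaled = field / (np.abs(field).max() + 1e-12) / temperature
    scaled -= scaled.max()  # numerical stability before exponentiation
    weights = np.exp(scaled)
    weights /= weights.sum()
    # One coordinate grid per axis, in voxel units.
    grids = np.meshgrid(*[np.arange(n) for n in field.shape], indexing="ij")
    return np.array([(weights * g).sum() for g in grids]) * spacing_mm

def focal_term(pred, target, temperature=0.03, focal_weight=5e-5):
    """Weighted squared distance between soft-argmax positions (assumed penalty)."""
    p = soft_argmax_3d(pred, temperature)
    t = soft_argmax_3d(target, temperature)
    return focal_weight * float(np.sum((p - t) ** 2))

def blob(center, amplitude, shape=(24, 24, 24), sigma=1.5):
    """Gaussian hot-spot used only for this demo."""
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    r2 = sum((g - c) ** 2 for g, c in zip(grids, center))
    return amplitude * np.exp(-r2 / (2.0 * sigma ** 2))

# Main focus at (8, 8, 8) plus a weaker secondary hot-spot along x.
field = blob((8, 8, 8), 1.0) + blob((16, 8, 8), 0.6)
est_a = soft_argmax_3d(field, temperature=0.05)  # A's temperature
est_b = soft_argmax_3d(field, temperature=0.03)  # B's sharper temperature
print(est_a, est_b)
```

In this toy field both estimates land near the main focus, with the sharper temperature sitting closer to it. Real pressure fields are less clean, so the effect there may differ.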

focal_weight is ramped up over a 3-epoch warmup. The fine-tune ran for 12 epochs from the base checkpoint at lr=3e-5; the shipped checkpoint is ckpt_epoch_006.pt, which had the best val_focal_mm within the plateau.
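The warmup can be pictured with the short sketch below. The card does not state the ramp's shape, so a linear ramp over 0-indexed epochs is assumed here, and `ramped_focal_weight` is a hypothetical helper name rather than a function from the model code.

```python
def ramped_focal_weight(epoch, target=5e-5, warmup_epochs=3):
    """Focal weight for a 0-indexed epoch under an assumed linear warmup."""
    if warmup_epochs <= 0:
        return target
    # Reaches the full target weight on the last warmup epoch.
    return target * min(1.0, (epoch + 1) / warmup_epochs)

# Schedule over the 12 fine-tuning epochs.
schedule = [ramped_focal_weight(e) for e in range(12)]
for epoch, weight in enumerate(schedule):
    print(f"epoch {epoch:2d}  focal_weight = {weight:.2e}")
```

Under this assumption the soft-argmax term runs at full weight from the third epoch onward, so the selected epoch-6 checkpoint was trained with the full focal weight for its last several epochs.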

Test results (n = 597)

metric	paper	base	B (this model)	Δ vs base
`relative_l2` mean ± std	0.414 ± 0.086	0.384 ± 0.078	0.388 ± 0.077	+0.005 (in budget)
`relative_l2` median	0.394	0.369	0.372	+0.003
`focal_position_error_mm` mean ± std	2.89 ± 2.14	6.49 ± 4.58	5.06 ± 3.57	−1.43 mm (−22%)
`focal_position_error_mm` median	2.45	5.15	4.18	−0.97 mm (−19%)
`max_pressure_error` mean ± std	0.199 ± 0.158	0.225 ± 0.116	0.240 ± 0.106	+0.015 (worse)
`max_pressure_error` median	0.166	0.217	0.239	+0.022
`focal_pressure_error` median	n/a	0.528	0.502	−0.026
`focal_iou_fwhm` median	n/a	0.143	0.136	−0.007
`inference_latency_s` median	n/a	0.233	0.232	unchanged
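The "Δ vs base" column can be rechecked from the rounded table entries with a few lines of arithmetic. The sketch below uses only the base and B values printed above; the dictionary layout and the `delta` helper are illustrative.

```python
# (base, B) pairs copied from the table above.
rows = {
    "relative_l2 mean": (0.384, 0.388),
    "relative_l2 median": (0.369, 0.372),
    "focal_position_error_mm mean": (6.49, 5.06),
    "focal_position_error_mm median": (5.15, 4.18),
    "max_pressure_error mean": (0.225, 0.240),
    "max_pressure_error median": (0.217, 0.239),
}

def delta(base, variant):
    """Absolute change (3 decimals) and relative change in whole percent."""
    d = variant - base
    return round(d, 3), round(100.0 * d / base)

deltas = {name: delta(*pair) for name, pair in rows.items()}
for name, (absolute, percent) in deltas.items():
    print(f"{name:32s} {absolute:+.3f} ({percent:+d}%)")
```

The focal-position and max-pressure rows reproduce the table's deltas. The mean relative_l2 row comes out at +0.004 from the rounded entries, so the table's +0.005 was presumably computed from unrounded values.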

Other variants and discussion

See the Collection for the other 5 variants, and the project page for the full reproduction story, interactive viewer, and discussion of trade-offs.

Usage

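from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hub (cached locally after the first call).
ckpt_path = hf_hub_download("masonwang025/deeptfus-ft-b-softargmax-cranked", "ckpt_best.pt")

# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)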
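The card does not document the checkpoint's internal layout, so it can help to look at the loaded object before wiring it into the model code. The helper below is a hypothetical, dependency-free sketch that lists top-level entries with their types or shapes. The stand-in dictionary only demonstrates the output format and is not the real checkpoint structure.

```python
def summarize_checkpoint(obj, max_items=10):
    """Return short 'key: type/shape' lines for a loaded checkpoint object."""
    if not isinstance(obj, dict):
        return [f"<{type(obj).__name__}>"]
    lines = []
    for key, value in list(obj.items())[:max_items]:
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Tensors and arrays expose .shape; report it alongside the type.
            desc = f"{type(value).__name__} {tuple(shape)}"
        elif isinstance(value, dict):
            desc = f"dict with {len(value)} entries"
        else:
            desc = type(value).__name__
        lines.append(f"{key}: {desc}")
    return lines

# Stand-in object with made-up keys; pass the loaded `ckpt` here instead.
example = {"epoch": 6, "state": {"w": [0.0], "b": [0.0]}}
summary = summarize_checkpoint(example)
for line in summary:
    print(line)
```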

Model code: github.com/masonwang025/deeptfus.