RF-DETR Segmentation NVFP4 Experimental Pack

This repository contains experimental NVFP4-packed variants of Roboflow/rf-detr-segmentation, a Transformers RF-DETR instance segmentation checkpoint trained on COCO.

The goal was to test whether RF-DETR segmentation could get the "brrr factor" from FP4-style storage on Blackwell-class NVIDIA hardware. The answer from the current post-training pack is: storage compression works, but output fidelity does not pass yet.

Status

Do not treat this as production-ready. The artifacts load and run through a dequantizing loader, but source-vs-quant validation on an RTX 5080 shows large drift in bounding boxes and masks. The packed weights are useful as an experiment, a starting point for calibration-aware/native FP4 work, and a record of what naive NVFP4 packing does to RF-DETR.

Included Artifacts

The root of this repo contains the most compressed variant; the two partial variants live under variants/:

Variant	Path	Packed Params	File Size	Notes
Full NVFP4	`./`	31,841,152 / 34,153,555 (93.23%)	26 MB	Most compressed; fails mask fidelity
Backbone-only NVFP4	`variants/backbone-only/`	21,234,048 / 34,153,555 (62.17%)	61 MB	Keeps heads high precision; still fails mask fidelity
Heads/decoder NVFP4	`variants/heads-decoder/`	10,607,104 / 34,153,555 (31.06%)	96 MB	Keeps ViT backbone high precision; still fails mask fidelity

For comparison, the source checkpoint (model.safetensors) is about 130 MB.
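The file sizes in the table can be sanity-checked with a rough size model. The sketch below is not the packer's code. It assumes that unpacked tensors are stored as 4-byte floats (inferred from roughly 130 MB for 34,153,555 parameters), that each block of 16 packed values carries one 1-byte scale, that "MB" means MiB, and that the safetensors header is negligible. The function name `estimate_size_mib` is hypothetical.

```python
# Rough size model for a packed checkpoint (a sketch under stated assumptions).
TOTAL_PARAMS = 34_153_555  # total parameter count from the artifacts table
MIB = 1024 * 1024


def estimate_size_mib(packed_params, total_params=TOTAL_PARAMS,
                      block_size=16, unpacked_bytes=4):
    """Estimate file size in MiB, ignoring safetensors header overhead.

    Assumptions (not confirmed by the repo): unpacked tensors use
    `unpacked_bytes` bytes per value, and every block of `block_size`
    packed values stores one 1-byte FP8 scale.
    """
    code_bytes = packed_params / 2            # two 4-bit codes per byte
    scale_bytes = packed_params / block_size  # one scale byte per block
    rest_bytes = (total_params - packed_params) * unpacked_bytes
    return (code_bytes + scale_bytes + rest_bytes) / MIB


for name, packed in [("full", 31_841_152),
                     ("backbone-only", 21_234_048),
                     ("heads-decoder", 10_607_104),
                     ("source (nothing packed)", 0)]:
    print(f"{name}: ~{estimate_size_mib(packed):.1f} MiB")
```

Under these assumptions each estimate lands within about 1 MB of the listed size, which suggests the remaining file size is dominated by the tensors left in source dtype.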

Validation

Validation was run on rezo@stallion:

GPU: NVIDIA GeForce RTX 5080 Laptop GPU, 16 GB
Driver: 580.159.03
Runtime: PyTorch 2.11.0+cu130
Transformers: 5.10.2
dtype: bfloat16
Test: one synthetic 432x432 RGB image, source model vs dequantized NVFP4 model

Full NVFP4

Output	Shape	Rel L2	Cosine	Max Abs
logits	`[1, 200, 91]`	0.1387	0.9906	4.7031
pred_boxes	`[1, 200, 4]`	0.5793	0.8245	0.9744
pred_masks	`[1, 200, 108, 108]`	1.0144	0.6042	115.1270

Backbone-only NVFP4

Output	Shape	Rel L2	Cosine	Max Abs
logits	`[1, 200, 91]`	0.1377	0.9905	4.5469
pred_boxes	`[1, 200, 4]`	0.5668	0.8335	0.9606
pred_masks	`[1, 200, 108, 108]`	0.9849	0.5992	108.3125

Heads/decoder NVFP4

Output	Shape	Rel L2	Cosine	Max Abs
logits	`[1, 200, 91]`	0.1223	0.9925	4.2031
pred_boxes	`[1, 200, 4]`	0.5620	0.8394	0.9595
pred_masks	`[1, 200, 108, 108]`	0.9607	0.5827	119.4375

The validation JSON files are included next to each variant.
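For readers interpreting the tables, here is a minimal sketch of how the three columns are commonly defined. The exact formulas used by the validation script are not stated here, so these definitions are assumptions: Rel L2 as the error norm divided by the source norm, Cosine as cosine similarity of the flattened outputs, and Max Abs as the largest element-wise difference. The function name `drift_metrics` is hypothetical, and the inputs are flat Python lists rather than tensors to keep the example dependency-free.

```python
import math


def drift_metrics(source, quant):
    """Compare two equal-length flat sequences of floats.

    Raises ZeroDivisionError if either input is all zeros.
    """
    if len(source) != len(quant):
        raise ValueError("outputs must have the same number of elements")
    diff_sq = sum((q - s) ** 2 for s, q in zip(source, quant))
    src_sq = sum(s * s for s in source)
    qnt_sq = sum(q * q for q in quant)
    dot = sum(s * q for s, q in zip(source, quant))
    return {
        "rel_l2": math.sqrt(diff_sq) / math.sqrt(src_sq),
        "cosine": dot / (math.sqrt(src_sq) * math.sqrt(qnt_sq)),
        "max_abs": max(abs(q - s) for s, q in zip(source, quant)),
    }


# Toy example: one element out of three drifts by 1.0.
source = [1.0, 2.0, 2.0]
quant = [1.0, 2.0, 1.0]
print(drift_metrics(source, quant))
```

If Rel L2 is defined this way, a value near 1.0 (as reported for pred_masks) means the error has roughly the same norm as the source output itself.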

Format

This is a custom storage pack, not a native Transformers quantization format:

2D floating tensors are packed as NVFP4 E2M1 codes.
Per-block scales are stored as FP8 E4M3 bytes.
Block size is 16 along the reduction dimension.
Non-2D tensors, convolutional tensors, norms, biases, and incompatible shapes are kept in source dtype.
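To make the layout above concrete, here is a minimal sketch of block-wise E2M1 quantization with one scale per 16 values. It is an illustration, not the packer used for these artifacts. Assumptions: the scale maps the block's largest magnitude to 6.0 (the largest E2M1 value), the scale is kept as a Python float rather than rounded to FP8 E4M3, and two codes are packed per byte with the low nibble first. All function names are hypothetical.

```python
# The eight non-negative E2M1 magnitudes, indexed by the low 3 bits of a code.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, codes) for one block of floats."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # largest magnitude maps to 6.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Nearest representable magnitude; ties go to the smaller one here.
        idx = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - mag))
        sign = 8 if x < 0 else 0  # bit 3 is the sign bit
        codes.append(sign | idx)
    return scale, codes


def pack_nibbles(codes):
    """Pack two 4-bit codes per byte (low nibble first: an assumed layout)."""
    return bytes(codes[i] | (codes[i + 1] << 4)
                 for i in range(0, len(codes), 2))


def dequantize_block(scale, packed):
    """Invert pack_nibbles and quantize_block, up to rounding error."""
    out = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):
            mag = E2M1_VALUES[code & 0x07]
            out.append(-mag * scale if code & 0x08 else mag * scale)
    return out


block = [6.0, -3.0, 0.5, 2.4] + [1.0] * 12  # 2.4 is not on the E2M1 grid
scale, codes = quantize_block(block)
packed = pack_nibbles(codes)
restored = dequantize_block(scale, packed)
print(len(packed), "bytes for", BLOCK_SIZE, "values")
print("2.4 restored as", restored[3])
```

With only eight magnitudes per sign, values that fall between grid points (such as 2.4 above) are rounded to the nearest one, which is the kind of error a calibration-free pack accumulates across layers.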

The root file is:

nvfp4_model.safetensors

Quantization metadata is in:

quantization_config.json
quant_error_report.json
validation.json

Loading

Use the included loader, which dequantizes the packed tensors into a temporary HF-style checkpoint and then lets the official Transformers RF-DETR loader perform its key conversion:

from scripts.load_rf_detr_nvfp4 import load_model

model, processor = load_model(".", dtype_name="bfloat16")
model = model.to("cuda")

This path is correct for validation, but it does not provide native FP4 inference speed. It dequantizes before running. Native Blackwell acceleration will require a runtime adapter using torchao/Transformer Engine/ModelOpt-style FP4 kernels and a calibration-aware export.

Recommended Next Step

For a production RF-DETR quant, do not continue with blind, uncalibrated packing of every eligible tensor. The validation results point to sensitivity in the segmentation path: pred_masks shows a relative L2 error near 1.0 in all three variants. The next useful path is:

Use current Transformers RF-DETR support.
Apply native FP4/NVFP4 quantization through torchao or NVIDIA tooling.
Calibrate on real mukbang/food-delivery frames, not synthetic noise.
Validate with actual post-processed masks and COCO-style detection metrics, not only raw tensor cosine.
Keep mask/head layers in higher precision if needed.
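As one illustration of validating on post-processed masks rather than raw tensors, the sketch below binarizes two mask-logit grids and computes their intersection over union. It is a simplified stand-in for real post-processing: it assumes mask outputs are logits thresholded at 0.0 and that the source and quantized masks being compared belong to the same query. The function names are hypothetical, and a full evaluation would use COCO-style metrics on a real dataset.

```python
def binarize(mask_logits, threshold=0.0):
    """Turn a 2D grid of logits into a 2D grid of booleans."""
    return [[v > threshold for v in row] for row in mask_logits]


def mask_iou(a, b):
    """Intersection over union of two equally shaped boolean grids."""
    inter = sum(x and y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x or y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0  # two empty masks count as equal


# Toy 2x2 example: the masks agree on two pixels and disagree on two.
source_logits = [[2.0, -1.0], [3.0, 0.5]]
quant_logits = [[1.5, 0.2], [-0.3, 0.9]]
iou = mask_iou(binarize(source_logits), binarize(quant_logits))
print(f"mask IoU: {iou:.2f}")
```

A metric like this is less sensitive to large but harmless logit shifts than Max Abs, because only pixels that cross the threshold change the result.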

Source Model

Base: Roboflow/rf-detr-segmentation
License: Apache 2.0
Task: COCO instance segmentation
Architecture: RF-DETR instance segmentation in Transformers

Citation

@misc{robinson2026rfdetrneuralarchitecturesearch,
      title={RF-DETR: Neural Architecture Search for Real-Time Detection Transformers},
      author={Isaac Robinson and Peter Robicheaux and Matvei Popov and Deva Ramanan and Neehar Peri},
      year={2026},
      eprint={2511.09554},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://huggingface.co/papers/2511.09554},
}
