LibreRFDETRn-sem

RF-DETR-Nano semantic segmentation model, trained by LibreYOLO on COCO-Stuff (182-class, stuff + things).

Model Details

Property	Value
Architecture	RF-DETR (DINOv2-small backbone + multi-scale projector + dense decode head)
Task	Semantic segmentation
Backbone	DINOv2-small (`facebook/dinov2-small`)
Input size	518×518
Classes	182 (COCO-Stuff)
mIoU (COCO-Stuff val2017)	40.7%
Pixel accuracy (val2017)	67.6%
Parameters	~24M
License	Apache-2.0
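
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how mIoU and pixel accuracy are commonly derived from a confusion matrix. It is not the evaluation code used for this model: the helper names, the `ignore_index=255` convention for unlabeled pixels, and the choice to skip classes that never appear are all assumptions made for illustration.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs; rows are ground truth, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed convention for unlabeled pixels
            continue
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """Fraction of labeled pixels whose predicted class matches the ground truth."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def mean_iou(matrix):
    """Average of per-class intersection-over-union, skipping classes with an empty union."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        gt = sum(matrix[c])                              # pixels labeled as class c
        predicted = sum(matrix[r][c] for r in range(n))  # pixels predicted as class c
        union = gt + predicted - tp
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy example: six pixels, three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
matrix = confusion_matrix(pred, target, num_classes=3)
print(pixel_accuracy(matrix), mean_iou(matrix))
```

Because mIoU weights every class equally, rare classes pull it down much more than they affect pixel accuracy, which is dominated by large regions.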

Source

The architecture derives from roboflow/rf-detr (Apache-2.0) and uses a DINOv2 backbone from facebookresearch/dinov2 (Apache-2.0).

Unlike the RF-DETR detection and instance-segmentation weights, these are not a repackaged upstream checkpoint. They were trained by LibreYOLO: a pretrained DINOv2-small backbone was fine-tuned together with a clean-room dense semantic head.

Training

Dataset: COCO-Stuff-164k — train2017 (118,287 images), 182 classes.
Evaluation: COCO-Stuff val2017 (5,000 images), single-scale.
Backbone: pretrained DINOv2-small, fine-tuned.
Recipe: AdamW with discriminative learning rates (head 1e-3, backbone 1e-4), cosine schedule with warmup, 518×518 input, automatic mixed precision (AMP), and an exponential moving average (EMA) of the weights.
Result: mIoU 40.7%, pixel accuracy 67.6% on val2017 (best epoch).
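
To illustrate how a discriminative learning rate combines with a cosine schedule and warmup, here is a minimal sketch. Only the two base rates (head 1e-3, backbone 1e-4) come from the recipe above. The linear warmup shape, the decay to zero, and the step counts are assumptions, and `lr_scale` and `learning_rates` are hypothetical helper names, not part of LibreYOLO.

```python
import math

# Base learning rates from the recipe above.
BASE_LR = {"head": 1e-3, "backbone": 1e-4}


def lr_scale(step, total_steps, warmup_steps):
    """Multiplier in [0, 1]: linear warmup up to 1, then cosine decay down to 0 (assumed shape)."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


def learning_rates(step, total_steps, warmup_steps):
    """Apply the same schedule multiplier to each parameter group's base rate."""
    scale = lr_scale(step, total_steps, warmup_steps)
    return {name: base * scale for name, base in BASE_LR.items()}


# Hypothetical run length: 1,000 steps with 100 warmup steps.
for step in (0, 99, 550, 1000):
    print(step, learning_rates(step, total_steps=1000, warmup_steps=100))
```

The backbone rate stays ten times smaller than the head rate at every step, which keeps the pretrained features from drifting quickly while the freshly initialised head learns. In PyTorch, this would typically be expressed as two parameter groups passed to `torch.optim.AdamW`.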

Usage

from libreyolo import LibreRFDETR

model = LibreRFDETR("LibreRFDETRn-sem.pt", task="semantic")
result = model.predict("image.jpg")  # per-pixel COCO-Stuff class map
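
A per-pixel class map is often summarised as the share of the image each class covers. The sketch below assumes you have already extracted the map from `result` as a 2D structure of integer class IDs (for example, by calling `.tolist()` on a NumPy array). The source does not say which attribute of `result` holds the map, so that step is left out, and `class_area_fractions` is a hypothetical helper rather than a LibreYOLO function.

```python
from collections import Counter


def class_area_fractions(class_map):
    """Return {class_id: fraction of pixels} for a 2D class map given as rows of ints."""
    counts = Counter(cls for row in class_map for cls in row)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


# Toy 2x3 class map standing in for a real prediction.
toy_map = [
    [0, 0, 1],
    [0, 2, 2],
]
fractions = class_area_fractions(toy_map)
for cls, share in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"class {cls}: {share:.1%} of pixels")
```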

Limitations

This is a compact (~24M-parameter) model with a lightweight decode head. It captures the dominant scene regions well, but its boundaries are coarser and rare classes are segmented less reliably than with larger, segmentation-specialised decoders. For best quality, fine-tune on your own data.

License

Apache License 2.0. See the LICENSE and NOTICE files. Training annotations are from COCO-Stuff (CC BY 4.0); the underlying images are from the COCO dataset.
