LibreRFDETRn-sem

RF-DETR-Nano semantic segmentation model, trained by LibreYOLO on COCO-Stuff (182 classes covering both stuff and things).

Model Details

| Property | Value |
|---|---|
| Architecture | RF-DETR (DINOv2-small backbone + multi-scale projector + dense decode head) |
| Task | Semantic segmentation |
| Backbone | DINOv2-small (facebook/dinov2-small) |
| Input size | 518×518 |
| Classes | 182 (COCO-Stuff) |
| mIoU (COCO-Stuff val2017) | 40.7% |
| Pixel accuracy (COCO-Stuff val2017) | 67.6% |
| Parameters | ~24M |
| License | Apache-2.0 |
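The two quality metrics above are conventionally computed from a single confusion matrix accumulated over the whole validation set. The pure-Python sketch below shows one common convention. It assumes that label 255 marks unlabeled COCO-Stuff pixels and that classes absent from both prediction and ground truth are left out of the mean. Both are assumptions, and this is not a description of LibreYOLO's actual evaluator.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs over flattened pixel lists.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground truth equals ignore_index (assumed 255) are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of labeled pixels whose predicted class is correct."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(map(sum, cm))
    return correct / total if total else 0.0


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                      # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with 3 classes; the last pixel is unlabeled (255) and ignored.
    cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 255], num_classes=3)
    print(pixel_accuracy(cm))  # 0.75
    print(mean_iou(cm))        # about 0.583 (mean of 1/2 and 2/3; class 2 never appears)
```

Because mIoU averages per-class scores, rare classes weigh as much as dominant ones. Pixel accuracy, by contrast, is driven by large regions, which fits the gap between the two reported numbers.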

Source

The architecture derives from roboflow/rf-detr (Apache-2.0) and uses a DINOv2 backbone from facebookresearch/dinov2 (Apache-2.0).

Unlike the RF-DETR detection and instance-segmentation weights, these are not a repackaged upstream checkpoint. They were trained by LibreYOLO: a pretrained DINOv2-small backbone was fine-tuned together with a clean-room dense semantic head.

Training

  • Dataset: COCO-Stuff-164k, train2017 split (118,287 images), 182 classes.
  • Evaluation: COCO-Stuff val2017 (5,000 images), single-scale.
  • Backbone: pretrained DINOv2-small, fine-tuned.
  • Recipe: AdamW, discriminative LR (head 1e-3, backbone 1e-4), cosine schedule with warmup, input 518×518, AMP, EMA.
  • Result: mIoU 0.407, pixel accuracy 0.676 on val2017 (best epoch).
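The discriminative learning rates, the warmup-plus-cosine schedule and the EMA in the recipe can be sketched in plain Python. This is a rough illustration, not LibreYOLO's training code. It assumes a linear warmup, a cosine decay to zero, and an EMA decay of 0.9999; none of these details, nor the warmup or total step counts, are stated in this card. The group names "head" and "backbone" are hypothetical.

```python
import math

# Base learning rates from the recipe; the group names are hypothetical.
BASE_LRS = {"head": 1e-3, "backbone": 1e-4}


def warmup_cosine_factor(step, total_steps, warmup_steps):
    """Multiplier in [0, 1] applied to every group's base LR (assumed shape)."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps  # linear warmup up to 1.0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))  # cosine decay to 0


def lrs_at(step, total_steps, warmup_steps):
    """Per-group learning rates at a given optimizer step."""
    f = warmup_cosine_factor(step, total_steps, warmup_steps)
    return {name: lr * f for name, lr in BASE_LRS.items()}


def ema_update(ema, params, decay=0.9999):
    """One exponential-moving-average step over a flat list of weights."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


if __name__ == "__main__":
    for step in (0, 99, 100, 550, 1000):
        print(step, lrs_at(step, total_steps=1000, warmup_steps=100))
```

In PyTorch, the two base rates would typically become two parameter groups passed to `torch.optim.AdamW`, with `warmup_cosine_factor` supplied as the multiplier to `torch.optim.lr_scheduler.LambdaLR`. Keeping the backbone rate 10× lower than the head rate limits how far the pretrained DINOv2 features drift while the freshly initialized head learns.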

Usage

```python
from libreyolo import LibreRFDETR

# Load the semantic-segmentation weights and run inference on one image.
model = LibreRFDETR("LibreRFDETRn-sem.pt", task="semantic")
result = model.predict("image.jpg")  # per-pixel COCO-Stuff class map
```

Limitations

This is a compact (~24M-parameter) model with a lightweight decode head. It captures the dominant scene regions well, but its boundaries are coarser and it segments rare classes less accurately than larger, segmentation-specialised decoders. For best quality, fine-tune on your own data.

License

Apache License 2.0. See the LICENSE and NOTICE files. Training annotations are from COCO-Stuff (CC BY 4.0); the underlying images are from the COCO dataset.