LibreRFDETRn-sem
RF-DETR-Nano semantic segmentation model, trained by LibreYOLO on COCO-Stuff (182-class, stuff + things).
Model Details
| Property | Value |
|---|---|
| Architecture | RF-DETR (DINOv2-small backbone + multi-scale projector + dense decode head) |
| Task | Semantic segmentation |
| Backbone | DINOv2-small (facebook/dinov2-small) |
| Input size | 518×518 |
| Classes | 182 (COCO-Stuff) |
| mIoU (COCO-Stuff val2017) | 40.7% |
| Pixel accuracy (val2017) | 67.6% |
| Parameters | ~24M |
| License | Apache-2.0 |
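The mIoU and pixel accuracy rows above are usually derived from a confusion matrix accumulated over all evaluated pixels. The following is a minimal sketch of those standard definitions, not the evaluation script used for this model. The `ignore_index=255` convention and the choice to skip classes that appear in neither prediction nor ground truth are assumptions, and all function names are illustrative.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Build a num_classes x num_classes matrix from flat label sequences.

    Rows index the ground-truth class, columns index the predicted class.
    ignore_index is an assumed convention for unlabeled pixels.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count toward any metric
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of labeled pixels whose predicted class is correct."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def mean_iou(cm):
    """Average of per-class intersection-over-union, TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom:  # assumption: classes absent from both maps are skipped
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
cm = confusion_matrix(pred, target, num_classes=3)
print(pixel_accuracy(cm), mean_iou(cm))
```

Because mIoU weights every class equally while pixel accuracy weights every pixel equally, a model can score noticeably higher on pixel accuracy than on mIoU when large, common regions are easier than small or rare ones.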
Source
The architecture derives from roboflow/rf-detr (Apache-2.0) and uses a DINOv2 backbone from facebookresearch/dinov2 (Apache-2.0).
Unlike the RF-DETR detection and instance-segmentation weights, these weights are not a repackaged upstream checkpoint. LibreYOLO trained them by fine-tuning a pretrained DINOv2-small backbone together with a clean-room dense semantic head.
Training
- Dataset: COCO-Stuff-164k, train2017 (118,287 images), 182 classes.
- Evaluation: COCO-Stuff val2017 (5,000 images), single-scale.
- Backbone: pretrained DINOv2-small, fine-tuned.
- Recipe: AdamW, discriminative LR (head 1e-3, backbone 1e-4), cosine schedule with warmup, input 518×518, AMP, EMA.
- Result: mIoU 0.407, pixel accuracy 0.676 on val2017 (best epoch).
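The recipe combines two ideas: separate base learning rates for the head and the backbone, and a cosine schedule with warmup applied to both. The sketch below illustrates that combination in plain Python. Only the two base rates (1e-3 and 1e-4) come from the recipe; the warmup length, total step count, linear warmup shape, and final rate of zero are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

# Base rates from the recipe above.
BASE_LRS = {"head": 1e-3, "backbone": 1e-4}

# Hypothetical values: the model card does not state them.
WARMUP_STEPS = 500
TOTAL_STEPS = 10_000


def lr_at(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)  # clamp if training runs past total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def group_lrs(step):
    """Learning rate per parameter group at a given optimizer step."""
    return {
        name: lr_at(step, base_lr, WARMUP_STEPS, TOTAL_STEPS)
        for name, base_lr in BASE_LRS.items()
    }


for step in (0, 499, 5_000, 10_000):
    print(step, group_lrs(step))
```

Since both groups share one schedule shape, the head stays at ten times the backbone rate throughout training. In a PyTorch training loop, the same effect is typically achieved by passing two parameter groups with different `lr` values to `torch.optim.AdamW`.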
Usage
from libreyolo import LibreRFDETR
model = LibreRFDETR("LibreRFDETRn-sem.pt", task="semantic")
result = model.predict("image.jpg") # per-pixel COCO-Stuff class map
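A per-pixel class map is often summarised by the share of the image each class covers. The sketch below works on a plain 2-D grid of class indices. How to obtain such a grid from `result` is not specified here, so the input format, the helper names, and the toy class IDs are all assumptions.

```python
from collections import Counter


def class_area_fractions(class_map):
    """Return {class_id: fraction of pixels} for a 2-D grid of class indices."""
    counts = Counter(cid for row in class_map for cid in row)
    total = sum(counts.values())
    return {cid: n / total for cid, n in counts.items()}


def dominant_classes(class_map, top_k=3):
    """Return the top_k (class_id, fraction) pairs, largest area first."""
    fractions = class_area_fractions(class_map)
    return sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy 3x4 class map; the class IDs are arbitrary placeholders.
toy_map = [
    [0, 0, 5, 5],
    [0, 0, 5, 5],
    [0, 7, 7, 5],
]
print(dominant_classes(toy_map, top_k=2))
```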
Limitations
This is a compact (~24M-parameter) model with a lightweight decode head. It captures the dominant scene regions well, but its boundaries are coarser and its accuracy on rare classes is lower than that of larger, segmentation-specialised decoders. For best quality, fine-tune on your own data.
License
Apache License 2.0. See the LICENSE and NOTICE files.
Training annotations are from COCO-Stuff (CC BY 4.0); the underlying images are
from the COCO dataset.