---
license: apache-2.0
library_name: pytorch
pipeline_tag: object-detection
tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
  - object-detection
  - detrs
  - dinov3
---

# DEIMv2: Real-Time Object Detection Meets DINOv3

DEIMv2 is an evolution of the DEIM framework that leverages features from DINOv3. It spans eight model sizes (from Atto to X), covering GPU, edge, and mobile deployment scenarios. DEIMv2 achieves state-of-the-art results by combining DINOv3-pretrained backbones with a Spatial Tuning Adapter (STA) for larger models, and using pruned HGNetv2 for ultra-lightweight variants.

## Model Zoo (COCO)

| Model | AP   | #Params | GFLOPs |
|-------|------|---------|--------|
| Atto  | 23.8 | 0.5M    | 0.8    |
| Femto | 31.0 | 1.0M    | 1.7    |
| Pico  | 38.5 | 1.5M    | 5.2    |
| N     | 43.0 | 3.6M    | 6.8    |
| S     | 50.9 | 9.7M    | 25.6   |
| M     | 53.0 | 18.1M   | 52.2   |
| L     | 56.0 | 32.2M   | 96.7   |
| X     | 57.8 | 50.3M   | 151.6  |

## Usage

This model can be loaded through the `PyTorchModelHubMixin` integration. Make sure the necessary components from the official DEIMv2 repository are on your Python path.

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

from engine.backbone import HGNetv2, DINOv3STAs
from engine.deim import HybridEncoder, LiteEncoder
from engine.deim import DFINETransformer, DEIMTransformer
from engine.deim.postprocessor import PostProcessor

class DEIMv2(nn.Module, PyTorchModelHubMixin):
    def __init__(self, config):
        super().__init__()
        # Select the backbone based on the configuration: pruned HGNetv2
        # for the ultra-lightweight variants, a DINOv3 backbone with
        # Spatial Tuning Adapters otherwise.
        if "HGNetv2" in config:
            self.backbone = HGNetv2(**config["HGNetv2"])
        else:
            self.backbone = DINOv3STAs(**config["DINOv3STAs"])

        # Likewise, instantiate whichever encoder/decoder variant
        # the configuration names.
        if "LiteEncoder" in config:
            self.encoder = LiteEncoder(**config["LiteEncoder"])
        else:
            self.encoder = HybridEncoder(**config["HybridEncoder"])

        if "DFINETransformer" in config:
            self.decoder = DFINETransformer(**config["DFINETransformer"])
        else:
            self.decoder = DEIMTransformer(**config["DEIMTransformer"])

        self.postprocessor = PostProcessor(**config["PostProcessor"])

    def forward(self, x, orig_target_sizes):
        x = self.backbone(x)
        x = self.encoder(x)
        x = self.decoder(x)
        # Map raw predictions back to the original image sizes.
        return self.postprocessor(x, orig_target_sizes)

# Load the model from the Hub.
# Replace the model ID with the specific variant you wish to use.
model = DEIMv2.from_pretrained("Intellindust/DEIMv2_DINOv3_S_COCO")
```

## Citation

```bibtex
@article{huang2025deimv2,
  title={Real-Time Object Detection Meets DINOv3},
  author={Huang, Shihua and Hou, Yongjie and Liu, Longfei and Yu, Xuanlong and Shen, Xi},
  journal={arXiv preprint arXiv:2509.20787},
  year={2025}
}
```