---
language:
  - en
license: apache-2.0
tags:
  - object-detection
  - onnx
  - safetensors
  - AgTech
  - transformers
library_name: pytorch
inference: false
datasets:
  - Laudando-Associates-LLC/pucks
---

<h1 align="center"><strong>D-FINE</strong></h1>

<p align="center">
  <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine">
    <img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge">
  </a>
</p>

<div align="justify">

[D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/2410.13842) is a family of real-time object detectors that improve localization accuracy by rethinking how bounding boxes are predicted in DETR-style models. Instead of directly regressing box coordinates, D-FINE introduces a distribution based refinement approach that progressively sharpens predictions over multiple stages.

It also includes a self-distillation mechanism that passes refined localization knowledge to earlier layers, improving training efficiency and model robustness. Combined with lightweight architectural optimizations, D-FINE achieves a strong balance between speed and accuracy.

This repository provides five pretrained variants — Nano, Small, Medium, Large, and Extra Large — offering a trade-off between speed and accuracy for different deployment needs.

</div>
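The distribution-based refinement idea can be sketched numerically: rather than regressing one scalar per box edge, the model predicts logits over discrete offset bins, and the final offset is the expectation of the resulting distribution. The snippet below is a simplified illustration of that principle only; D-FINE's actual bin layout and weighting functions differ.

```python
import math

def expected_offset(logits, bin_values):
    """Soft-argmax over offset bins: the predicted edge offset is the
    expectation of a softmax distribution rather than a single raw
    regression value. Simplified sketch, not D-FINE's exact formulation."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return sum((e / total) * v for e, v in zip(exps, bin_values))

# Uniform logits over symmetric bins yield a zero offset;
# sharpening one logit pulls the expectation toward that bin.
print(expected_offset([0.0, 0.0, 0.0], [-1.0, 0.0, 1.0]))
print(expected_offset([0.0, 0.0, 5.0], [-1.0, 0.0, 1.0]))
```

Because the output is an expectation rather than a hard argmax, it stays differentiable, which is what allows the stage-by-stage sharpening described above.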

<h3 align="left">Sample Predictions Across D-FINE Variants</h3>

<table align="center">
  <tr>
    <td align="center"><img src="assets/nano.png" alt="Nano" style="width:100%; max-width:300px;"><br><strong>Nano</strong></td>
    <td align="center"><img src="assets/small.png" alt="Small" style="width:100%; max-width:300px;"><br><strong>Small</strong></td>
  </tr>
  <tr>
    <td align="center"><img src="assets/medium.png" alt="Medium" style="width:100%; max-width:300px;"><br><strong>Medium</strong></td>
    <td align="center"><img src="assets/large.png" alt="Large" style="width:100%; max-width:300px;"><br><strong>Large</strong></td>
  </tr>
</table>

## Try it in the Browser

You can test the model(s) using our interactive Gradio demo:

<p align="center">
  <a href="https://huggingface.co/spaces/Laudando-Associates-LLC/d-fine-demo">
    <img src="https://img.shields.io/badge/Launch%20Demo-Gradio-FF4B4B?logo=gradio&logoColor=white&style=for-the-badge">
  </a>
</p>

## D-FINE Variants

The D-FINE family includes five model sizes trained on the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks), each offering a different balance between model size and detection accuracy.

| Variant      | Parameters | mAP@[0.50:0.95] | Model Card | ONNX | PyTorch |
|:------------:|:----------:|:---------------:|:-----------:|:--------------:|:-------:|
| Nano         | 3.76M      | 0.825           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Small        | 10.3M      | 0.816           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Medium       | 19.6M      | 0.840           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Large        | 31.2M      | 0.828           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Extra Large  | 62.7M      | 0.803           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |


> mAP values are evaluated on the validation set of the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks).
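As a quick way to reason over the table, the sketch below (a hypothetical helper; the numbers are copied from the table above) picks the smallest variant that meets a given accuracy floor:

```python
# Parameter counts (millions) and mAP@[0.50:0.95] from the variants table
VARIANTS = {
    "nano":   (3.76, 0.825),
    "small":  (10.3, 0.816),
    "medium": (19.6, 0.840),
    "large":  (31.2, 0.828),
    "xlarge": (62.7, 0.803),
}

def smallest_variant(min_map):
    """Return the name of the smallest variant whose mAP meets the floor."""
    eligible = [(params, name) for name, (params, m) in VARIANTS.items()
                if m >= min_map]
    if not eligible:
        raise ValueError(f"no variant reaches mAP {min_map}")
    return min(eligible)[1]

print(smallest_variant(0.82))  # nano
```

Note that accuracy does not grow monotonically with size on this dataset (Medium outperforms both Large and Extra Large), so picking by constraint rather than by maximum size is worthwhile.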

## Installation

```bash
pip install -r requirements.txt
```

> Tip: Use a virtual environment (venv or conda) to avoid dependency conflicts.

## Quick start on [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks)

```python
from datasets import load_dataset
from transformers import AutoProcessor, AutoModel
from PIL import ImageDraw, ImageFont

# Load the test split (or 'train')
ds = load_dataset("Laudando-Associates-LLC/pucks", split="test")

# Access an example image
image = ds[1]["image"]

# Load processor and model
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
model = AutoModel.from_pretrained("Laudando-Associates-LLC/d-fine-nano", trust_remote_code=True)

# Process the image: resize and pad
inputs = processor(image)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

# Draw boxes
draw = ImageDraw.Draw(image)
try:
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", size=24)
except OSError:
    # Fall back if the font file is not available on this system
    font = ImageFont.load_default()
for result in outputs:
    boxes = result["boxes"]
    labels = result["labels"]
    scores = result["scores"]

    for box, label, score in zip(boxes, labels, scores):
        x1, y1, x2, y2 = box.tolist()
        draw.rectangle([x1, y1, x2, y2], outline="blue", width=5)
        draw.text((x1, max(0, y1 - 25)), f"{score:.2f}", fill="blue", font=font)

# Save result
image.save("output.jpg")
```

## How to Use

The D-FINE model family uses a shared processor and variant-specific models. All components are compatible with Hugging Face's `transformers` library via `trust_remote_code=True`.

### Step 1: Load the Preprocessor

The preprocessor is common to all D-FINE variants and handles resizing and padding.

```python
from transformers import AutoProcessor

# Load the shared D-FINE processor
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
```

### Step 2: Load a D-FINE model variant

You can choose from any of the five variants: Nano, Small, Medium, Large, or Extra Large.

```python
from transformers import AutoModel

model_variant = "nano"  # or "small", "medium", "large", "xlarge"

# Load the D-FINE model variant
model = AutoModel.from_pretrained(f"Laudando-Associates-LLC/d-fine-{model_variant}", trust_remote_code=True)
```

### Step 3: Run Inference

Using Pillow with a single image or a batch of images:

```python
from PIL import Image

# Single image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(image)

# Batch of images
batch_images = [
    Image.open("image1.jpg").convert("RGB"),
    Image.open("image2.jpg").convert("RGB")
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

for result in outputs:
    boxes  = result["boxes"]   # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores

Using OpenCV with a single image or a batch of images:

```python
import cv2

# Single OpenCV image (BGR)
image = cv2.imread("your_image.jpg")
inputs = processor(image)

# Batch of OpenCV images
batch_images = [
    cv2.imread("image1.jpg"),
    cv2.imread("image2.jpg")
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

for result in outputs:
    boxes  = result["boxes"]   # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores

## License
The D-FINE models are released under the [Apache License 2.0](https://github.com/Peterande/D-FINE/blob/master/LICENSE). The L&A Pucks Dataset, on which the models were trained, is released under the [L&Aser Dataset Replication License (Version 1.0)](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks/blob/main/LICENSE).

## Citation
If you use `D-FINE` or its methods in your work, please cite the following BibTeX entry:

```bibtex
@misc{peng2024dfine,
      title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
      author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
      year={2024},
      eprint={2410.13842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```