---
language:
- en
license: apache-2.0
tags:
- object-detection
- onnx
- safetensors
- AgTech
- transformers
library_name: pytorch
inference: false
datasets:
- Laudando-Associates-LLC/pucks
---
<h1 align="center"><strong>D-FINE</strong></h1>
<p align="center">
<a href="https://huggingface.co/Laudando-Associates-LLC/d-fine">
<img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge">
</a>
</p>
<div align="justify">
[D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/2410.13842) is a family of real-time object detectors that improve localization accuracy by rethinking how bounding boxes are predicted in DETR-style models. Instead of directly regressing box coordinates, D-FINE introduces a distribution-based refinement approach that progressively sharpens predictions over multiple stages.
It also includes a self-distillation mechanism that passes refined localization knowledge to earlier layers, improving training efficiency and model robustness. Combined with lightweight architectural optimizations, D-FINE achieves a strong balance between speed and accuracy.
This repository provides five pretrained variants — Nano, Small, Medium, Large, and Extra Large — offering a trade-off between speed and accuracy for different deployment needs.
</div>
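To illustrate the core idea, here is a minimal sketch (an illustration, not the actual D-FINE implementation): each box edge offset is modeled as a probability distribution over discrete bins, and the predicted value is the expectation over those bins, which the refinement stages progressively sharpen.

```python
# Illustrative sketch of distribution-based box regression: instead of a
# single scalar, the model outputs a probability distribution over discrete
# bin values, and the final offset is the expected value of that distribution.

def expected_offset(probs, bin_values):
    """Decode a distribution over bins into a single offset value."""
    assert abs(sum(probs) - 1.0) < 1e-6, "probabilities must sum to 1"
    return sum(p * v for p, v in zip(probs, bin_values))

# A sharp distribution concentrated around bin 2 yields an offset near 2.0.
bins = [0.0, 1.0, 2.0, 3.0]
probs = [0.0, 0.1, 0.8, 0.1]
print(expected_offset(probs, bins))  # ~2.0
```

A sharper (lower-entropy) distribution pins the offset down more precisely, which is what the multi-stage refinement in D-FINE aims to achieve.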
<h3 align="left">Sample Predictions Across D-FINE Variants</h3>
<table align="center">
<tr>
<td align="center"><img src="assets/nano.png" alt="Nano" style="width:100%; max-width:300px;"><br><strong>Nano</strong></td>
<td align="center"><img src="assets/small.png" alt="Small" style="width:100%; max-width:300px;"><br><strong>Small</strong></td>
</tr>
<tr>
<td align="center"><img src="assets/medium.png" alt="Medium" style="width:100%; max-width:300px;"><br><strong>Medium</strong></td>
<td align="center"><img src="assets/large.png" alt="Large" style="width:100%; max-width:300px;"><br><strong>Large</strong></td>
</tr>
</table>
## Try it in the Browser
You can test the model(s) using our interactive Gradio demo:
<p align="center">
<a href="https://huggingface.co/spaces/Laudando-Associates-LLC/d-fine-demo">
<img src="https://img.shields.io/badge/Launch%20Demo-Gradio-FF4B4B?logo=gradio&logoColor=white&style=for-the-badge">
</a>
</p>
## D-FINE Variants
The D-FINE family includes five model sizes trained on the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks), each offering a different balance between model size and detection accuracy.
| Variant | Parameters | mAP@[0.50:0.95] | Model Card | ONNX | PyTorch |
|:------------:|:----------:|:---------------:|:-----------:|:--------------:|:-------:|
| Nano | 3.76M | 0.825 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Small | 10.3M | 0.816 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Medium | 19.6M | 0.840 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Large | 31.2M | 0.828 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Extra Large | 62.7M | 0.803 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
> mAP values are evaluated on the validation set of the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks).
## Installation
```bash
pip install -r requirements.txt
```
> Tip: Use a virtual environment (venv or conda) to avoid dependency conflicts.
## Quick start on [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks)
```python
from datasets import load_dataset
from transformers import AutoProcessor, AutoModel
from PIL import ImageDraw, ImageFont
# Load the test split (or "train")
ds = load_dataset("Laudando-Associates-LLC/pucks", split="test")
# Access the first example
image = ds[0]["image"]
# Load processor and model
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
model = AutoModel.from_pretrained("Laudando-Associates-LLC/d-fine-nano", trust_remote_code=True)
# Process the image: resize and pad
inputs = processor(image)
# Run inference
outputs = model(**inputs, conf_threshold=0.4)
# Draw boxes
draw = ImageDraw.Draw(image)
try:
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", size=24)
except OSError:
    # Fall back to Pillow's built-in font if DejaVuSans-Bold is unavailable
    font = ImageFont.load_default()
for result in outputs:
boxes = result["boxes"]
labels = result["labels"]
scores = result["scores"]
for box, label, score in zip(boxes, labels, scores):
x1, y1, x2, y2 = box.tolist()
draw.rectangle([x1, y1, x2, y2], outline="blue", width=5)
draw.text((x1, max(0, y1 - 25)), f"{score:.2f}", fill="blue", font=font)
# Save result
image.save("output.jpg")
```
## How to Use
The D-FINE model family uses a shared processor and variant-specific models. All components are compatible with Hugging Face's `transformers` library via `trust_remote_code=True`.
### Step 1: Load the Preprocessor
The preprocessor is common to all D-FINE variants and handles resizing and padding.
```python
from transformers import AutoProcessor
# Load the shared D-FINE processor
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
```
### Step 2: Load a D-FINE model variant
You can choose from any of the five variants: Nano, Small, Medium, Large, or Extra Large.
```python
from transformers import AutoModel
model_variant = "nano" # small, medium, large, xlarge
# Load the D-FINE model variant
model = AutoModel.from_pretrained(f"Laudando-Associates-LLC/d-fine-{model_variant}", trust_remote_code=True)
```
### Step 3: Run Inference
Using Pillow with a single image or a batch of images:
```python
from PIL import Image
# Single image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(image)
# Batch of images
batch_images = [
Image.open("image1.jpg").convert("RGB"),
Image.open("image2.jpg").convert("RGB")
]
inputs = processor(batch_images)
# Run inference
outputs = model(**inputs, conf_threshold=0.4)
for result in outputs:
boxes = result["boxes"] # [N, 4] bounding boxes (x1, y1, x2, y2)
labels = result["labels"] # [N] class indices
scores = result["scores"] # [N] confidence scores
```
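Since the outputs are per-image dictionaries of boxes, labels, and scores, a small post-processing helper can be useful. The function below is a hypothetical utility (not part of the D-FINE API), shown here on plain Python lists for clarity; with real outputs you would apply the same logic to the tensors.

```python
# Hypothetical post-processing helper: keep only detections whose
# confidence score meets a minimum threshold.

def filter_detections(boxes, labels, scores, min_score=0.5):
    """Return (boxes, labels, scores) with low-confidence entries removed."""
    kept = [(b, l, s) for b, l, s in zip(boxes, labels, scores) if s >= min_score]
    if not kept:
        return [], [], []
    b, l, s = zip(*kept)
    return list(b), list(l), list(s)

boxes = [[0, 0, 10, 10], [5, 5, 20, 20]]
labels = [0, 1]
scores = [0.9, 0.3]
print(filter_detections(boxes, labels, scores))
# ([[0, 0, 10, 10]], [0], [0.9])
```

This mirrors what the model's own `conf_threshold` argument does at inference time, but lets you re-filter saved predictions at a different threshold without re-running the model.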
Using OpenCV with a single image or a batch of images:
```python
import cv2
# Single OpenCV image (BGR)
image = cv2.imread("your_image.jpg")
inputs = processor(image)
# Batch of OpenCV images
batch_images = [
cv2.imread("image1.jpg"),
cv2.imread("image2.jpg")
]
inputs = processor(batch_images)
# Run inference
outputs = model(**inputs, conf_threshold=0.4)
for result in outputs:
boxes = result["boxes"] # [N, 4] bounding boxes (x1, y1, x2, y2)
labels = result["labels"] # [N] class indices
scores = result["scores"] # [N] confidence scores
```
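When merging predictions from several variants, or deduplicating overlapping boxes, intersection-over-union (IoU) in the `(x1, y1, x2, y2)` format used above is the standard measure. Below is a minimal, self-contained sketch (an illustrative helper, not part of the model outputs):

```python
# Intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2)
# format, matching the box format returned in result["boxes"].

def iou(box_a, box_b):
    """Return IoU in [0, 1]; 0.0 when the boxes do not overlap."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ~ 0.143
```

Two detections with an IoU above some threshold (commonly 0.5) are usually treated as duplicates of the same object.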
## License
The D-FINE models use the [Apache License 2.0](https://github.com/Peterande/D-FINE/blob/master/LICENSE). The L&A Pucks Dataset, on which the models were trained, uses the [L&Aser Dataset Replication License (Version 1.0)](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks/blob/main/LICENSE).
## Citation
If you use `D-FINE` or its methods in your work, please cite the following BibTeX entry:
```bibtex
@misc{peng2024dfine,
title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
year={2024},
eprint={2410.13842},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
``` |