|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- object-detection |
|
|
- onnx |
|
|
- safetensors |
|
|
- AgTech |
|
|
- transformers |
|
|
library_name: pytorch |
|
|
inference: false |
|
|
datasets: |
|
|
- Laudando-Associates-LLC/pucks |
|
|
--- |
|
|
|
|
|
<h1 align="center"><strong>D-FINE</strong></h1> |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://huggingface.co/Laudando-Associates-LLC/d-fine"> |
|
|
<img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"> |
|
|
</a> |
|
|
</p> |
|
|
|
|
|
<div align="justify"> |
|
|
|
|
|
[D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/2410.13842) is a family of real-time object detectors that improve localization accuracy by rethinking how bounding boxes are predicted in DETR-style models. Instead of directly regressing box coordinates, D-FINE introduces a distribution-based refinement approach that progressively sharpens predictions over multiple stages.
|
|
|
|
|
It also includes a self-distillation mechanism that passes refined localization knowledge to earlier layers, improving training efficiency and model robustness. Combined with lightweight architectural optimizations, D-FINE achieves a strong balance between speed and accuracy. |
|
|
|
|
|
This repository provides five pretrained variants — Nano, Small, Medium, Large, and Extra Large — offering a trade-off between speed and accuracy for different deployment needs. |
|
|
|
|
|
</div> |
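As a toy illustration of distribution-based box regression (not D-FINE's exact formulation), a box edge can be decoded as the expectation of a softmax distribution over candidate offsets. The bin count and offset range below are illustrative assumptions:

```python
import numpy as np

# Toy sketch: decode per-bin logits from a detection head into a scalar
# box-edge offset via the distribution's expectation. The bin count (17)
# and offset range [0, 1] are illustrative assumptions, not D-FINE's values.
rng = np.random.default_rng(0)
bins = np.linspace(0.0, 1.0, 17)        # candidate offset values
logits = rng.normal(size=17)            # hypothetical per-bin logits
probs = np.exp(logits - logits.max())   # numerically stable softmax
probs /= probs.sum()
offset = float((probs * bins).sum())    # expected offset, lies in [0, 1]
```

Refining the distribution over stages, rather than a single point estimate, lets later layers make small corrections instead of re-predicting the box from scratch.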
|
|
|
|
|
<h3 align="left">Sample Predictions Across D-FINE Variants</h3> |
|
|
|
|
|
<table align="center"> |
|
|
<tr> |
|
|
<td align="center"><img src="assets/nano.png" alt="Nano" style="width:100%; max-width:300px;"><br><strong>Nano</strong></td> |
|
|
<td align="center"><img src="assets/small.png" alt="Small" style="width:100%; max-width:300px;"><br><strong>Small</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td align="center"><img src="assets/medium.png" alt="Medium" style="width:100%; max-width:300px;"><br><strong>Medium</strong></td> |
|
|
<td align="center"><img src="assets/large.png" alt="Large" style="width:100%; max-width:300px;"><br><strong>Large</strong></td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
## Try it in the Browser |
|
|
|
|
|
You can test the model(s) using our interactive Gradio demo: |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://huggingface.co/spaces/Laudando-Associates-LLC/d-fine-demo"> |
|
|
<img src="https://img.shields.io/badge/Launch%20Demo-Gradio-FF4B4B?logo=gradio&logoColor=white&style=for-the-badge"> |
|
|
</a> |
|
|
</p> |
|
|
|
|
|
## D-FINE Variants |
|
|
|
|
|
The D-FINE family includes five model sizes trained on the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks), each offering a different balance between model size and detection accuracy. |
|
|
|
|
|
| Variant | Parameters | mAP@[0.50:0.95] | Model Card | ONNX | PyTorch | |
|
|
|:------------:|:----------:|:---------------:|:-----------:|:--------------:|:-------:| |
|
|
| Nano | 3.76M | 0.825 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
| Small | 10.3M | 0.816 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
| Medium | 19.6M | 0.840 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
| Large | 31.2M | 0.828 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
| Extra Large | 62.7M | 0.803 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
|
|
|
|
|
|
> mAP values are evaluated on the validation set of the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks). |
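For reference, mAP@[0.50:0.95] averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal IoU computation for `(x1, y1, x2, y2)` boxes, as a sketch:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```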
|
|
|
|
|
## Installation |
|
|
|
|
|
```bash |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
> Tip: Use a virtual environment (venv or conda) to avoid dependency conflicts. |
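For example, with the standard `venv` module (the directory name `.venv` is just a convention):

```shell
python3 -m venv .venv            # create an isolated environment
source .venv/bin/activate        # activate it (.venv\Scripts\activate on Windows)
pip install -r requirements.txt
```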
|
|
|
|
|
## Quick start on [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks) |
|
|
|
|
|
```python |
|
|
from datasets import load_dataset |
|
|
from transformers import AutoProcessor, AutoModel |
|
|
from PIL import ImageDraw, ImageFont |
|
|
|
|
|
# Load the test split (or 'train')
|
|
ds = load_dataset("Laudando-Associates-LLC/pucks", split="test") |
|
|
|
|
|
# Access an example image
|
|
image = ds[1]["image"] |
|
|
|
|
|
# Load processor and model |
|
|
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True) |
|
|
model = AutoModel.from_pretrained("Laudando-Associates-LLC/d-fine-nano", trust_remote_code=True)
|
|
|
|
|
# Process the image: resize and pad
|
|
inputs = processor(image) |
|
|
|
|
|
# Run inference |
|
|
outputs = model(**inputs, conf_threshold=0.4) |
|
|
|
|
|
# Draw boxes |
|
|
draw = ImageDraw.Draw(image) |
|
|
try:
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", size=24)
except OSError:
    font = ImageFont.load_default()  # fallback if the TTF is not installed
|
|
for result in outputs: |
|
|
boxes = result["boxes"] |
|
|
labels = result["labels"] |
|
|
scores = result["scores"] |
|
|
|
|
|
for box, label, score in zip(boxes, labels, scores): |
|
|
x1, y1, x2, y2 = box.tolist() |
|
|
draw.rectangle([x1, y1, x2, y2], outline="blue", width=5) |
|
|
draw.text((x1, max(0, y1 - 25)), f"{score:.2f}", fill="blue", font=font) |
|
|
|
|
|
# Save result |
|
|
image.save("output.jpg") |
|
|
``` |
|
|
|
|
|
## How to Use |
|
|
|
|
|
The D-FINE model family uses a shared processor and variant-specific models. All components are compatible with Hugging Face's `transformers` library via `trust_remote_code=True`.
|
|
|
|
|
### Step 1: Load the Preprocessor |
|
|
|
|
|
The preprocessor is common to all D-FINE variants and handles resizing and padding. |
|
|
|
|
|
```python |
|
|
from transformers import AutoProcessor |
|
|
|
|
|
# Load the shared D-FINE processor |
|
|
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True) |
|
|
``` |
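Resize-and-pad (letterboxing) can be sketched as follows. This is a simplified stand-in: the actual processor's target size, padding colour, and normalisation may differ:

```python
from PIL import Image

def letterbox(img: Image.Image, size: int = 640) -> Image.Image:
    """Resize preserving aspect ratio, then pad to a square canvas.
    Simplified sketch; the 640 target and grey padding are assumptions."""
    scale = size / max(img.size)
    w, h = round(img.width * scale), round(img.height * scale)
    canvas = Image.new("RGB", (size, size), (114, 114, 114))
    canvas.paste(img.resize((w, h)), ((size - w) // 2, (size - h) // 2))
    return canvas

out = letterbox(Image.new("RGB", (1280, 720)))
print(out.size)  # (640, 640)
```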
|
|
|
|
|
### Step 2: Load a D-FINE model variant |
|
|
|
|
|
You can choose from any of the five variants: Nano, Small, Medium, Large, or Extra Large. |
|
|
|
|
|
```python |
|
|
from transformers import AutoModel |
|
|
|
|
|
model_variant = "nano"  # or "small", "medium", "large", "xlarge"
|
|
|
|
|
# Load the D-FINE model variant |
|
|
model = AutoModel.from_pretrained(f"Laudando-Associates-LLC/d-fine-{model_variant}", trust_remote_code=True) |
|
|
``` |
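If you select the variant programmatically, a hypothetical helper based on the parameter counts from the table above might look like this (`pick_variant` is illustrative, not part of the repository):

```python
# Hypothetical helper: choose the largest variant that fits a parameter
# budget (in millions). Counts are taken from the variants table above.
PARAMS_M = {"nano": 3.76, "small": 10.3, "medium": 19.6, "large": 31.2, "xlarge": 62.7}

def pick_variant(max_params_m: float) -> str:
    eligible = [v for v, p in sorted(PARAMS_M.items(), key=lambda kv: kv[1])
                if p <= max_params_m]
    if not eligible:
        raise ValueError("no D-FINE variant fits the given budget")
    return eligible[-1]

print(pick_variant(20.0))  # medium
```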
|
|
|
|
|
### Step 3: Run Inference |
|
|
|
|
|
Using Pillow with a single image or a batch of images:
|
|
|
|
|
```python |
|
|
from PIL import Image |
|
|
|
|
|
# Single image |
|
|
image = Image.open("your_image.jpg").convert("RGB") |
|
|
inputs = processor(image) |
|
|
|
|
|
# Batch of images |
|
|
batch_images = [ |
|
|
Image.open("image1.jpg").convert("RGB"), |
|
|
Image.open("image2.jpg").convert("RGB") |
|
|
] |
|
|
inputs = processor(batch_images) |
|
|
|
|
|
# Run inference |
|
|
outputs = model(**inputs, conf_threshold=0.4) |
|
|
|
|
|
for result in outputs: |
|
|
boxes = result["boxes"] # [N, 4] bounding boxes (x1, y1, x2, y2) |
|
|
labels = result["labels"] # [N] class indices |
|
|
scores = result["scores"] # [N] confidence scores |
|
|
``` |
|
|
|
|
|
Using OpenCV with a single image or a batch of images:
|
|
|
|
|
```python |
|
|
import cv2 |
|
|
|
|
|
# Single OpenCV image (BGR) |
|
|
image = cv2.imread("your_image.jpg") |
|
|
inputs = processor(image) |
|
|
|
|
|
# Batch of OpenCV images |
|
|
batch_images = [ |
|
|
cv2.imread("image1.jpg"), |
|
|
cv2.imread("image2.jpg") |
|
|
] |
|
|
inputs = processor(batch_images) |
|
|
|
|
|
# Run inference |
|
|
outputs = model(**inputs, conf_threshold=0.4) |
|
|
|
|
|
for result in outputs: |
|
|
boxes = result["boxes"] # [N, 4] bounding boxes (x1, y1, x2, y2) |
|
|
labels = result["labels"] # [N] class indices |
|
|
scores = result["scores"] # [N] confidence scores |
|
|
``` |
|
|
|
|
|
## License |
|
|
The D-FINE models are released under the [Apache License 2.0](https://github.com/Peterande/D-FINE/blob/master/LICENSE). The L&A Pucks Dataset on which the models were trained is released under the [L&Aser Dataset Replication License (Version 1.0)](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks/blob/main/LICENSE).
|
|
|
|
|
## Citation |
|
|
If you use `D-FINE` or its methods in your work, please cite the following BibTeX entry:
|
|
|
|
|
```bibtex
|
|
@misc{peng2024dfine, |
|
|
title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement}, |
|
|
author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu}, |
|
|
year={2024}, |
|
|
eprint={2410.13842}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CV} |
|
|
} |
|
|
``` |