---
language:
- en
license: apache-2.0
tags:
- object-detection
- onnx
- safetensors
- AgTech
- transformers
library_name: pytorch
inference: false
datasets:
- Laudando-Associates-LLC/pucks
---
<h1 align="center"><strong>D-FINE</strong></h1>
<p align="center">
<a href="https://huggingface.co/Laudando-Associates-LLC/d-fine">
<img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge">
</a>
</p>
<div align="justify">
[D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/2410.13842) is a family of real-time object detectors that improve localization accuracy by rethinking how bounding boxes are predicted in DETR-style models. Instead of directly regressing box coordinates, D-FINE introduces a distribution-based refinement approach that progressively sharpens predictions over multiple stages.
It also includes a self-distillation mechanism that passes refined localization knowledge to earlier layers, improving training efficiency and model robustness. Combined with lightweight architectural optimizations, D-FINE achieves a strong balance between speed and accuracy.
This repository provides five pretrained variants (Nano, Small, Medium, Large, and Extra Large), covering a range of speed/accuracy trade-offs for different deployment needs.
</div>
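For intuition, the refinement idea can be sketched in a few lines: rather than predicting one scalar per box edge, the head predicts a probability distribution over discrete offset bins and takes its expectation. The snippet below is a toy illustration of that principle only (the bin count and values are invented for the example), not the repository's actual decoder code.

```python
import torch

# Toy illustration of distribution-based box regression:
# predict logits over discrete offset bins, then take the
# softmax-weighted expectation instead of a direct scalar.
num_bins = 16
logits = torch.randn(4, num_bins)            # one distribution per box edge
bin_values = torch.linspace(0, 1, num_bins)  # candidate normalized offsets

probs = logits.softmax(dim=-1)               # per-edge probability distribution
offsets = (probs * bin_values).sum(dim=-1)   # expected offset per edge
print(offsets)                               # 4 refined edge offsets
```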
<h3 align="left">Sample Predictions Across D-FINE Variants</h3>
<table align="center">
<tr>
<td align="center"><img src="assets/nano.png" alt="Nano" style="width:100%; max-width:300px;"><br><strong>Nano</strong></td>
<td align="center"><img src="assets/small.png" alt="Small" style="width:100%; max-width:300px;"><br><strong>Small</strong></td>
</tr>
<tr>
<td align="center"><img src="assets/medium.png" alt="Medium" style="width:100%; max-width:300px;"><br><strong>Medium</strong></td>
<td align="center"><img src="assets/large.png" alt="Large" style="width:100%; max-width:300px;"><br><strong>Large</strong></td>
</tr>
</table>
## Try it in the Browser
You can test the model(s) using our interactive Gradio demo:
<p align="center">
<a href="https://huggingface.co/spaces/Laudando-Associates-LLC/d-fine-demo">
<img src="https://img.shields.io/badge/Launch%20Demo-Gradio-FF4B4B?logo=gradio&logoColor=white&style=for-the-badge">
</a>
</p>
## D-FINE Variants
The D-FINE family includes five model sizes trained on the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks), each offering a different balance between model size and detection accuracy.
| Variant | Parameters | mAP@[0.50:0.95] | Model Card | ONNX | PyTorch |
|:------------:|:----------:|:---------------:|:-----------:|:--------------:|:-------:|
| Nano | 3.76M | 0.825 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Small | 10.3M | 0.816 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Medium | 19.6M | 0.840 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Large | 31.2M | 0.828 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Extra Large | 62.7M | 0.803 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
> mAP values are evaluated on the validation set of the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks).
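The ONNX and PyTorch badges above link directly to the weight files, and the same files can be fetched programmatically. A minimal sketch using `huggingface_hub` (the filenames match the links in the table):

```python
from huggingface_hub import hf_hub_download

# Download the ONNX export of a chosen variant; see the table above for repo ids
onnx_path = hf_hub_download(
    repo_id="Laudando-Associates-LLC/d-fine-nano",
    filename="model.onnx",
)
print(onnx_path)  # local cache path of the downloaded file
```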
## Installation
```bash
pip install -r requirements.txt
```
> Tip: Use a virtual environment (venv or conda) to avoid dependency conflicts.
## Quick start on [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks)
```python
from datasets import load_dataset
from transformers import AutoProcessor, AutoModel
from PIL import ImageDraw, ImageFont
# Load the test split (or "train")
ds = load_dataset("Laudando-Associates-LLC/pucks", split="test")
# Access an example image from the dataset
image = ds[1]["image"]
# Load processor and model
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
model = AutoModel.from_pretrained("Laudando-Associates-LLC/d-fine-nano", trust_remote_code=True)
# Process the image (resize and pad)
inputs = processor(image)
# Run inference
outputs = model(**inputs, conf_threshold=0.4)
# Draw boxes
draw = ImageDraw.Draw(image)
try:
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", size=24)
except OSError:
    font = ImageFont.load_default()  # fall back if DejaVu fonts are unavailable
for result in outputs:
    boxes = result["boxes"]
    labels = result["labels"]
    scores = result["scores"]
    for box, label, score in zip(boxes, labels, scores):
        x1, y1, x2, y2 = box.tolist()
        draw.rectangle([x1, y1, x2, y2], outline="blue", width=5)
        draw.text((x1, max(0, y1 - 25)), f"{score:.2f}", fill="blue", font=font)
# Save result
image.save("output.jpg")
```
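The quick start runs on CPU by default. If a GPU is available, the model and the processed tensors can be moved to it first. This is a sketch continuing the example above; it assumes the processor returns a dict of tensors and that the remote model code is device-agnostic:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

inputs = processor(image)
# Move every tensor produced by the processor onto the model's device
inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v
          for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs, conf_threshold=0.4)
```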
## How to Use
The D-FINE model family uses a shared processor and variant-specific models. All components are compatible with Hugging Face's `transformers` library via `trust_remote_code=True`.
### Step 1: Load the Preprocessor
The preprocessor is common to all D-FINE variants and handles resizing and padding.
```python
from transformers import AutoProcessor
# Load the shared D-FINE processor
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
```
### Step 2: Load a D-FINE model variant
You can choose from any of the five variants: Nano, Small, Medium, Large, or Extra Large.
```python
from transformers import AutoModel
model_variant = "nano"  # or "small", "medium", "large", "xlarge"
# Load the D-FINE model variant
model = AutoModel.from_pretrained(f"Laudando-Associates-LLC/d-fine-{model_variant}", trust_remote_code=True)
```
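The parameter counts in the variants table can be sanity-checked directly from the loaded weights. A short sketch (note that it downloads all five checkpoints):

```python
from transformers import AutoModel

# Count parameters for each variant; should roughly match the table above
for variant in ["nano", "small", "medium", "large", "xlarge"]:
    m = AutoModel.from_pretrained(
        f"Laudando-Associates-LLC/d-fine-{variant}", trust_remote_code=True
    )
    n_params = sum(p.numel() for p in m.parameters())
    print(f"{variant}: {n_params / 1e6:.2f}M parameters")
```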
### Step 3: Run Inference
Using Pillow with a single image or a batch of images:
```python
from PIL import Image
# Single image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(image)
# Batch of images
batch_images = [
    Image.open("image1.jpg").convert("RGB"),
    Image.open("image2.jpg").convert("RGB"),
]
inputs = processor(batch_images)
# Run inference
outputs = model(**inputs, conf_threshold=0.4)
for result in outputs:
    boxes = result["boxes"]    # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores
```
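The per-image result dictionaries serialize easily for downstream use. A sketch assuming the fields shown above (`boxes`, `labels`, `scores` as tensors):

```python
import json

detections = []
for result in outputs:
    for box, label, score in zip(result["boxes"], result["labels"], result["scores"]):
        detections.append({
            "box_xyxy": [round(v, 2) for v in box.tolist()],
            "label": int(label),
            "score": float(score),
        })

# Write all detections to disk as JSON
with open("detections.json", "w") as f:
    json.dump(detections, f, indent=2)
```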
Using OpenCV with a single image or a batch of images:
```python
import cv2
# Single OpenCV image (BGR)
image = cv2.imread("your_image.jpg")
inputs = processor(image)
# Batch of OpenCV images
batch_images = [
    cv2.imread("image1.jpg"),
    cv2.imread("image2.jpg"),
]
inputs = processor(batch_images)
# Run inference
outputs = model(**inputs, conf_threshold=0.4)
for result in outputs:
    boxes = result["boxes"]    # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores
```
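The ONNX exports linked in the variants table can also be run without PyTorch via `onnxruntime`. The exact input signature of the exported graph is not documented here, so the sketch below first prints the session's inputs and then feeds a dummy NCHW float32 image under that assumption; verify the printed signature before relying on it:

```python
import numpy as np
import onnxruntime as ort

# Open the exported graph (path from hf_hub_download or a manual download)
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Inspect what the export actually expects before feeding data
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

# Assumed example input: one 640x640 RGB image in NCHW float32 layout
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: dummy})
```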
## License
The D-FINE models are released under the [Apache License 2.0](https://github.com/Peterande/D-FINE/blob/master/LICENSE). The L&A Pucks Dataset on which the models were trained is released under the [L&Aser Dataset Replication License (Version 1.0)](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks/blob/main/LICENSE).
## Citation
If you use `D-FINE` or its methods in your work, please cite the following BibTeX entry:
```bibtex
@misc{peng2024dfine,
      title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
      author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
      year={2024},
      eprint={2410.13842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```