---
language:
  - en
license: apache-2.0
tags:
  - object-detection
  - onnx
  - safetensors
  - AgTech
  - transformers
library_name: pytorch
inference: false
datasets:
  - Laudando-Associates-LLC/pucks
---

<h1 align="center"><strong>D-FINE</strong></h1>

<p align="center">
  <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine">
    <img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge">
  </a>
</p>

<div align="justify">

[D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/2410.13842) is a family of real-time object detectors that improve localization accuracy by rethinking how bounding boxes are predicted in DETR-style models. Instead of directly regressing box coordinates, D-FINE introduces a distribution based refinement approach that progressively sharpens predictions over multiple stages.

It also includes a self-distillation mechanism that passes refined localization knowledge to earlier layers, improving training efficiency and model robustness. Combined with lightweight architectural optimizations, D-FINE achieves a strong balance between speed and accuracy.

This repository provides five pretrained variants — Nano, Small, Medium, Large, and Extra Large — offering a trade-off between speed and accuracy for different deployment needs.

</div>
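The distribution-based refinement idea can be sketched numerically: rather than regressing one scalar per box edge, the model predicts logits over discrete offset bins, and the final offset is the expectation of the resulting distribution. The snippet below is a simplified illustration of that principle only; D-FINE's actual bin layout and weighting functions differ.

```python
import math

def expected_offset(logits, bin_values):
    """Soft-argmax over offset bins: the predicted edge offset is the
    expectation of a softmax distribution rather than a single raw
    regression value. Simplified sketch, not D-FINE's exact formulation."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return sum((e / total) * v for e, v in zip(exps, bin_values))

# Uniform logits over symmetric bins yield a zero offset;
# sharpening one logit pulls the expectation toward that bin.
print(expected_offset([0.0, 0.0, 0.0], [-1.0, 0.0, 1.0]))
print(expected_offset([0.0, 0.0, 5.0], [-1.0, 0.0, 1.0]))
```

Because the output is an expectation rather than a hard argmax, it stays differentiable, which is what allows the stage-by-stage sharpening described above.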

<h3 align="left">Sample Predictions Across D-FINE Variants</h3>

<table align="center">
  <tr>
    <td align="center"><img src="assets/nano.png" alt="Nano" style="width:100%; max-width:300px;"><br><strong>Nano</strong></td>
    <td align="center"><img src="assets/small.png" alt="Small" style="width:100%; max-width:300px;"><br><strong>Small</strong></td>
  </tr>
  <tr>
    <td align="center"><img src="assets/medium.png" alt="Medium" style="width:100%; max-width:300px;"><br><strong>Medium</strong></td>
    <td align="center"><img src="assets/large.png" alt="Large" style="width:100%; max-width:300px;"><br><strong>Large</strong></td>
  </tr>
</table>

## Try it in the Browser

You can test the model(s) using our interactive Gradio demo:

<p align="center">
  <a href="https://huggingface.co/spaces/Laudando-Associates-LLC/d-fine-demo">
    <img src="https://img.shields.io/badge/Launch%20Demo-Gradio-FF4B4B?logo=gradio&logoColor=white&style=for-the-badge">
  </a>
</p>

## D-FINE Variants

The D-FINE family includes five model sizes trained on the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks), each offering a different balance between model size and detection accuracy.

| Variant      | Parameters | mAP@[0.50:0.95] | Model Card | ONNX | PyTorch |
|:------------:|:----------:|:---------------:|:-----------:|:--------------:|:-------:|
| Nano         | 3.76M      | 0.825           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Small        | 10.3M      | 0.816           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Medium       | 19.6M      | 0.840           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Large        | 31.2M      | 0.828           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |
| Extra Large  | 62.7M      | 0.803           | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> |


> mAP values are evaluated on the validation set of the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks).
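As a quick way to reason over the table, the sketch below (a hypothetical helper; the numbers are copied from the table above) picks the smallest variant that meets a given accuracy floor:

```python
# Parameter counts (millions) and mAP@[0.50:0.95] from the variants table
VARIANTS = {
    "nano":   (3.76, 0.825),
    "small":  (10.3, 0.816),
    "medium": (19.6, 0.840),
    "large":  (31.2, 0.828),
    "xlarge": (62.7, 0.803),
}

def smallest_variant(min_map):
    """Return the name of the smallest variant whose mAP meets the floor."""
    eligible = [(params, name) for name, (params, m) in VARIANTS.items()
                if m >= min_map]
    if not eligible:
        raise ValueError(f"no variant reaches mAP {min_map}")
    return min(eligible)[1]

print(smallest_variant(0.82))  # nano
```

Note that accuracy does not grow monotonically with size on this dataset (Medium outperforms both Large and Extra Large), so picking by constraint rather than by maximum size is worthwhile.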

## Installation

```bash
pip install -r requirements.txt
```

> Tip: Use a virtual environment (venv or conda) to avoid dependency conflicts.

## Quick start on [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks)

```python
from datasets import load_dataset
from transformers import AutoProcessor, AutoModel
from PIL import ImageDraw, ImageFont

# Load the test split (or 'train')
ds = load_dataset("Laudando-Associates-LLC/pucks", split="test")

# Access an example image
image = ds[1]["image"]

# Load processor and model
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
model = AutoModel.from_pretrained("Laudando-Associates-LLC/d-fine-nano", trust_remote_code=True)

# Process the image: resize and pad
inputs = processor(image)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

# Draw boxes
draw = ImageDraw.Draw(image)
try:
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", size=24)
except OSError:
    # Fall back if the font file is not available on this system
    font = ImageFont.load_default()
for result in outputs:
    boxes = result["boxes"]
    labels = result["labels"]
    scores = result["scores"]

    for box, label, score in zip(boxes, labels, scores):
        x1, y1, x2, y2 = box.tolist()
        draw.rectangle([x1, y1, x2, y2], outline="blue", width=5)
        draw.text((x1, max(0, y1 - 25)), f"{score:.2f}", fill="blue", font=font)

# Save result
image.save("output.jpg")
```

## How to Use

The D-FINE model family uses a shared processor and variant-specific models. All components are compatible with Hugging Face's `transformers` library via `trust_remote_code=True`.

### Step 1: Load the Preprocessor

The preprocessor is common to all D-FINE variants and handles resizing and padding.

```python
from transformers import AutoProcessor

# Load the shared D-FINE processor
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
```

### Step 2: Load a D-FINE model variant

You can choose from any of the five variants: Nano, Small, Medium, Large, or Extra Large.

```python
from transformers import AutoModel

model_variant = "nano"  # or "small", "medium", "large", "xlarge"

# Load the D-FINE model variant
model = AutoModel.from_pretrained(f"Laudando-Associates-LLC/d-fine-{model_variant}", trust_remote_code=True)
```

### Step 3: Run Inference

Using Pillow with a single image or a batch of images:

```python
from PIL import Image

# Single image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(image)

# Batch of images
batch_images = [
    Image.open("image1.jpg").convert("RGB"),
    Image.open("image2.jpg").convert("RGB")
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

for result in outputs:
    boxes  = result["boxes"]   # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores

Using OpenCV with a single image or a batch of images:

```python
import cv2

# Single OpenCV image (BGR)
image = cv2.imread("your_image.jpg")
inputs = processor(image)

# Batch of OpenCV images
batch_images = [
    cv2.imread("image1.jpg"),
    cv2.imread("image2.jpg")
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

for result in outputs:
    boxes  = result["boxes"]   # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores

## License
The D-FINE models are released under the [Apache License 2.0](https://github.com/Peterande/D-FINE/blob/master/LICENSE). The L&A Pucks Dataset, on which the models were trained, is released under the [L&Aser Dataset Replication License (Version 1.0)](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks/blob/main/LICENSE).

## Citation
If you use `D-FINE` or its methods in your work, please cite the following BibTeX entry:

```bibtex
@misc{peng2024dfine,
      title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
      author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
      year={2024},
      eprint={2410.13842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```