|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- object-detection |
|
|
- onnx |
|
|
- safetensors |
|
|
- AgTech |
|
|
- transformers |
|
|
library_name: pytorch |
|
|
inference: false |
|
|
datasets: |
|
|
- Laudando-Associates-LLC/pucks |
|
|
--- |
|
|
|
|
|
<h1 align="center"><strong>D-FINE</strong></h1> |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://huggingface.co/Laudando-Associates-LLC/d-fine"> |
|
|
<img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"> |
|
|
</a> |
|
|
</p> |
|
|
|
|
|
<div align="justify"> |
|
|
|
|
|
[D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/2410.13842) is a family of real-time object detectors that improve localization accuracy by rethinking how bounding boxes are predicted in DETR-style models. Instead of directly regressing box coordinates, D-FINE introduces a distribution-based refinement approach that progressively sharpens predictions over multiple stages.
|
|
|
|
|
It also includes a self-distillation mechanism that passes refined localization knowledge to earlier layers, improving training efficiency and model robustness. Combined with lightweight architectural optimizations, D-FINE achieves a strong balance between speed and accuracy. |
|
|
|
|
|
This repository provides five pretrained variants — Nano, Small, Medium, Large, and Extra Large — offering a trade-off between speed and accuracy for different deployment needs. |
|
|
|
|
|
</div> |
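As a toy illustration of distribution-based box regression (not D-FINE's exact formulation), a box edge can be decoded as the expectation of a softmax distribution over candidate offsets. The bin count and offset range below are illustrative assumptions:

```python
import numpy as np

# Toy sketch: decode per-bin logits from a detection head into a scalar
# box-edge offset via the distribution's expectation. The bin count (17)
# and offset range [0, 1] are illustrative assumptions, not D-FINE's values.
rng = np.random.default_rng(0)
bins = np.linspace(0.0, 1.0, 17)        # candidate offset values
logits = rng.normal(size=17)            # hypothetical per-bin logits
probs = np.exp(logits - logits.max())   # numerically stable softmax
probs /= probs.sum()
offset = float((probs * bins).sum())    # expected offset, lies in [0, 1]
```

Refining the distribution over stages, rather than a single point estimate, lets later layers make small corrections instead of re-predicting the box from scratch.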
|
|
|
|
|
<h3 align="left">Sample Predictions Across D-FINE Variants</h3> |
|
|
|
|
|
<table align="center"> |
|
|
<tr> |
|
|
<td align="center"><img src="assets/nano.png" alt="Nano" style="width:100%; max-width:300px;"><br><strong>Nano</strong></td> |
|
|
<td align="center"><img src="assets/small.png" alt="Small" style="width:100%; max-width:300px;"><br><strong>Small</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td align="center"><img src="assets/medium.png" alt="Medium" style="width:100%; max-width:300px;"><br><strong>Medium</strong></td> |
|
|
<td align="center"><img src="assets/large.png" alt="Large" style="width:100%; max-width:300px;"><br><strong>Large</strong></td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
## Try it in the Browser |
|
|
|
|
|
You can test the model(s) using our interactive Gradio demo: |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://huggingface.co/spaces/Laudando-Associates-LLC/d-fine-demo"> |
|
|
<img src="https://img.shields.io/badge/Launch%20Demo-Gradio-FF4B4B?logo=gradio&logoColor=white&style=for-the-badge"> |
|
|
</a> |
|
|
</p> |
|
|
|
|
|
## D-FINE Variants |
|
|
|
|
|
The D-FINE family includes five model sizes trained on the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks), each offering a different balance between model size and detection accuracy. |
|
|
|
|
|
| Variant | Parameters | mAP@[0.50:0.95] | Model Card | ONNX | PyTorch | |
|
|
|:------------:|:----------:|:---------------:|:-----------:|:--------------:|:-------:| |
|
|
| Nano | 3.76M | 0.825 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-nano/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
| Small | 10.3M | 0.816 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-small/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
| Medium | 19.6M | 0.840 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-medium/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
| Large | 31.2M | 0.828 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-large/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
| Extra Large | 62.7M | 0.803 | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface&style=for-the-badge"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/model.onnx"><img src="https://img.shields.io/badge/-ONNX-005CED?style=for-the-badge&logo=onnx&logoColor=white"></a> | <a href="https://huggingface.co/Laudando-Associates-LLC/d-fine-xlarge/resolve/main/pytorch_model.bin"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white"></a> | |
|
|
|
|
|
|
|
|
> mAP values are evaluated on the validation set of the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks). |
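For reference, mAP@[0.50:0.95] averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal IoU computation for `(x1, y1, x2, y2)` boxes, as a sketch:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```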
|
|
|
|
|
## Installation |
|
|
|
|
|
```bash |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
> Tip: Use a virtual environment (venv or conda) to avoid dependency conflicts. |
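For example, with the standard `venv` module (the directory name `.venv` is just a convention):

```shell
python3 -m venv .venv            # create an isolated environment
source .venv/bin/activate        # activate it (.venv\Scripts\activate on Windows)
pip install -r requirements.txt
```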
|
|
|
|
|
## Quick start on [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks) |
|
|
|
|
|
```python |
|
|
from datasets import load_dataset |
|
|
from transformers import AutoProcessor, AutoModel |
|
|
from PIL import ImageDraw, ImageFont |
|
|
|
|
|
# Load the test split (or 'train')
|
|
ds = load_dataset("Laudando-Associates-LLC/pucks", split="test") |
|
|
|
|
|
# Access an example image
|
|
image = ds[1]["image"] |
|
|
|
|
|
# Load processor and model |
|
|
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True) |
|
|
model = AutoModel.from_pretrained("Laudando-Associates-LLC/d-fine-nano", trust_remote_code=True)
|
|
|
|
|
# Process the image: resize and pad
|
|
inputs = processor(image) |
|
|
|
|
|
# Run inference |
|
|
outputs = model(**inputs, conf_threshold=0.4) |
|
|
|
|
|
# Draw boxes |
|
|
draw = ImageDraw.Draw(image) |
|
|
try:
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", size=24)
except OSError:
    font = ImageFont.load_default()  # fallback if the TTF is not installed
|
|
for result in outputs: |
|
|
boxes = result["boxes"] |
|
|
labels = result["labels"] |
|
|
scores = result["scores"] |
|
|
|
|
|
for box, label, score in zip(boxes, labels, scores): |
|
|
x1, y1, x2, y2 = box.tolist() |
|
|
draw.rectangle([x1, y1, x2, y2], outline="blue", width=5) |
|
|
draw.text((x1, max(0, y1 - 25)), f"{score:.2f}", fill="blue", font=font) |
|
|
|
|
|
# Save result |
|
|
image.save("output.jpg") |
|
|
``` |
|
|
|
|
|
## How to Use |
|
|
|
|
|
The D-FINE model family uses a shared processor and variant-specific models. All components are compatible with Hugging Face's `transformers` library via `trust_remote_code=True`.
|
|
|
|
|
### Step 1: Load the Preprocessor |
|
|
|
|
|
The preprocessor is common to all D-FINE variants and handles resizing and padding. |
|
|
|
|
|
```python |
|
|
from transformers import AutoProcessor |
|
|
|
|
|
# Load the shared D-FINE processor |
|
|
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True) |
|
|
``` |
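Resize-and-pad (letterboxing) can be sketched as follows. This is a simplified stand-in: the actual processor's target size, padding colour, and normalisation may differ:

```python
from PIL import Image

def letterbox(img: Image.Image, size: int = 640) -> Image.Image:
    """Resize preserving aspect ratio, then pad to a square canvas.
    Simplified sketch; the 640 target and grey padding are assumptions."""
    scale = size / max(img.size)
    w, h = round(img.width * scale), round(img.height * scale)
    canvas = Image.new("RGB", (size, size), (114, 114, 114))
    canvas.paste(img.resize((w, h)), ((size - w) // 2, (size - h) // 2))
    return canvas

out = letterbox(Image.new("RGB", (1280, 720)))
print(out.size)  # (640, 640)
```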
|
|
|
|
|
### Step 2: Load a D-FINE model variant |
|
|
|
|
|
You can choose from any of the five variants: Nano, Small, Medium, Large, or Extra Large. |
|
|
|
|
|
```python |
|
|
from transformers import AutoModel |
|
|
|
|
|
model_variant = "nano"  # or "small", "medium", "large", "xlarge"
|
|
|
|
|
# Load the D-FINE model variant |
|
|
model = AutoModel.from_pretrained(f"Laudando-Associates-LLC/d-fine-{model_variant}", trust_remote_code=True) |
|
|
``` |
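If you select the variant programmatically, a hypothetical helper based on the parameter counts from the table above might look like this (`pick_variant` is illustrative, not part of the repository):

```python
# Hypothetical helper: choose the largest variant that fits a parameter
# budget (in millions). Counts are taken from the variants table above.
PARAMS_M = {"nano": 3.76, "small": 10.3, "medium": 19.6, "large": 31.2, "xlarge": 62.7}

def pick_variant(max_params_m: float) -> str:
    eligible = [v for v, p in sorted(PARAMS_M.items(), key=lambda kv: kv[1])
                if p <= max_params_m]
    if not eligible:
        raise ValueError("no D-FINE variant fits the given budget")
    return eligible[-1]

print(pick_variant(20.0))  # medium
```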
|
|
|
|
|
### Step 3: Run Inference |
|
|
|
|
|
Using Pillow with a single image or a batch of images:
|
|
|
|
|
```python |
|
|
from PIL import Image |
|
|
|
|
|
# Single image |
|
|
image = Image.open("your_image.jpg").convert("RGB") |
|
|
inputs = processor(image) |
|
|
|
|
|
# Batch of images |
|
|
batch_images = [ |
|
|
Image.open("image1.jpg").convert("RGB"), |
|
|
Image.open("image2.jpg").convert("RGB") |
|
|
] |
|
|
inputs = processor(batch_images) |
|
|
|
|
|
# Run inference |
|
|
outputs = model(**inputs, conf_threshold=0.4) |
|
|
|
|
|
for result in outputs: |
|
|
boxes = result["boxes"] # [N, 4] bounding boxes (x1, y1, x2, y2) |
|
|
labels = result["labels"] # [N] class indices |
|
|
scores = result["scores"] # [N] confidence scores |
|
|
``` |
|
|
|
|
|
Using OpenCV with a single image or a batch of images:
|
|
|
|
|
```python |
|
|
import cv2 |
|
|
|
|
|
# Single OpenCV image (BGR) |
|
|
image = cv2.imread("your_image.jpg") |
|
|
inputs = processor(image) |
|
|
|
|
|
# Batch of OpenCV images |
|
|
batch_images = [ |
|
|
cv2.imread("image1.jpg"), |
|
|
cv2.imread("image2.jpg") |
|
|
] |
|
|
inputs = processor(batch_images) |
|
|
|
|
|
# Run inference |
|
|
outputs = model(**inputs, conf_threshold=0.4) |
|
|
|
|
|
for result in outputs: |
|
|
boxes = result["boxes"] # [N, 4] bounding boxes (x1, y1, x2, y2) |
|
|
labels = result["labels"] # [N] class indices |
|
|
scores = result["scores"] # [N] confidence scores |
|
|
``` |
|
|
|
|
|
## License |
|
|
The D-FINE models are released under the [Apache License 2.0](https://github.com/Peterande/D-FINE/blob/master/LICENSE). The L&A Pucks Dataset on which the models were trained is released under the [L&Aser Dataset Replication License (Version 1.0)](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks/blob/main/LICENSE).
|
|
|
|
|
## Citation |
|
|
If you use `D-FINE` or its methods in your work, please cite the following BibTeX entry:
|
|
|
|
|
```bibtex
|
|
@misc{peng2024dfine, |
|
|
title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement}, |
|
|
author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu}, |
|
|
year={2024}, |
|
|
eprint={2410.13842}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CV} |
|
|
} |
|
|
``` |