---
language:
- en
license: apache-2.0
tags:
- object-detection
- onnx
- safetensors
- AgTech
- transformers
library_name: pytorch
inference: false
datasets:
- Laudando-Associates-LLC/pucks
---

# D-FINE

[D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/2410.13842) is a family of real-time object detectors that improve localization accuracy by rethinking how bounding boxes are predicted in DETR-style models. Instead of regressing box coordinates directly, D-FINE predicts a probability distribution for each box edge and refines that distribution over multiple decoder stages, so the prediction sharpens progressively. It also includes a self-distillation mechanism that passes the refined localization knowledge from later layers back to earlier ones, improving training efficiency and model robustness. Combined with lightweight architectural optimizations, this gives D-FINE a strong balance between speed and accuracy. This repository provides five pretrained variants (Nano, Small, Medium, Large, and Extra Large), so you can trade speed against accuracy to suit your deployment.
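
To make the distribution-based idea concrete, here is a minimal sketch in plain Python. It is not the D-FINE implementation: the bin layout, the `decode_edge_offset` helper, and all logit values are hypothetical and chosen only for illustration. The sketch assumes that a box edge is described by scores over a few candidate offsets, that the decoded offset is the expectation under the softmax of those scores, and that a later stage refines the result by adding residual logits.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_edge_offset(logits, bin_offsets):
    """Expected offset of one box edge under the predicted distribution."""
    probs = softmax(logits)
    return sum(p * o for p, o in zip(probs, bin_offsets))


# Hypothetical candidate offsets (in pixels) for a single box edge.
bin_offsets = [-4.0, -2.0, 0.0, 2.0, 4.0]

# Stage 1: a flat distribution, so the expected offset is about zero.
stage1_logits = [0.0, 0.0, 0.0, 0.0, 0.0]
stage1_offset = decode_edge_offset(stage1_logits, bin_offsets)

# Stage 2 (assumed refinement rule): residual logits are added to the
# previous stage's logits, moving probability mass toward the +2 px bin.
residual_logits = [0.0, 0.0, 0.0, 3.0, 0.0]
stage2_logits = [a + b for a, b in zip(stage1_logits, residual_logits)]
stage2_offset = decode_edge_offset(stage2_logits, bin_offsets)

print(f"stage 1 offset: {stage1_offset:.3f}")
print(f"stage 2 offset: {stage2_offset:.3f}")
```

In this toy setup the second stage moves the expected offset from roughly 0 px toward the +2 px bin without ever regressing a coordinate directly, which is the intuition behind refining a distribution rather than a single value.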

Sample predictions across D-FINE variants (panels: Nano, Small, Medium, Large).

## Try it in the Browser

You can test the models using our interactive Gradio demo.

## D-FINE Variants

The D-FINE family includes five model sizes trained on the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks), each offering a different balance between model size and detection accuracy.

| Variant | Parameters | mAP@[0.50:0.95] | Model Card | ONNX | PyTorch |
|:------------:|:----------:|:---------------:|:-----------:|:--------------:|:-------:|
| Nano | 3.76M | 0.825 | | | |
| Small | 10.3M | 0.816 | | | |
| Medium | 19.6M | 0.840 | | | |
| Large | 31.2M | 0.828 | | | |
| Extra Large | 62.7M | 0.803 | | | |

> mAP values are evaluated on the validation set of the [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks).

## Installation

```bash
pip install -r requirements.txt
```

> Tip: Use a virtual environment (venv or conda) to avoid dependency conflicts.

## Quick start on [L&A Pucks Dataset](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks)

```python
from datasets import load_dataset
from transformers import AutoProcessor, AutoModel
from PIL import ImageDraw, ImageFont

# Load a dataset split ("test" here; "train" is also available)
ds = load_dataset("Laudando-Associates-LLC/pucks", split="test")

# Pick a sample image (index 1 is the second example in the split)
image = ds[1]["image"]

# Load the shared processor and the Nano model variant
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
model = AutoModel.from_pretrained("Laudando-Associates-LLC/d-fine-nano", trust_remote_code=True)

# Process the image (resize and pad)
inputs = processor(image)

# Run inference, keeping detections with confidence >= 0.4
outputs = model(**inputs, conf_threshold=0.4)

# Draw boxes and scores (the font file must be available on your system)
draw = ImageDraw.Draw(image)
font = ImageFont.truetype("DejaVuSans-Bold.ttf", size=24)

for result in outputs:
    boxes = result["boxes"]
    labels = result["labels"]
    scores = result["scores"]

    for box, label, score in zip(boxes, labels, scores):
        x1, y1, x2, y2 = box.tolist()
        draw.rectangle([x1, y1, x2, y2], outline="blue", width=5)
        draw.text((x1, max(0, y1 - 25)), f"{score:.2f}", fill="blue", font=font)

# Save result
image.save("output.jpg")
```

## How to Use

The D-FINE model family uses a shared processor and variant-specific models. All components are compatible with Hugging Face's `transformers` library via `trust_remote_code=True`.

### Step 1: Load the Preprocessor

The preprocessor is common to all D-FINE variants and handles resizing and padding.

```python
from transformers import AutoProcessor

# Load the shared D-FINE processor
processor = AutoProcessor.from_pretrained("Laudando-Associates-LLC/d-fine", trust_remote_code=True)
```

### Step 2: Load a D-FINE model variant

You can choose any of the five variants: Nano, Small, Medium, Large, or Extra Large.

```python
from transformers import AutoModel

model_variant = "nano"  # or: small, medium, large, xlarge

# Load the D-FINE model variant
model = AutoModel.from_pretrained(f"Laudando-Associates-LLC/d-fine-{model_variant}", trust_remote_code=True)
```

### Step 3: Run Inference

Using Pillow with a single image or a batch of images:

```python
from PIL import Image

# Single image
image = Image.open("your_image.jpg").convert("RGB")
inputs = processor(image)

# Batch of images (this overwrites the single-image inputs above)
batch_images = [
    Image.open("image1.jpg").convert("RGB"),
    Image.open("image2.jpg").convert("RGB")
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

# One result per input image
for result in outputs:
    boxes = result["boxes"]    # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores
```

Using OpenCV with a single image or a batch of images:

```python
import cv2

# Single OpenCV image (BGR)
image = cv2.imread("your_image.jpg")
inputs = processor(image)

# Batch of OpenCV images (this overwrites the single-image inputs above)
batch_images = [
    cv2.imread("image1.jpg"),
    cv2.imread("image2.jpg")
]
inputs = processor(batch_images)

# Run inference
outputs = model(**inputs, conf_threshold=0.4)

# One result per input image
for result in outputs:
    boxes = result["boxes"]    # [N, 4] bounding boxes (x1, y1, x2, y2)
    labels = result["labels"]  # [N] class indices
    scores = result["scores"]  # [N] confidence scores
```

## License

The D-FINE models are released under the [Apache License 2.0](https://github.com/Peterande/D-FINE/blob/master/LICENSE). The L&A Pucks Dataset, on which the models were trained, is released under the [L&Aser Dataset Replication License (Version 1.0)](https://huggingface.co/datasets/Laudando-Associates-LLC/pucks/blob/main/LICENSE).

## Citation

If you use `D-FINE` or its methods in your work, please cite the following BibTeX entry:

```latex
@misc{peng2024dfine,
      title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
      author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
      year={2024},
      eprint={2410.13842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```