🔍 D-FINE X — Object Detection for Document Layout

D-FINE HGNetv2-X fine-tuned for document layout analysis



Overview

This is the D-FINE HGNetv2-X model fine-tuned for Document Layout Analysis. It uses the D-FINE architecture, which refines bounding boxes through its Fine-grained Distribution Refinement (FDR) mechanism, on top of the HGNetv2-B5 backbone.
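To give an intuition for FDR, the sketch below shows distribution-based edge regression in its simplest form: each box edge is predicted as a probability distribution over a fixed set of bins, the edge offset is the expected bin value scaled by the box size, and each decoder layer refines the distribution by adding residual logits to the previous layer's logits. This is a simplified illustration, not D-FINE's implementation: the evenly spaced `BINS` are an assumption (D-FINE uses a non-uniform weighting over its bins), and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def edge_distance(initial_distance, logits, bin_values, scale):
    """Refined edge distance = initial distance + scale * E[bin value].

    `scale` stands in for the box height or width, so offsets are relative
    to object size. `bin_values` is an assumed, evenly spaced set of bins.
    """
    probs = softmax(logits)
    expected = sum(w * p for w, p in zip(bin_values, probs))
    return initial_distance + scale * expected


def refine_logits(prev_logits, residual_logits):
    """Layer-wise refinement: the next layer adds residual logits to the previous ones."""
    return [a + b for a, b in zip(prev_logits, residual_logits)]


# Five symmetric bins from -1 to 1 (hypothetical choice)
BINS = [-1.0, -0.5, 0.0, 0.5, 1.0]

layer1 = [0.0, 0.0, 0.0, 0.0, 0.0]               # uniform: no correction yet
layer2 = refine_logits(layer1, [0, 0, 0, 6, 0])  # next layer pushes mass to the +0.5 bin
print(edge_distance(100.0, layer1, BINS, 40.0))  # about 100.0
print(edge_distance(100.0, layer2, BINS, 40.0))  # approaches 100 + 0.5 * 40 = 120
```

Because the offset is an expectation over a distribution rather than a single regressed number, later layers can sharpen or shift the distribution with small logit corrections instead of re-predicting the coordinate.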

Property Value
Base model D-FINE (HGNetv2-X)
Task Object Detection
Domain Document Layout
Input resolution 640 × 640
Training 80 epochs (72 Stage 1 + 8 Stage 2)
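Because the model works at a fixed 640 × 640 input, boxes expressed in input coordinates have to be mapped back to the original page size before they are drawn or cropped. Whether the repository's post-processing already does this is not stated here; if the returned coordinates never exceed 640, a sketch like the one below applies. It assumes a plain (non-letterboxed) resize, and `rescale_boxes` is a hypothetical helper.

```python
def rescale_boxes(boxes, orig_size, input_size=(640, 640)):
    """Map xyxy boxes from model-input pixels back to original-image pixels.

    boxes:      iterable of [x1, y1, x2, y2] in input_size coordinates
    orig_size:  (width, height) of the original image
    input_size: (width, height) the image was resized to; assumes a plain
                resize without letterbox padding
    """
    sx = orig_size[0] / input_size[0]
    sy = orig_size[1] / input_size[1]
    return [[x1 * sx, y1 * sy, x2 * sx, y2 * sy] for x1, y1, x2, y2 in boxes]


# A box covering the right half of the 640x640 input, on a 1280x960 page
print(rescale_boxes([[320, 0, 640, 640]], orig_size=(1280, 960)))
# [[640.0, 0.0, 1280.0, 960.0]]
```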

Classes

ID Label Description
0 Object-detection Generic detectable region
1 Figure Diagrams, illustrations, charts
2 Icon Small symbolic graphics
3 Table Tabular data structures
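Model outputs carry integer class ids. A small lookup built from the table turns them into readable names. `ID2LABEL` and `label_names` are hypothetical helpers, and the mapping assumes the ids in the table match those stored in the checkpoint.

```python
# Mapping copied from the Classes table above; the repository's config.json
# may define its own id2label, which should take precedence if it differs.
ID2LABEL = {0: "Object-detection", 1: "Figure", 2: "Icon", 3: "Table"}


def label_names(label_ids):
    """Translate class ids (a list of ints, or a tensor via .tolist()) into names."""
    if hasattr(label_ids, "tolist"):
        label_ids = label_ids.tolist()
    return [ID2LABEL.get(int(i), f"unknown-{int(i)}") for i in label_ids]


print(label_names([3, 1, 0]))  # ['Table', 'Figure', 'Object-detection']
```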

Benchmark

✅ Best validation metrics (Stage 2)

Metric Value
mAP @ [0.50 : 0.95] 0.8375
mAP @ 0.50 0.9182
mAP @ 0.75 0.8966
AP (Medium) 0.8703
AP (Large) 0.8384
AR @ 100 0.9348
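mAP @ [0.50 : 0.95] follows the COCO convention: AP is computed at ten IoU thresholds (0.50, 0.55, ..., 0.95) and averaged, which is why it is lower than mAP @ 0.50. The sketch below shows only the matching criterion, i.e. how the IoU between a prediction and a ground-truth box decides at which thresholds the prediction counts as a hit. It is not a full evaluator: COCO evaluation (for example pycocotools' `COCOeval`) also matches by class, ranks detections by score and integrates precision-recall curves per category. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two xyxy boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds 0.50, 0.55, ..., 0.95 (10 values)
THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def thresholds_passed(pred, gt):
    """Thresholds at which `pred` would match `gt` (ignores class and score ranking)."""
    value = iou(pred, gt)
    return [t for t in THRESHOLDS if value >= t]


gt = [0, 0, 100, 100]
pred = [0, 0, 100, 80]                # slightly too short
print(iou(pred, gt))                  # 0.8
print(thresholds_passed(pred, gt))    # [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8]
```

A prediction like this one counts as correct for mAP @ 0.50 but misses at the three strictest thresholds, so it contributes fully to mAP50 and only partially to mAP50:95.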

Usage

Option 1 — HuggingFace Transformers (recommended)

pip install transformers torch torchvision
# Clone the D-FINE source code (required)
git clone https://github.com/Peterande/D-FINE.git
cd D-FINE
pip install -r requirements.txt

import torch
from transformers import AutoModel, AutoImageProcessor
from PIL import Image

repo_id = "ducnhan0804/dfine-x-obj-detection"

# trust_remote_code=True loads the custom classes shipped with the repository
processor = AutoImageProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Convert to RGB so grayscale or transparent (RGBA) pages still have 3 channels
image = Image.open("path/to/your/document.png").convert("RGB")
inputs = processor.preprocess(image)

# Inference only: no gradients needed
with torch.no_grad():
    outputs = model(inputs["images"], threshold=0.4)
results = processor.post_process_object_detection(outputs, threshold=0.4)

# results[0]["boxes"]  → xyxy coordinates (tensor)
# results[0]["scores"] → confidence scores (tensor)
# results[0]["labels"] → class ids (tensor)
print(results)
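To inspect or visualize the output, a minimal sketch like the following flattens one post-processed result into plain records and draws them with Pillow. It assumes `results[0]` has the "boxes"/"scores"/"labels" keys described in the comments above, with xyxy boxes in original-image pixel coordinates. `to_detections`, `draw_detections` and the `ID2LABEL` mapping (copied from the Classes table) are hypothetical helpers, not part of the model repository.

```python
from PIL import Image, ImageDraw

# Assumed mapping, copied from the Classes table
ID2LABEL = {0: "Object-detection", 1: "Figure", 2: "Icon", 3: "Table"}


def _as_list(x):
    """Accept tensors (via .tolist()) or plain sequences."""
    return x.tolist() if hasattr(x, "tolist") else list(x)


def to_detections(result, min_score=0.4):
    """Flatten one result dict into records, keeping scores >= min_score."""
    boxes, scores, labels = (_as_list(result[k]) for k in ("boxes", "scores", "labels"))
    return [
        {
            "label": ID2LABEL.get(int(l), str(int(l))),
            "score": float(s),
            "box": [float(v) for v in b],
        }
        for b, s, l in zip(boxes, scores, labels)
        if s >= min_score
    ]


def draw_detections(image, detections):
    """Return an RGB copy of the image with boxes and labels drawn on it."""
    canvas = image.convert("RGB")
    draw = ImageDraw.Draw(canvas)
    for det in detections:
        draw.rectangle(det["box"], outline=(255, 0, 0), width=2)
        x1, y1 = det["box"][:2]
        draw.text((x1 + 2, y1 + 2), f'{det["label"]} {det["score"]:.2f}', fill=(255, 0, 0))
    return canvas
```

With the variables from the example above, `draw_detections(image, to_detections(results[0])).save("layout.png")` would write an annotated copy of the page.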

Note: trust_remote_code=True is required because this model uses custom modeling code (modeling_dfine.py). The checkpoint (~1 GB) is downloaded automatically on first run, by default to `~/.cache/huggingface/`.

Important: You must run this code from the D-FINE root directory so that the source code (src/) can be imported.
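If changing into the D-FINE directory is inconvenient (for example inside a notebook or another project), an alternative is to put the D-FINE root on `sys.path` before calling `from_pretrained`. This sketch assumes the custom code only needs the `src` package to be importable; if it also reads D-FINE config files through relative paths, running from the D-FINE root remains necessary. `add_dfine_to_path` is a hypothetical helper.

```python
import sys
from pathlib import Path


def add_dfine_to_path(dfine_root):
    """Make `import src...` resolvable without changing the working directory."""
    root = Path(dfine_root).expanduser().resolve()
    if not (root / "src").is_dir():
        raise FileNotFoundError(f"{root} does not look like a D-FINE checkout (no src/ folder)")
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))  # take precedence over other packages named `src`
    return root
```

For example, call `add_dfine_to_path("~/D-FINE")` first, then run the Option 1 code unchanged.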

Option 2 — D-FINE directly

git clone https://github.com/Peterande/D-FINE.git
cd D-FINE
pip install -r requirements.txt

# Download the checkpoint from HuggingFace (Python)
from huggingface_hub import hf_hub_download

# hf_hub_download returns the local path of the cached file
weights = hf_hub_download(
    repo_id="ducnhan0804/dfine-x-obj-detection",
    filename="best_stg2.pth",
)
print(weights)

# Run inference using D-FINE's built-in script
# python tools/inference/torch_inf.py -c configs/dfine/custom/dfine_hgnetv2_x_custom.yml -r <weights_path> -i <image_path> -d cuda:0
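The commented command can also be launched from Python, reusing the `weights` path returned by `hf_hub_download`. The helpers below only assemble and run that exact command line (same `-c`, `-r`, `-i`, `-d` flags) with `subprocess`. They are hypothetical conveniences, and they assume the config path exists in your D-FINE checkout with a class count matching this model (the repository also ships its own `dfine_hgnetv2_x_custom.yml`).

```python
import subprocess
import sys


def build_inference_cmd(config, weights, image, device="cuda:0"):
    """Assemble the torch_inf.py call as an argument list (no shell quoting issues)."""
    return [
        sys.executable, "tools/inference/torch_inf.py",
        "-c", str(config),
        "-r", str(weights),
        "-i", str(image),
        "-d", device,
    ]


def run_inference(config, weights, image, device="cuda:0", dfine_root="."):
    """Run D-FINE's inference script from the repository root; raise if it fails."""
    cmd = build_inference_cmd(config, weights, image, device)
    return subprocess.run(cmd, cwd=dfine_root, check=True)
```

For example: `run_inference("configs/dfine/custom/dfine_hgnetv2_x_custom.yml", weights, "path/to/your/document.png", dfine_root="D-FINE")`. Passing `cwd` keeps the relative script and config paths valid regardless of where the calling script lives.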

Training Configuration

model:
  architecture: D-FINE (HGNetv2-X)
  backbone: HGNetv2-B5

training:
  total_epochs: 35
  stage1_epochs: 30   # heavy augmentation
  stage2_epochs: 5    # fine-tuning, no heavy augmentation
  resolution: 640
  device: cuda

optimizer:
  type: AdamW
  lr: 0.00025
  backbone_lr: 0.0000025
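The two learning rates mean the pretrained backbone is updated 100× more slowly than the rest of the detector (0.0000025 vs 0.00025). In plain PyTorch this corresponds to AdamW parameter groups, as in the sketch below. D-FINE's own configs define such groups declaratively (by parameter-name patterns), so this illustrates the idea rather than reproducing its code. `TinyDetector` and `build_optimizer` are hypothetical, and other optimizer settings such as weight decay are left at PyTorch defaults because the card does not list them.

```python
import torch
from torch import nn


class TinyDetector(nn.Module):
    """Stand-in model with a 'backbone' and a 'head', only to show the grouping."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3)
        self.head = nn.Linear(8, 4)


def build_optimizer(model, lr=2.5e-4, backbone_lr=2.5e-6):
    """AdamW with a lower learning rate for parameters under `backbone.`."""
    backbone_params = [p for n, p in model.named_parameters() if n.startswith("backbone.")]
    other_params = [p for n, p in model.named_parameters() if not n.startswith("backbone.")]
    return torch.optim.AdamW(
        [
            {"params": backbone_params, "lr": backbone_lr},
            {"params": other_params},  # no "lr" key: uses the default lr below
        ],
        lr=lr,
    )


optimizer = build_optimizer(TinyDetector())
print([g["lr"] for g in optimizer.param_groups])  # [2.5e-06, 0.00025]
```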

Dataset split (COCO format)

datasets/
├── train/
│   ├── _annotations.coco.json
│   └── *.jpg
└── valid/
    ├── _annotations.coco.json
    └── *.jpg
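Before training, it is worth checking that each split's `_annotations.coco.json` uses the category ids listed in the Classes table. A minimal sketch using only the standard library; it reads the standard COCO keys (`images`, `annotations`, `categories`), and `summarize_coco` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_coco(ann_path):
    """Return image count, category names and per-category annotation counts."""
    data = json.loads(Path(ann_path).read_text(encoding="utf-8"))
    categories = {c["id"]: c["name"] for c in data["categories"]}
    counts = Counter(a["category_id"] for a in data["annotations"])
    return {
        "num_images": len(data["images"]),
        "categories": categories,
        # Unknown ids are flagged instead of silently dropped
        "annotations_per_category": {
            categories.get(k, f"unknown-{k}"): v for k, v in sorted(counts.items())
        },
    }
```

For example, `print(summarize_coco("datasets/train/_annotations.coco.json"))`; any `unknown-…` key points to annotations whose category id is missing from the file's category list.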

Repository Structure

ducnhan0804/dfine-x-obj-detection
├── README.md                        # Model card
├── config.json                      # HuggingFace model config
├── preprocessor_config.json         # Image processor config
├── modeling_dfine.py                # Custom modeling code (trust_remote_code)
├── best_stg2.pth                    # Best checkpoint (Stage 2)
├── dfine_hgnetv2_x_custom.yml       # D-FINE training config
└── custom_detection.yml             # Dataset config

Limitations

  • Trained on a specific document layout dataset; generalization to other document types is not guaranteed.
  • Requires D-FINE source code to be cloned and available locally for inference.
  • The checkpoint file is ~1GB in size.

License

This model is released under the Apache License 2.0, inherited from the D-FINE base model.


Citation

Base model (D-FINE):

@article{peng2024d,
  title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
  author={Peng, Yansong and Li, Hebei and Wu, Peixi and Zhang, Yueyi and Sun, Xiaoyan and Wu, Feng},
  journal={arXiv preprint arXiv:2410.13842},
  year={2024}
}