D-FINE HGNetv2-X fine-tuned for document layout analysis
A D-FINE HGNetv2-X model fine-tuned for Document Layout Analysis. It uses the D-FINE architecture, whose Fine-grained Distribution Refinement (FDR) mechanism refines bounding boxes, together with an HGNetv2-B5 backbone.
| Property | Value |
|---|---|
| Base model | D-FINE (HGNetv2-X) |
| Task | Object Detection |
| Domain | Document Layout |
| Input resolution | 640 × 640 |
| Training | 80 epochs (72 Stage 1 + 8 Stage 2) |
| ID | Label | Description |
|---|---|---|
| 0 | Object-detection | Generic detectable region |
| 1 | Figure | Diagrams, illustrations, charts |
| 2 | Icon | Small symbolic graphics |
| 3 | Table | Tabular data structures |
| Metric | Value |
|---|---|
| mAP @ [0.50 : 0.95] | 0.8375 |
| mAP @ 0.50 | 0.9182 |
| mAP @ 0.75 | 0.8966 |
| AP (Medium) | 0.8703 |
| AP (Large) | 0.8384 |
| AR @ 100 | 0.9348 |
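The thresholds in the table above are Intersection over Union (IoU) values: under the COCO convention, mAP @ 0.50 counts a prediction as correct when its IoU with a ground-truth box is at least 0.50, and mAP @ [0.50 : 0.95] averages over thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows the IoU test for boxes in the `xyxy` format. The helper names `box_iou_xyxy` and `is_match` are hypothetical and are not taken from the evaluation code used to produce these numbers.

```python
def box_iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, iou_threshold=0.5):
    """True if the prediction overlaps the ground truth enough at this threshold."""
    return box_iou_xyxy(pred_box, gt_box) >= iou_threshold
```

A box shifted by half its width against an identical ground-truth box has an IoU of 1/3, so it would not count as a match at the 0.50 threshold.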
pip install transformers torch torchvision
# Clone the D-FINE source code (required)
git clone https://github.com/Peterande/D-FINE.git
cd D-FINE
pip install -r requirements.txt
from transformers import AutoModel, AutoImageProcessor
from PIL import Image
processor = AutoImageProcessor.from_pretrained(
    "ducnhan0804/dfine-x-obj-detection",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "ducnhan0804/dfine-x-obj-detection",
    trust_remote_code=True,
)
image = Image.open("path/to/your/document.png")
inputs = processor.preprocess(image)
outputs = model(inputs["images"], threshold=0.4)
results = processor.post_process_object_detection(outputs, threshold=0.4)
# results[0]["boxes"] → xyxy coordinates (tensor)
# results[0]["scores"] → confidence scores (tensor)
# results[0]["labels"] → class ids (tensor)
print(results)
Note: `trust_remote_code=True` is required because this model uses the custom `modeling_dfine.py`. The checkpoint (1 GB) is downloaded automatically to `~/.cache/huggingface/` on first run.
Important: You must run this code from the D-FINE root directory so that the source code (`src/`) can be imported.
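The class ids in `results[0]["labels"]` can be mapped back to the label names listed in the class table. The following is a minimal sketch, assuming each result is a dict with `boxes`, `scores`, and `labels` as described in the comments of the inference snippet; the actual return types of the custom processor may differ. `ID2LABEL`, `to_records`, and `min_score` are hypothetical names introduced here for illustration.

```python
# Label names copied from the class table of this model card
ID2LABEL = {0: "Object-detection", 1: "Figure", 2: "Icon", 3: "Table"}


def _to_list(x):
    # Accept torch tensors / numpy arrays (via .tolist()) as well as plain lists
    return x.tolist() if hasattr(x, "tolist") else list(x)


def to_records(result, min_score=0.4):
    """Convert one detection result into a list of plain dicts.

    Assumption: `result` holds "boxes" (xyxy), "scores", and "labels".
    """
    boxes = _to_list(result["boxes"])
    scores = _to_list(result["scores"])
    labels = _to_list(result["labels"])
    records = []
    for box, score, label in zip(boxes, scores, labels):
        if score < min_score:
            continue  # drop low-confidence detections
        records.append({
            "label": ID2LABEL.get(int(label), f"unknown-{int(label)}"),
            "score": float(score),
            "box": [float(v) for v in box],
        })
    return records
```

With the inference snippet, this would be called as `to_records(results[0])`, producing records that are easy to print or serialize to JSON.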
git clone https://github.com/Peterande/D-FINE.git
cd D-FINE
pip install -r requirements.txt
# Download checkpoint from HuggingFace
from huggingface_hub import hf_hub_download
weights = hf_hub_download(
    repo_id="ducnhan0804/dfine-x-obj-detection",
    filename="best_stg2.pth",
)
# Run inference using D-FINE's built-in script
# python tools/inference/torch_inf.py -c configs/dfine/custom/dfine_hgnetv2_x_custom.yml -r <weights_path> -i <image_path> -d cuda:0
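The commented command above can also be launched from Python once the checkpoint has been downloaded. This is a minimal sketch, assuming it runs from the D-FINE root directory and that `dfine_hgnetv2_x_custom.yml` has been placed at the config path used in the command. `build_inference_command` and `run_inference` are hypothetical helper names, not part of D-FINE.

```python
import subprocess
import sys

# Config path as it appears in the command from this model card
CONFIG = "configs/dfine/custom/dfine_hgnetv2_x_custom.yml"


def build_inference_command(weights_path, image_path, device="cuda:0", config=CONFIG):
    """Assemble the torch_inf.py invocation as an argument list."""
    return [
        sys.executable, "tools/inference/torch_inf.py",
        "-c", config,
        "-r", str(weights_path),
        "-i", str(image_path),
        "-d", device,
    ]


def run_inference(weights_path, image_path, device="cuda:0"):
    """Run the script; raises CalledProcessError if it exits with an error."""
    cmd = build_inference_command(weights_path, image_path, device)
    return subprocess.run(cmd, check=True)
```

For example, `run_inference(weights, "path/to/your/document.png")` would reuse the `weights` path returned by `hf_hub_download`. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.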
model:
  architecture: D-FINE (HGNetv2-X)
  backbone: HGNetv2-B5
training:
  total_epochs: 35
  stage1_epochs: 30  # heavy augmentation
  stage2_epochs: 5   # fine-tuning, no heavy augmentation
  resolution: 640
  device: cuda
optimizer:
  type: AdamW
  lr: 0.00025
  backbone_lr: 0.0000025
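The optimizer settings give the backbone a learning rate 100 times smaller than the rest of the model. One common way to express this in PyTorch is with per-parameter groups, sketched below. This is an illustration only, not the D-FINE training code: `build_param_groups` is a hypothetical helper, and selecting backbone parameters by the name prefix `backbone` is an assumption about how the parameters are named.

```python
def build_param_groups(named_params, lr=0.00025, backbone_lr=0.0000025,
                       backbone_prefix="backbone"):
    """Split parameters into a backbone group and a group for everything else.

    `named_params` is an iterable of (name, parameter) pairs, for example
    `model.named_parameters()`. The prefix match is an assumption.
    """
    backbone, rest = [], []
    for name, param in named_params:
        if name.startswith(backbone_prefix):
            backbone.append(param)
        else:
            rest.append(param)
    groups = [
        {"params": backbone, "lr": backbone_lr},
        {"params": rest, "lr": lr},
    ]
    # Skip empty groups so the optimizer only sees groups with parameters
    return [g for g in groups if g["params"]]


# Usage with PyTorch (not executed here):
#   import torch
#   optimizer = torch.optim.AdamW(build_param_groups(model.named_parameters()))
```

Each group carries its own `lr`, so a single AdamW instance can update the pretrained backbone gently while training the remaining layers at the full rate.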
datasets/
├── train/
│ ├── _annotations.coco.json
│ └── *.jpg
└── valid/
    ├── _annotations.coco.json
    └── *.jpg
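Before training on a dataset laid out as above, it can help to confirm that each split's `_annotations.coco.json` agrees with the image files next to it. The sketch below assumes the standard COCO keys (`images`, `annotations`, `categories`); `summarize_split` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_split(split_dir):
    """Report image count, missing image files, and boxes per class for one split."""
    split_dir = Path(split_dir)
    with open(split_dir / "_annotations.coco.json", encoding="utf-8") as f:
        coco = json.load(f)

    images = coco.get("images", [])
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}

    # Images listed in the annotation file but absent from the folder
    missing = [img["file_name"] for img in images
               if not (split_dir / img["file_name"]).is_file()]

    # Number of annotated boxes per category name
    counts = Counter(
        id_to_name.get(ann["category_id"], str(ann["category_id"]))
        for ann in coco.get("annotations", [])
    )
    return {
        "images": len(images),
        "missing_files": missing,
        "boxes_per_class": dict(counts),
    }
```

Calling `summarize_split("datasets/train")` and `summarize_split("datasets/valid")` gives a quick view of class balance and catches broken file references early.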
ducnhan0804/dfine-x-obj-detection
├── README.md # Model card
├── config.json # HuggingFace model config
├── preprocessor_config.json # Image processor config
├── modeling_dfine.py # Custom modeling code (trust_remote_code)
├── best_stg2.pth # Best checkpoint (Stage 2)
├── dfine_hgnetv2_x_custom.yml # D-FINE training config
└── custom_detection.yml # Dataset config
This model is released under the Apache License 2.0, inherited from the D-FINE base model.
Base model (D-FINE):
@article{peng2024d,
  title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
  author={Peng, Yansong and Li, Hebei and Wu, Peixi and Zhang, Yueyi and Sun, Xiaoyan and Wu, Feng},
  journal={arXiv preprint arXiv:2410.13842},
  year={2024}
}