	<h2 align="center">
	DEIM: DETR with Improved Matching for Fast Convergence
	</h2>

	<p align="center">
	<a href="https://github.com/ShihuaHuang95/DEIM/blob/master/LICENSE">
	<img alt="license" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue">
	</a>
	<a href="https://arxiv.org/abs/2412.04234">
	<img alt="arXiv" src="https://img.shields.io/badge/arXiv-2412.04234-red">
	</a>
	<a href="https://www.shihuahuang.cn/DEIM/">
	<img alt="project webpage" src="https://img.shields.io/badge/Webpage-DEIM-purple">
	</a>
	<a href="https://github.com/ShihuaHuang95/DEIM/pulls">
	<img alt="prs" src="https://img.shields.io/github/issues-pr/ShihuaHuang95/DEIM">
	</a>
	<a href="https://github.com/ShihuaHuang95/DEIM/issues">
	<img alt="issues" src="https://img.shields.io/github/issues/ShihuaHuang95/DEIM?color=olive">
	</a>
	<a href="https://github.com/ShihuaHuang95/DEIM">
	<img alt="stars" src="https://img.shields.io/github/stars/ShihuaHuang95/DEIM">
	</a>
	<a href="mailto:shihuahuang95@gmail.com">
	<img alt="Contact Us" src="https://img.shields.io/badge/Contact-Email-yellow">
	</a>
	</p>
	<p align="center" style="font-size: 2.0em; font-weight: bold;">
	🎉 <strong>We’re excited to share <a href="https://intellindust-ai-lab.github.io/projects/DEIMv2/" style="color: #d9534f; text-decoration: none;">DEIMv2</a> </strong>🎉
	</p>


	<p align="center">
	DEIM is an advanced training framework designed to enhance the matching mechanism in DETRs, enabling faster convergence and improved accuracy. It serves as a robust foundation for future research and applications in the field of real-time object detection.
	</p>

	---


	<div align="center">
	<a href="http://www.shihuahuang.cn">Shihua Huang</a><sup>1</sup>,
	<a href="https://scholar.google.com/citations?user=tIFWBcQAAAAJ&hl=en">Zhichao Lu</a><sup>2</sup>,
	<a href="https://vinthony.github.io/academic/">Xiaodong Cun</a><sup>3</sup>,
	Yongjun Yu<sup>1</sup>,
	Xiao Zhou<sup>4</sup>,
	<a href="https://xishen0220.github.io">Xi Shen</a><sup>1*</sup>
	</div>


	<p align="center">
	<i>
	1. Intellindust AI Lab   2. City University of Hong Kong   3. Great Bay University   4. Hefei Normal University
	</i>
	</p>

	<p align="center">
	📧 Corresponding author: <a href="mailto:shenxiluc@gmail.com">shenxiluc@gmail.com</a>
	</p>

	<p align="center">
	<a href="https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=deim-detr-with-improved-matching-for-fast">
	<img alt="sota" src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deim-detr-with-improved-matching-for-fast/real-time-object-detection-on-coco">
	</a>
	</p>

	<p align="center">
	<strong>If you like our work, please give us a ⭐!</strong>
	</p>


	<p align="center">
	<img src="./figures/teaser_a.png" alt="Image 1" width="49%">
	<img src="./figures/teaser_b.png" alt="Image 2" width="49%">
	</p>




	## 🚀 Updates
	- [x] \[2025.09.26\] DEIMv2 is now available: see the [project page](https://intellindust-ai-lab.github.io/projects/DEIMv2/) and the [released code](https://github.com/Intellindust-AI-Lab/DEIMv2). The series covers eight model sizes, from X down to Atto. The S, M, L, and X variants leverage DINOv3 features (distilled or pretrained). DEIMv2 achieves higher performance with fewer parameters and FLOPs.
	- [x] \[2025.06.24\] DEIMv2 is coming soon: our next-gen detection series, along with three ultra-light variants: Pico (1.5M), Femto (0.96M), and Atto (0.49M), all delivering SoTA performance. Atto, in particular, is tailored for mobile devices, achieving 23.8 AP on COCO at 320×320 resolution.
	- [x] \[2025.03.12\] The Objects365-pretrained [DEIM-D-FINE-X](https://drive.google.com/file/d/1RMNrHh3bYN0FfT5ZlWhXtQxkG23xb2xj/view?usp=drive_link) model is released; it achieves 59.5% AP after 24 epochs of fine-tuning on COCO.
	- [x] \[2025.03.05\] The Nano DEIM model is released.
	- [x] \[2025.02.27\] The DEIM paper is accepted to CVPR 2025. Thanks to all co-authors.
	- [x] \[2024.12.26\] Added a more efficient implementation of Dense O2O that improves data-loading speed by nearly 30% (see [the pull request](https://github.com/ShihuaHuang95/DEIM/pull/13) for details). Huge thanks to my colleague [Longfei Liu](https://github.com/capsule2077).
	- [x] \[2024.12.03\] Released the DEIM series. This repo also supports re-implementations of [D-FINE](https://arxiv.org/abs/2410.13842) and [RT-DETR](https://arxiv.org/abs/2407.17140).

	## Table of Contents
	* [1. Model Zoo](https://github.com/ShihuaHuang95/DEIM?tab=readme-ov-file#1-model-zoo)
	* [2. Quick start](https://github.com/ShihuaHuang95/DEIM?tab=readme-ov-file#2-quick-start)
	* [3. Usage](https://github.com/ShihuaHuang95/DEIM?tab=readme-ov-file#3-usage)
	* [4. Tools](https://github.com/ShihuaHuang95/DEIM?tab=readme-ov-file#4-tools)
	* [5. Citation](https://github.com/ShihuaHuang95/DEIM?tab=readme-ov-file#5-citation)
	* [6. Acknowledgement](https://github.com/ShihuaHuang95/DEIM?tab=readme-ov-file#6-acknowledgement)


	## 1. Model Zoo

	### DEIM-D-FINE
	| Model | Dataset | AP<sup>D-FINE</sup> | AP<sup>DEIM</sup> | #Params | Latency (ms) | GFLOPs | config | checkpoint |
	| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
	| N | COCO | 42.8 | 43.0 | 4M | 2.12 | 7 | [yml](./configs/deim_dfine/deim_hgnetv2_n_coco.yml) | [ckpt](https://drive.google.com/file/d/1ZPEhiU9nhW4M5jLnYOFwTSLQC1Ugf62e/view?usp=sharing) |
	| S | COCO | 48.7 | 49.0 | 10M | 3.49 | 25 | [yml](./configs/deim_dfine/deim_hgnetv2_s_coco.yml) | [ckpt](https://drive.google.com/file/d/1tB8gVJNrfb6dhFvoHJECKOF5VpkthhfC/view?usp=drive_link) |
	| M | COCO | 52.3 | 52.7 | 19M | 5.62 | 57 | [yml](./configs/deim_dfine/deim_hgnetv2_m_coco.yml) | [ckpt](https://drive.google.com/file/d/18Lj2a6UN6k_n_UzqnJyiaiLGpDzQQit8/view?usp=drive_link) |
	| L | COCO | 54.0 | 54.7 | 31M | 8.07 | 91 | [yml](./configs/deim_dfine/deim_hgnetv2_l_coco.yml) | [ckpt](https://drive.google.com/file/d/1PIRf02XkrA2xAD3wEiKE2FaamZgSGTAr/view?usp=drive_link) |
	| X | COCO | 55.8 | 56.5 | 62M | 12.89 | 202 | [yml](./configs/deim_dfine/deim_hgnetv2_x_coco.yml) | [ckpt](https://drive.google.com/file/d/1dPtbgtGgq1Oa7k_LgH1GXPelg1IVeu0j/view?usp=drive_link) |


	### DEIM-RT-DETRv2
	| Model | Dataset | AP<sup>RT-DETRv2</sup> | AP<sup>DEIM</sup> | #Params | Latency (ms) | GFLOPs | config | checkpoint |
	| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
	| S | COCO | 47.9 | 49.0 | 20M | 4.59 | 60 | [yml](./configs/deim_rtdetrv2/deim_r18vd_120e_coco.yml) | [ckpt](https://drive.google.com/file/d/153_JKff6EpFgiLKaqkJsoDcLal_0ux_F/view?usp=drive_link) |
	| M | COCO | 49.9 | 50.9 | 31M | 6.40 | 92 | [yml](./configs/deim_rtdetrv2/deim_r34vd_120e_coco.yml) | [ckpt](https://drive.google.com/file/d/1O9RjZF6kdFWGv1Etn1Toml4r-YfdMDMM/view?usp=drive_link) |
	| M* | COCO | 51.9 | 53.2 | 33M | 6.90 | 100 | [yml](./configs/deim_rtdetrv2/deim_r50vd_m_60e_coco.yml) | [ckpt](https://drive.google.com/file/d/10dLuqdBZ6H5ip9BbBiE6S7ZcmHkRbD0E/view?usp=drive_link) |
	| L | COCO | 53.4 | 54.3 | 42M | 9.15 | 136 | [yml](./configs/deim_rtdetrv2/deim_r50vd_60e_coco.yml) | [ckpt](https://drive.google.com/file/d/1mWknAXD5JYknUQ94WCEvPfXz13jcNOTI/view?usp=drive_link) |
	| X | COCO | 54.3 | 55.5 | 76M | 13.66 | 259 | [yml](./configs/deim_rtdetrv2/deim_r101vd_60e_coco.yml) | [ckpt](https://drive.google.com/file/d/1BIevZijOcBO17llTyDX32F_pYppBfnzu/view?usp=drive_link) |
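
To put the DEIM-D-FINE table in perspective, the short sketch below computes the AP gain of DEIM over D-FINE for each model size and converts the reported latency into an approximate throughput. It assumes the latency column is per-image latency at batch size 1, which the table does not state; the helper names are ours and are not part of the repository.

```python
# Rows copied from the DEIM-D-FINE table: (size, AP D-FINE, AP DEIM, latency in ms).
DEIM_DFINE = [
    ("N", 42.8, 43.0, 2.12),
    ("S", 48.7, 49.0, 3.49),
    ("M", 52.3, 52.7, 5.62),
    ("L", 54.0, 54.7, 8.07),
    ("X", 55.8, 56.5, 12.89),
]


def fps_from_latency(latency_ms: float) -> float:
    """Throughput implied by a per-image latency (assumes batch size 1)."""
    return 1000.0 / latency_ms


def summarize(rows):
    """Return (size, AP gain of DEIM over the baseline, implied FPS) per row."""
    return [
        (size, round(ap_deim - ap_base, 1), round(fps_from_latency(lat), 1))
        for size, ap_base, ap_deim, lat in rows
    ]


if __name__ == "__main__":
    for size, gain, fps in summarize(DEIM_DFINE):
        print(f"{size}: +{gain} AP, ~{fps} FPS")
```

The same helpers work for the DEIM-RT-DETRv2 rows if you copy them into a second list.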


	## 2. Quick start

	### Setup

	```shell
	conda create -n deim python=3.11.9
	conda activate deim
	pip install -r requirements.txt
	```


	### Data Preparation

	<details>
	<summary> COCO2017 Dataset </summary>

	1. Download COCO2017 from [OpenDataLab](https://opendatalab.com/OpenDataLab/COCO_2017) or [COCO](https://cocodataset.org/#download).
	2. Modify the paths in [coco_detection.yml](./configs/dataset/coco_detection.yml):

	```yaml
	train_dataloader:
	  dataset:
	    img_folder: /data/COCO2017/train2017/
	    ann_file: /data/COCO2017/annotations/instances_train2017.json
	val_dataloader:
	  dataset:
	    img_folder: /data/COCO2017/val2017/
	    ann_file: /data/COCO2017/annotations/instances_val2017.json
	```
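
Before launching a long training run, it can help to confirm that each `ann_file` points to a readable COCO-format JSON. The sketch below uses only the Python standard library; `summarize_coco` is a hypothetical helper, not part of the DEIM repository, and the paths are the example paths from the snippet above.

```python
import json
from pathlib import Path

# Hypothetical helper for a quick sanity check of a COCO-format annotation file.


def summarize_coco(ann_file):
    """Return image, annotation, and category counts for a COCO-format JSON file."""
    data = json.loads(Path(ann_file).read_text(encoding="utf-8"))
    annotations = data.get("annotations", [])
    used_ids = {ann["category_id"] for ann in annotations}
    return {
        "images": len(data.get("images", [])),
        "annotations": len(annotations),
        "categories": len(data.get("categories", [])),
        "category_ids": sorted(used_ids),
    }


if __name__ == "__main__":
    # Use the same ann_file paths you set in coco_detection.yml.
    root = Path("/data/COCO2017/annotations")
    for name in ("instances_train2017.json", "instances_val2017.json"):
        path = root / name
        if path.exists():
            print(name, summarize_coco(path))
        else:
            print(name, "not found under", root)
```

For the official COCO2017 files, `categories` should be 80, their IDs are non-contiguous values between 1 and 90, and `instances_val2017.json` should list 5,000 images.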

	</details>

	<details>
	<summary>Custom Dataset</summary>

	To train on your custom dataset, you need to organize it in the COCO format. Follow the steps below to prepare your dataset:

	1. Set `remap_mscoco_category` to `False`:

	This prevents the automatic remapping of category IDs to match the MSCOCO categories.

	```yaml
	remap_mscoco_category: False
	```

	2. Organize Images:

	Structure your dataset directories as follows:

	```shell
	dataset/
	├── images/
	│   ├── train/
	│   │   ├── image1.jpg
	│   │   ├── image2.jpg
	│   │   └── ...
	│   └── val/
	│       ├── image1.jpg
	│       ├── image2.jpg
	│       └── ...
	└── annotations/
	    ├── instances_train.json
	    ├── instances_val.json
	    └── ...
	```

	- `images/train/`: Contains all training images.
	- `images/val/`: Contains all validation images.
	- `annotations/`: Contains COCO-formatted annotation files.

	3. Convert Annotations to COCO Format:

	If your annotations are not already in COCO format, you'll need to convert them. Use the following Python skeleton as a starting point, or use an existing conversion tool:

	```python
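	import json


	def convert_to_coco(input_annotations, output_annotations):
	    # Implement conversion logic here: read `input_annotations`, build the
	    # "images", "annotations", and "categories" lists, and write them to
	    # `output_annotations` with json.dump().
	    pass


	if __name__ == "__main__":
	    convert_to_coco('path/to/your_annotations.json', 'dataset/annotations/instances_train.json')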
	```
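
As one concrete way to fill in the conversion step, the sketch below assumes your labels are in the YOLO text format (one `class cx cy w h` entry per object, with coordinates normalized to [0, 1]). That input format and the names `yolo_to_coco` and `write_coco` are assumptions for illustration, not part of DEIM. Class indices are kept unchanged as `category_id`, which matches the intent of `remap_mscoco_category: False`.

```python
import json
from pathlib import Path

# Hypothetical converter: assumes YOLO-style labels (class, cx, cy, w, h),
# with box coordinates normalized to the image width and height.


def yolo_to_coco(samples, class_names):
    """Build a COCO-style dict.

    samples: list of (file_name, width, height, labels), where labels is a list
             of (class_id, cx, cy, w, h) tuples with normalized coordinates.
    class_names: list of names; index i becomes category_id i.
    """
    images, annotations = [], []
    ann_id = 1
    for image_id, (file_name, width, height, labels) in enumerate(samples, start=1):
        images.append({"id": image_id, "file_name": file_name,
                       "width": width, "height": height})
        for class_id, cx, cy, w, h in labels:
            # Normalized center/size -> absolute top-left [x, y, width, height].
            bw, bh = w * width, h * height
            x, y = cx * width - bw / 2, cy * height - bh / 2
            annotations.append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": int(class_id),
                "bbox": [x, y, bw, bh],
                "area": bw * bh,
                "iscrowd": 0,
            })
            ann_id += 1
    categories = [{"id": i, "name": name} for i, name in enumerate(class_names)]
    return {"images": images, "annotations": annotations, "categories": categories}


def write_coco(coco, output_annotations):
    """Write the COCO dict to disk, creating parent folders if needed."""
    out = Path(output_annotations)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(coco), encoding="utf-8")


if __name__ == "__main__":
    demo = [("image1.jpg", 640, 480, [(0, 0.5, 0.5, 0.25, 0.5)])]
    coco = yolo_to_coco(demo, ["person"])
    print(coco["annotations"][0]["bbox"])  # [240.0, 120.0, 160.0, 240.0]
```

Reading the label files from disk is left out on purpose, since the folder layout of YOLO-style datasets varies.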

	4. Update Configuration Files:

	Modify your [custom_detection.yml](./configs/dataset/custom_detection.yml).

	```yaml
	task: detection

	evaluator:
	  type: CocoEvaluator
	  iou_types: ['bbox', ]

	num_classes: 777 # number of classes in your dataset
	remap_mscoco_category: False

	train_dataloader:
	  type: DataLoader
	  dataset:
	    type: CocoDetection
	    img_folder: /data/yourdataset/images/train
	    ann_file: /data/yourdataset/annotations/instances_train.json
	    return_masks: False
	    transforms:
	      type: Compose
	      ops: ~
	  shuffle: True
	  num_workers: 4
	  drop_last: True
	  collate_fn:
	    type: BatchImageCollateFunction

	val_dataloader:
	  type: DataLoader
	  dataset:
	    type: CocoDetection
	    img_folder: /data/yourdataset/images/val
	    ann_file: /data/yourdataset/annotations/instances_val.json
	    return_masks: False
	    transforms:
	      type: Compose
	      ops: ~
	  shuffle: False
	  num_workers: 4
	  drop_last: False
	  collate_fn:
	    type: BatchImageCollateFunction
	```
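
A quick consistency check can catch two common mistakes before training: images listed in the annotation file that are missing from `img_folder`, and category IDs that do not fit `num_classes`. The sketch assumes, based on our reading of the loader rather than documented behavior, that with `remap_mscoco_category: False` each raw `category_id` is used directly as the class index and therefore must lie in `[0, num_classes)`. `check_dataset` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path

# Hypothetical helper. Assumption: with remap_mscoco_category: False, the raw
# category_id of each annotation is used directly as the class index.


def check_dataset(ann_file, img_folder, num_classes):
    """Return a list of problems found in a COCO-format dataset (empty if none)."""
    data = json.loads(Path(ann_file).read_text(encoding="utf-8"))
    problems = []

    # Every image entry should point to an existing file inside img_folder.
    for img in data.get("images", []):
        if not (Path(img_folder) / img["file_name"]).is_file():
            problems.append(f"missing image: {img['file_name']}")

    # Category IDs must be valid class indices in [0, num_classes).
    bad_ids = sorted({a["category_id"] for a in data.get("annotations", [])
                      if not 0 <= a["category_id"] < num_classes})
    if bad_ids:
        problems.append(f"category_id outside [0, {num_classes}): {bad_ids}")
    return problems


if __name__ == "__main__":
    # Use the same paths and num_classes as in custom_detection.yml.
    ann = Path("/data/yourdataset/annotations/instances_train.json")
    if ann.exists():
        issues = check_dataset(ann, "/data/yourdataset/images/train", num_classes=777)
        print("\n".join(issues) or "no problems found")
    else:
        print("annotation file not found:", ann)
```

Under the same assumption, if your category IDs start at 1, either shift them to start at 0 or set `num_classes` to the largest ID plus one.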

	</details>


	## 3. Usage
	<details open>
	<summary> COCO2017 </summary>

	1. Training (replace `${model}` with `n`, `s`, `m`, `l`, or `x`)
	```shell
	CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml --use-amp --seed=0
	```

	2. Testing
	```shell
	CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml --test-only -r model.pth
	```

	3. Fine-tuning
	```shell
	CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth
	```
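
The commands above start four processes (`--nproc_per_node=4`). As we understand the dataloader builder inherited from RT-DETR/D-FINE, `total_batch_size` in the config is the global batch size and is split evenly across processes, so it should be divisible by the number of GPUs. The hypothetical helper below only illustrates that arithmetic; it is not code from the repository.

```python
# Hypothetical helper illustrating the assumed split of total_batch_size
# across the processes launched by torchrun.


def per_gpu_batch_size(total_batch_size: int, world_size: int) -> int:
    """Split a global batch evenly across GPUs."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if total_batch_size % world_size != 0:
        raise ValueError(
            f"total_batch_size={total_batch_size} is not divisible by "
            f"world_size={world_size}"
        )
    return total_batch_size // world_size


if __name__ == "__main__":
    # 4 processes, as launched with --nproc_per_node=4 above.
    print(per_gpu_batch_size(32, 4))  # 8
```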
	</details>

	<details>
	<summary> Customizing Batch Size </summary>

	For example, to double the total batch size when training DEIM-D-FINE-L on COCO2017, follow these steps:

	1. Modify your [dataloader.yml](./configs/base/dataloader.yml) to increase the `total_batch_size`:

	```yaml
	train_dataloader:
	  total_batch_size: 64 # Previously it was 32, now doubled
	```

	2. Modify your [deim_hgnetv2_l_coco.yml](./configs/deim_dfine/deim_hgnetv2_l_coco.yml). Here’s how the key parameters should be adjusted:

	```yaml
	optimizer:
	  type: AdamW
	  params:
	    -
	      params: '^(?=.*backbone)(?!.*norm|bn).*$'
	      lr: 0.000025 # doubled, linear scaling rule
	    -
	      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
	      weight_decay: 0.

	  lr: 0.0005 # doubled, linear scaling rule
	  betas: [0.9, 0.999]
	  weight_decay: 0.0001 # may need a grid search

	ema: # added EMA settings
	  decay: 0.9998 # adjusted by 1 - (1 - decay) * 2
	  warmups: 500 # halved

	lr_warmup_scheduler:
	  warmup_duration: 250 # halved
	```
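
The adjustments above follow simple rules when the total batch size is multiplied by a factor k: learning rates scale linearly by k, the EMA decay becomes 1 - (1 - decay) · k, and the EMA and learning-rate warmup lengths are divided by k (we assume both are counted in optimizer steps, of which there are k times fewer per epoch). The sketch applies these rules; the base values are the pre-doubling numbers implied by the YAML comments (for example, 0.00025 before doubling to 0.0005), and the dataclass and function names are hypothetical.

```python
from dataclasses import dataclass, replace

# Hypothetical helper; the scaling rules mirror the comments in the YAML above.


@dataclass(frozen=True)
class TrainHyperparams:
    lr: float
    backbone_lr: float
    ema_decay: float
    ema_warmups: int
    lr_warmup_duration: int


def scale_for_batch(hp: TrainHyperparams, k: float) -> TrainHyperparams:
    """Return new hyperparameters for a total batch size multiplied by k."""
    return replace(
        hp,
        lr=hp.lr * k,                          # linear scaling rule
        backbone_lr=hp.backbone_lr * k,        # same rule for the backbone group
        # EMA horizon is about 1 / (1 - decay) steps; dividing it by k keeps
        # the horizon measured in samples roughly constant.
        ema_decay=1 - (1 - hp.ema_decay) * k,
        ema_warmups=round(hp.ema_warmups / k),  # k times fewer steps per epoch
        lr_warmup_duration=round(hp.lr_warmup_duration / k),
    )


if __name__ == "__main__":
    # Pre-doubling values implied by the comments in the YAML above.
    base = TrainHyperparams(lr=0.00025, backbone_lr=0.0000125, ema_decay=0.9999,
                            ema_warmups=1000, lr_warmup_duration=500)
    print(scale_for_batch(base, 2))
```

The linear scaling rule is a heuristic, so the YAML's note about a grid search for `weight_decay` applies to the learning rate as well when k is large.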

	</details>


	<details>
	<summary> Customizing Input Size </summary>

	If you'd like to train DEIM on COCO2017 with an input size of 320x320, follow these steps:

	1. Modify your [dataloader.yml](./configs/base/dataloader.yml):

	```yaml

	train_dataloader:
	  dataset:
	    transforms:
	      ops:
	        - {type: Resize, size: [320, 320], } # edit the existing Resize op; keep any other ops unchanged
	  collate_fn:
	    base_size: 320
	val_dataloader:
	  dataset:
	    transforms:
	      ops:
	        - {type: Resize, size: [320, 320], }
	```

	2. Modify your [dfine_hgnetv2.yml](./configs/base/dfine_hgnetv2.yml):

	```yaml
	eval_spatial_size: [320, 320]
	```

	</details>

	## 4. Tools
	<details>
	<summary> Deployment </summary>

	1. Setup
	```shell
	pip install onnx onnxsim
	```

	2. Export ONNX
	```shell
	python tools/deployment/export_onnx.py --check -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml -r model.pth
	```

	3. Build a [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) engine
	```shell
	trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
	```

	</details>

	<details>
	<summary> Inference (Visualization) </summary>


	1. Setup
	```shell
	pip install -r tools/inference/requirements.txt
	```


	2. Inference (onnxruntime / tensorrt / torch)

	Inference on images and videos is now supported.
	```shell
	python tools/inference/onnx_inf.py --onnx model.onnx --input image.jpg # video.mp4
	python tools/inference/trt_inf.py --trt model.engine --input image.jpg
	python tools/inference/torch_inf.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0
	```
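
A common next step after inference is to keep only confident detections. DETR-style detectors such as DEIM use one-to-one matching and do not need NMS, so a score threshold is usually enough. The sketch below assumes the model returns three per-image outputs, `labels`, `boxes` (as `[x1, y1, x2, y2]` in original-image pixels), and `scores`; these names, the box layout, and the 0.5 threshold are our assumptions, so check them against `tools/deployment/export_onnx.py` and the inference scripts. With onnxruntime, you would pass the arrays returned by `session.run(...)` after converting them with `.tolist()`.

```python
from typing import List, NamedTuple, Sequence

# Assumed output layout: labels, boxes [x1, y1, x2, y2], scores for one image.


class Detection(NamedTuple):
    label: int
    score: float
    box: List[float]


def filter_detections(labels: Sequence[int], boxes: Sequence[Sequence[float]],
                      scores: Sequence[float], threshold: float = 0.5,
                      top_k: int = 100) -> List[Detection]:
    """Keep detections with score >= threshold, sorted by descending score."""
    kept = [
        Detection(int(lab), float(score), [float(v) for v in box])
        for lab, box, score in zip(labels, boxes, scores)
        if score >= threshold
    ]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    # Dummy outputs for one image with three queries.
    labels = [0, 2, 0]
    boxes = [[10, 20, 110, 220], [5, 5, 50, 50], [300, 40, 380, 200]]
    scores = [0.91, 0.12, 0.64]
    for det in filter_detections(labels, boxes, scores):
        print(det.label, round(det.score, 2), det.box)
```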
	</details>

	<details>
	<summary> Benchmark </summary>

	1. Setup
	```shell
	pip install -r tools/benchmark/requirements.txt
	```

	2. Model FLOPs, MACs, and Params
	```shell
	python tools/benchmark/get_info.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml
	```

	3. TensorRT Latency
	```shell
	python tools/benchmark/trt_benchmark.py --COCO_dir path/to/COCO2017 --engine_dir model.engine
	```
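
To time a model outside TensorRT, for example through the PyTorch or ONNX Runtime path, a common pattern is to run some warm-up iterations and then report the median over many timed runs. The sketch below is a generic harness, not a reimplementation of `tools/benchmark/trt_benchmark.py`; `run_once` is a placeholder for any callable that performs one inference. For GPU code, that callable must synchronize (for example with `torch.cuda.synchronize()`) before returning, otherwise asynchronous kernel launches make the timings meaningless.

```python
import statistics
import time
from typing import Callable, Dict

# Generic timing harness (not the repository's trt_benchmark.py).


def measure_latency(run_once: Callable[[], object], warmup: int = 20,
                    iters: int = 200) -> Dict[str, float]:
    """Call run_once repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):  # warm-up: caches, allocators, autotuning
        run_once()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()  # must block until the result is ready
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "mean_ms": statistics.fmean(samples),
        "min_ms": min(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a synchronized model call.
    stats = measure_latency(lambda: sum(range(10_000)), warmup=5, iters=50)
    print({k: round(v, 3) for k, v in stats.items()})
```

The median is less sensitive than the mean to occasional stalls from other processes on the machine.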
	</details>

	<details>
	<summary> FiftyOne Visualization </summary>

	1. Setup
	```shell
	pip install fiftyone
	```
	2. Voxel51 FiftyOne Visualization ([fiftyone](https://github.com/voxel51/fiftyone))
	```shell
	python tools/visualization/fiftyone_vis.py -c configs/deim_dfine/deim_hgnetv2_${model}_coco.yml -r model.pth
	```
	</details>

	<details>
	<summary> Others </summary>

	1. Auto Resume Training
	```shell
	bash reference/safe_training.sh
	```

	2. Converting Model Weights
	```shell
	python reference/convert_weight.py model.pth
	```
	</details>


	## 5. Citation
	If you use `DEIM` or its methods in your work, please cite the following BibTeX entry:
	<details open>
	<summary> bibtex </summary>

	```latex
	@inproceedings{huang2024deim,
	  title={DEIM: DETR with Improved Matching for Fast Convergence},
	  author={Huang, Shihua and Lu, Zhichao and Cun, Xiaodong and Yu, Yongjun and Zhou, Xiao and Shen, Xi},
	  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
	  year={2025},
	}
	```
	</details>

	## 6. Acknowledgement
	Our work is built upon [D-FINE](https://github.com/Peterande/D-FINE) and [RT-DETR](https://github.com/lyuwenyu/RT-DETR).

	✨ Feel free to contribute and reach out if you have any questions! ✨