Model Zoo
Common Settings
- All COCO models were trained on `coco_2017_train` and evaluated on `coco_2017_val`.
- All models were trained using distributed training.
- Most models were trained with the 50-epoch setting (~51 COCO epochs) and a multi-step LR scheduler, which is the common setting in DETR-like methods.
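
The relation between an "N-epoch setting" and actual COCO epochs can be checked with a little arithmetic. The sketch below is illustrative only. It assumes the full `coco_2017_train` image count (118,287) as the dataset size, and the helper names, the 375000-iteration schedule at batch size 16, the base learning rate, and the single LR drop at 80% of training are all hypothetical values that this page does not state. Only the DETA figures (batch size 8, 180000 iterations, listed as 12 epochs) come from the tables on this page.

```python
# Assumption: dataset size is the full COCO train2017 image count.
COCO_TRAIN2017_IMAGES = 118_287


def epochs_from_iterations(iterations, batch_size, dataset_size=COCO_TRAIN2017_IMAGES):
    """Return how many passes over the dataset a schedule makes."""
    return iterations * batch_size / dataset_size


def multistep_lr(base_lr, step, milestones, gamma=0.1):
    """Learning rate at `step`: base_lr scaled by gamma once per milestone reached."""
    drops = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** drops


# DETA row on this page: batch size 8, 180000 iterations, listed as 12 epochs.
deta_epochs = epochs_from_iterations(180_000, batch_size=8)
print(f"DETA schedule: {deta_epochs:.2f} COCO epochs")

# Hypothetical 50-epoch-setting schedule (assumed values, not from this page).
ASSUMED_ITERS, ASSUMED_BATCH = 375_000, 16
long_epochs = epochs_from_iterations(ASSUMED_ITERS, ASSUMED_BATCH)
print(f"Assumed 50-epoch setting: {long_epochs:.2f} COCO epochs")

# Hypothetical multi-step schedule: one LR drop at 80% of training.
milestone = ASSUMED_ITERS * 4 // 5
lr_before = multistep_lr(1e-4, milestone - 1, [milestone])
lr_after = multistep_lr(1e-4, milestone, [milestone])
print(f"LR before drop: {lr_before:.1e}, after drop: {lr_after:.1e}")
```

Under these assumptions the nominal epoch count slightly understates the number of passes over the data, which is consistent with the "~51 COCO epochs" note above.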
COCO Object Detection Baselines
Here we provide our pretrained baselines trained with detrex. More pretrained weights will be released in future versions. We also provide converted pretrained weights, which are marked as (converted).
DETR
| Name | Backbone | Pretrained | Epochs | box AP | Download |
|---|---|---|---|---|---|
| DETR-R50 (converted) | R-50 | IN1k | 500 | 42.0 | model |
| DETR-R50-DC5 (converted) | R-50 | IN1k | 500 | 43.4 | model |
| DETR-R101 (converted) | R-101 | IN1k | 500 | 43.5 | model |
| DETR-R101-DC5 (converted) | R-101 | IN1k | 500 | 44.9 | model |
Deformable-DETR
| Name | Backbone | Pretrained | Epochs | box AP | Download |
|---|---|---|---|---|---|
| Deformable-DETR + Box Refinement | R50 | IN1k | 50 | 47.0 | model |
| Deformable-DETR + Box Refinement + Two Stage | R50 | IN1k | 50 | 48.2 | model |
Anchor-DETR
| Name | Backbone | Pretrained | Epochs | box AP | Download |
|---|---|---|---|---|---|
| Anchor-DETR-R50 | R-50 | IN1k | 50 | 41.9 | model |
| Anchor-DETR-R50 (converted) | R-50 | IN1k | 50 | 42.2 | model |
| Anchor-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 44.2 | model |
| Anchor-DETR-R101 (converted) | R-101 | IN1k | 50 | 43.5 | model |
| Anchor-DETR-R101-DC5 (converted) | R-101 | IN1k | 50 | 45.1 | model |
Conditional-DETR
| Name | Backbone | Pretrained | Epochs | box AP | Download |
|---|---|---|---|---|---|
| Conditional-DETR-R50 | R-50 | IN1k | 50 | 41.6 | model |
| Conditional-DETR-R50-DC5 (converted) | R-50-DC5 | IN1k | 50 | 43.8 | model |
| Conditional-DETR-R101 (converted) | R-101 | IN1k | 50 | 43.0 | model |
| Conditional-DETR-R101-DC5 (converted) | R-101-DC5 | IN1k | 50 | 45.1 | model |
DAB-DETR
| Name | Backbone | Pretrained | Epochs | box AP | Download |
|---|---|---|---|---|---|
| DAB-DETR-R50 | R-50 | IN1k | 50 | 43.3 | model |
| DAB-DETR-R50-3patterns (converted) | R-50 | IN1k | 50 | 42.8 | model |
| DAB-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 44.6 | model |
| DAB-DETR-R50-DC5-3patterns (converted) | R-50 | IN1k | 50 | 45.7 | model |
| DAB-DETR-R101 | R-101 | IN1k | 50 | 44.0 | model |
| DAB-DETR-R101-DC5 (converted) | R-101 | IN1k | 50 | 45.7 | model |
| DAB-DETR-Swin-T | Swin-Tiny-224 | IN1k | 50 | 45.2 | model |
| DAB-Deformable-DETR-R50 | R-50 | IN1k | 50 | 49.0 | model |
| DAB-Deformable-DETR-R50-Two-Stage | R-50 | IN1k | 50 | 49.7 | model |
DN-DETR
| Name | Backbone | Pretrained | Epochs | box AP | Download |
|---|---|---|---|---|---|
| DN-DETR-R50 | R50 | IN1k | 50 | 44.7 | model |
| DN-DETR-R50-DC5 (converted) | R50 | IN1k | 50 | 46.3 | model |
DINO
Pretrained DINO with ResNet Backbone
| Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
|---|---|---|---|---|---|---|
| DINO-R50-4scale | R-50 | IN1k | 12 | 100 | 49.2 | model |
| DINO-R50-4scale (hacked trainer) | R-50 | IN1k | 12 | 100 | 49.4 | model |
| DINO-R50-4scale with EMA | R-50 | IN1k | 12 | 100 | 49.4 | model |
| DINO-R50-5scale | R-50 | IN1k | 12 | 100 | 49.6 | model |
| DINO-R50-4scale | R-50 | IN1k | 12 | 300 | 49.5 | model |
| DINO-R50-4scale | R-50 | IN1k | 24 | 100 | 50.6 | model |
| DINO-R101-4scale | R-101 | IN1k | 12 | 100 | 50.0 | model |
Pretrained DINO with Swin-Transformer Backbone
| Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
|---|---|---|---|---|---|---|
| DINO-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 12 | 100 | 51.3 | model |
| DINO-Swin-T-224-4scale | Swin-Tiny-224 | IN22k to IN1k | 12 | 100 | 52.5 | model |
| DINO-Swin-S-224-4scale | Swin-Small-224 | IN1k | 12 | 100 | 53.0 | model |
| DINO-Swin-B-384-4scale | Swin-Base-384 | IN22k to IN1k | 12 | 100 | 55.8 | model |
| DINO-Swin-L-224-4scale | Swin-Large-224 | IN22k to IN1k | 12 | 100 | 56.9 | model |
| DINO-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 12 | 100 | 56.9 | model |
| DINO-Swin-L-384-5scale | Swin-Large-384 | IN22k to IN1k | 12 | 100 | 57.5 | model |
| DINO-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 58.1 | model |
| DINO-Swin-L-384-5scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 58.5 | model |
Pretrained DINO with FocalNet Backbone
| Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
|---|---|---|---|---|---|---|
| DINO-FocalNet-Large-4scale | FocalNet-384-LRF-3Level | IN22k | 12 | 100 | 57.5 | model |
| DINO-FocalNet-Large-4scale | FocalNet-384-LRF-4Level | IN22k | 12 | 100 | 58.0 | model |
| DINO-FocalNet-Large-5scale | FocalNet-384-LRF-4Level | IN22k | 12 | 100 | 58.5 | model |
Pretrained DINO with ViTDet Backbone
| Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
|---|---|---|---|---|---|---|
| DINO-ViTDet-Base-4scale | ViT | IN1k, MAE | 12 | 100 | 50.2 | model |
| DINO-ViTDet-Base-4scale | ViT | IN1k, MAE | 50 | 100 | 55.0 | model |
| DINO-ViTDet-Large-4scale | ViT | IN1k, MAE | 12 | 100 | 52.9 | model |
| DINO-ViTDet-Large-4scale | ViT | IN1k, MAE | 50 | 100 | 57.5 | model |
H-Deformable-DETR
| Name | Backbone | Pretrained | Queries | Epochs | box AP | Download |
|---|---|---|---|---|---|---|
| H-Deformable-DETR-R50 + tricks (detrex) | R50 | IN1k | 300 | 12 | 49.1 | model |
| H-Deformable-DETR-R50 + tricks (converted) | R50 | IN1k | 300 | 12 | 48.9 | model |
| H-Deformable-DETR-R50 + tricks (converted) | R50 | IN1k | 300 | 36 | 50.3 | model |
| H-Deformable-DETR-Swin-T + tricks (converted) | Swin-Tiny | IN1k | 300 | 12 | 50.6 | model |
| H-Deformable-DETR-Swin-T + tricks (converted) | Swin-Tiny | IN1k | 300 | 36 | 53.5 | model |
| H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 12 | 56.2 | model |
| H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 36 | 57.5 | model |
| H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 900 | 12 | 56.4 | model |
| H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 36 | 57.5 | model |
DETA
| Name | Backbone | Pretrained | Epochs | box AP | Download |
|---|---|---|---|---|---|
| Improved-Deformable-DETR-R50 (converted) | R-50 | IN1k | 50 | 49.8 | model |
| DETA-R50-5scale (bs=8, 180000 iterations) | R-50 | IN1k | 12 | 50.0 | model |
| DETA-R50-5scale (with hacked train engine) | R-50 | IN1k | 12 | 49.9 | model |
| DETA-R50-5scale-12ep (no frozen backbone) | R-50 | IN1k | 12 | 50.2 | model |
| DETA-R50-5scale (converted) | R-50 | IN1k | 12 | 50.1 | model |
| DETA-Swin-Large-finetune (converted) | Swin-Large-384 | Objects365 | 24 | 62.9 | model |