---
license: apache-2.0
tags:
- multi-object-tracking
- MOT
- DETR
- object-detection
- computer-vision
- pytorch
- CVPR2026
datasets:
- DanceTrack
- SportsMOT
- BFT
language:
- en
pipeline_tag: object-detection
---

# FDTA: From Detection to Association

[![arXiv](https://img.shields.io/badge/ArXiv-2512.02392-B31B1B.svg)](https://arxiv.org/abs/2512.02392)
[![GitHub](https://img.shields.io/badge/GitHub-FDTA-blue?logo=github)](https://github.com/Spongebobbbbbbbb/FDTA)

Official model weights for the paper **"From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking"** (CVPR 2026).

> **TL;DR.** We show that DETR-based end-to-end MOT suffers from overly similar object embeddings, and FDTA explicitly improves their discriminability within this paradigm.

## Model Description

FDTA builds on Deformable DETR with a ResNet-50 backbone and introduces three components:

- **Spatial Adapter**: a depth-aware module that incorporates monocular depth estimation to enrich spatial understanding.
- **Temporal Adapter**: trajectory-level temporal modeling for robust identity association across frames.
- **ID Decoder**: a dedicated decoder with a learnable ID vocabulary that produces discriminative object embeddings for association.
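The adapters are described here only at a high level; purely as an illustration of the general adapter pattern (a bottleneck with a residual connection), a minimal sketch might look like the following. All names and dimensions are hypothetical and this is **not** FDTA's actual implementation — see the GitHub repository for the real modules.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, non-linearity, up-project,
    plus a residual connection. Hypothetical sketch, not FDTA's module."""

    def __init__(self, dim: int = 256, hidden: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.act = nn.ReLU()
        self.up = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form keeps the adapter close to identity at initialization.
        return x + self.up(self.act(self.down(x)))


# 300 queries with feature dim 256, matching the detector configuration.
x = torch.randn(2, 300, 256)
out = BottleneckAdapter()(x)
```

An adapter of this shape preserves the query tensor's dimensions, so it can be dropped between decoder stages without changing downstream interfaces.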

## Available Checkpoints

| File | Dataset | Training Split | Description |
|------|---------|----------------|-------------|
| `dancetrack.pth` | DanceTrack | train | Best model on DanceTrack |
| `sportsmot.pth` | SportsMOT | train | Best model on SportsMOT |
| `bft.pth` | BFT | train | Best model on BFT |

## Main Results

### DanceTrack

| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
|---------------|------|------|------|------|------|
| train | 71.7 | 77.2 | 63.5 | 91.3 | 81.0 |
| train+val | 74.4 | 80.0 | 67.0 | 92.2 | 82.7 |

### SportsMOT

| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
|---------------|------|------|------|------|------|
| train | 74.2 | 78.5 | 65.5 | 93.0 | 84.1 |

### BFT

| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
|---------------|------|------|------|------|------|
| train | 72.2 | 84.2 | 74.5 | 78.2 | 70.1 |
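As a quick sanity check on these tables: per-threshold HOTA is the geometric mean of detection and association accuracy (HOTA_α = √(DetA_α · AssA_α), averaged over localization thresholds α), so the aggregated scores should roughly satisfy HOTA ≈ √(DetA · AssA):

```python
import math

# (DetA, AssA, reported HOTA) taken from the result tables above.
results = {
    "DanceTrack (train)": (81.0, 63.5, 71.7),
    "SportsMOT (train)": (84.1, 65.5, 74.2),
    "BFT (train)": (70.1, 74.5, 72.2),
}

for name, (deta, assa, hota) in results.items():
    approx = math.sqrt(deta * assa)
    # The identity is exact only per threshold, so expect small deviations.
    print(f"{name}: sqrt({deta} * {assa}) = {approx:.1f} (reported {hota})")
```

All three rows agree to within about 0.1 HOTA, which is the expected behavior since the exact metric averages over thresholds before aggregation.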

## Usage

### 1. Download Checkpoints

```python
from huggingface_hub import hf_hub_download

# Download the DanceTrack checkpoint
ckpt_path = hf_hub_download(
    repo_id="Spongebobbbbbbbb/FDTA",
    filename="dancetrack.pth",
    local_dir="./checkpoints/",
)
```

Or download manually from the **Files** tab and place the file under `./checkpoints/`.

### 2. Inference

```shell
accelerate launch --num_processes=4 submit_and_evaluate.py \
    --data-root /path/to/your/datasets/ \
    --inference-mode evaluate \
    --config-path ./configs/dancetrack.yaml \
    --inference-model ./checkpoints/dancetrack.pth \
    --outputs-dir ./outputs/ \
    --inference-dataset DanceTrack \
    --inference-split val
```

> Add `--inference-dtype FP16` for faster inference with minimal performance loss.

For full training and evaluation instructions, please refer to the [GitHub repository](https://github.com/Spongebobbbbbbbb/FDTA).

## Architecture Details

| Component | Details |
|-----------|---------|
| Backbone | ResNet-50 |
| Detector | Deformable DETR (6 encoder + 6 decoder layers) |
| Queries | 300 |
| Feature Dim | 256 |
| ID Decoder Layers | 6 |
| ID Vocabulary Size | 50 |
| Depth Estimation | LID mode, 150 bins |
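The table lists LID (linear-increasing discretization) with 150 bins. In the standard LID formulation, bin widths grow linearly with depth, so near-range depths are discretized more finely than far-range ones. The sketch below uses a hypothetical depth range purely for illustration; the paper's actual range and implementation may differ:

```python
def lid_bin_edges(d_min: float, d_max: float, num_bins: int) -> list[float]:
    """Linear-increasing discretization (LID): the i-th bin edge is
    d_min + (d_max - d_min) * i * (i + 1) / (K * (K + 1)), so bin width
    grows linearly with the bin index."""
    K = num_bins
    return [
        d_min + (d_max - d_min) * i * (i + 1) / (K * (K + 1))
        for i in range(K + 1)
    ]


# Hypothetical depth range of 1–60 m with the 150 bins from the table.
edges = lid_bin_edges(d_min=1.0, d_max=60.0, num_bins=150)
first_width = edges[1] - edges[0]    # narrowest bin, near the camera
last_width = edges[-1] - edges[-2]   # widest bin, far from the camera
```

With these example values the far bins are roughly 150× wider than the near ones, which is the point of LID: depth resolution is spent where monocular estimates are most reliable.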

## Citation

```bibtex
@article{shao2025fdta,
  title={From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking},
  author={Shao, Yuqing and Yang, Yuchen and Yu, Rui and Li, Weilong and Guo, Xu and Yan, Huaicheng and Wang, Wei and Sun, Xiao},
  journal={arXiv preprint arXiv:2512.02392},
  year={2025}
}
```

## License

This project is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).