+---
+license: apache-2.0
+tags:
+  - multi-object-tracking
+  - MOT
+  - DETR
+  - object-detection
+  - computer-vision
+  - pytorch
+  - CVPR2026
+datasets:
+  - DanceTrack
+  - SportsMOT
+  - BFT
+language:
+  - en
+pipeline_tag: object-detection
+---
+# FDTA: From Detection to Association
+[![arXiv](https://img.shields.io/badge/ArXiv-2512.02392-B31B1B.svg)](https://arxiv.org/abs/2512.02392)
+[![GitHub](https://img.shields.io/badge/GitHub-FDTA-blue?logo=github)](https://github.com/Spongebobbbbbbbb/FDTA)
+Official model weights for the paper **"From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking"** (CVPR 2026).
+> **TL;DR.** We show that DETR-based end-to-end MOT suffers from overly similar object embeddings. FDTA explicitly makes these embeddings more discriminative.
+## Model Description
+FDTA is built upon Deformable DETR with a ResNet-50 backbone. It introduces:
+- **Spatial Adapter**: A depth-aware module that incorporates monocular depth estimation to enrich spatial understanding.
+- **Temporal Adapter**: Trajectory-level temporal modeling for robust identity association across frames.
+- **ID Decoder**: A dedicated decoder with a learnable ID vocabulary that produces discriminative object embeddings for multi-object tracking.
+## Available Checkpoints
+| File | Dataset | Training Split | Description |
+|------|---------|----------------|-------------|
+| `dancetrack.pth` | DanceTrack | train | Best model on DanceTrack |
+| `sportsmot.pth` | SportsMOT | train | Best model on SportsMOT |
+| `bft.pth` | BFT | train | Best model on BFT |
+## Main Results
+### DanceTrack
+| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
+|---------------|------|------|------|------|------|
+| train         | 71.7 | 77.2 | 63.5 | 91.3 | 81.0 |
+| train+val     | 74.4 | 80.0 | 67.0 | 92.2 | 82.7 |
+### SportsMOT
+| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
+|---------------|------|------|------|------|------|
+| train         | 74.2 | 78.5 | 65.5 | 93.0 | 84.1 |
+### BFT
+| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
+|---------------|------|------|------|------|------|
+| train         | 72.2 | 84.2 | 74.5 | 78.2 | 70.1 |
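+
+The columns in the tables above are related. HOTA is defined as an average, over localization thresholds, of the geometric mean of the per-threshold DetA and AssA. The check below is a rough sketch, not the evaluation protocol: it assumes the reported DetA and AssA can stand in for the per-threshold values, so agreement is only approximate.
+$$
+\mathrm{HOTA} \approx \sqrt{\mathrm{DetA} \cdot \mathrm{AssA}}
+$$
+$$
+\begin{aligned}
+\text{DanceTrack (train):} \quad & \sqrt{81.0 \times 63.5} \approx 71.7 \\
+\text{DanceTrack (train+val):} \quad & \sqrt{82.7 \times 67.0} \approx 74.4 \\
+\text{SportsMOT (train):} \quad & \sqrt{84.1 \times 65.5} \approx 74.2 \\
+\text{BFT (train):} \quad & \sqrt{70.1 \times 74.5} \approx 72.3
+\end{aligned}
+$$
+The first three values match the reported HOTA to one decimal place. The BFT value (72.3 versus the reported 72.2) differs slightly, as expected from an approximation that skips the per-threshold averaging.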
+## Usage
+### 1. Download Checkpoints
+```python
+from huggingface_hub import hf_hub_download
+# Download the DanceTrack checkpoint
+ckpt_path = hf_hub_download(
+    repo_id="Spongebobbbbbbbb/FDTA",
+    filename="dancetrack.pth",
+    local_dir="./checkpoints/"
+)
+```
+Alternatively, download the file manually from the **Files** tab and place it under `./checkpoints/`.
+### 2. Inference
+```shell
+accelerate launch --num_processes=4 submit_and_evaluate.py \
+  --data-root /path/to/your/datasets/ \
+  --inference-mode evaluate \
+  --config-path ./configs/dancetrack.yaml \
+  --inference-model ./checkpoints/dancetrack.pth \
+  --outputs-dir ./outputs/ \
+  --inference-dataset DanceTrack \
+  --inference-split val
+```
+> Add `--inference-dtype FP16` for faster inference with minimal loss in tracking accuracy.
+For full training and evaluation instructions, please refer to the [GitHub repository](https://github.com/Spongebobbbbbbbb/FDTA).
+## Architecture Details
+| Component | Details |
+|-----------|---------|
+| Backbone | ResNet-50 |
+| Detector | Deformable DETR (6 encoder + 6 decoder layers) |
+| Queries | 300 |
+| Feature Dim | 256 |
+| ID Decoder Layers | 6 |
+| ID Vocabulary Size | 50 |
+| Depth Estimation | LID mode, 150 bins |
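+
+The "LID mode, 150 bins" entry in the table above presumably refers to linear-increasing discretization (LID) of depth, as used in CaDDN-style depth heads. That reading is an assumption, and the depth range is not stated in this card, so the bounds below are kept symbolic and hypothetical. Under that assumption, with D = 150 bins, the bin edges would be:
+$$
+d_i = d_{\min} + \frac{d_{\max} - d_{\min}}{D(D+1)} \, i(i+1), \qquad i = 0, 1, \dots, D
+$$
+so that the first edge is the minimum depth, the last edge is the maximum depth, and the width of bin i is
+$$
+d_{i+1} - d_i = \frac{2(i+1)}{D(D+1)} \, (d_{\max} - d_{\min})
+$$
+Bin widths grow linearly with the index, so nearby depths are resolved more finely than distant ones. With D = 150, the last bin alone would span 2/151 (about 1.3%) of the depth range.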
+## Citation
+```bibtex
+@article{shao2025fdta,
+  title={From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking},
+  author={Shao, Yuqing and Yang, Yuchen and Yu, Rui and Li, Weilong and Guo, Xu and Yan, Huaicheng and Wang, Wei and Sun, Xiao},
+  journal={arXiv preprint arXiv:2512.02392},
+  year={2025}
+}
+```
+## License
+This project is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).