Update README.md
README.md CHANGED

@@ -2,11 +2,8 @@
 license: apache-2.0
 tags:
 - multi-object-tracking
-- MOT
 - DETR
-- object-detection
 - computer-vision
-- pytorch
 - CVPR2026
 datasets:
 - DanceTrack
@@ -14,7 +11,6 @@ datasets:
 - BFT
 language:
 - en
-pipeline_tag: object-detection
 ---

 # FDTA: From Detection to Association
@@ -26,13 +22,6 @@ Official model weights for the paper **"From Detection to Association: Learning

 > **TL;DR.** We reveal that DETR-based end-to-end MOT suffers from overly similar object embeddings. FDTA explicitly enhances discriminativeness in this paradigm.

-## Model Description
-
-FDTA is built upon Deformable DETR with a ResNet-50 backbone. It introduces:
-
-- **Spatial Adapter**: A depth-aware module that incorporates monocular depth estimation to enrich spatial understanding.
-- **Temporal Adapter**: Trajectory-level temporal modeling for robust identity association across frames.
-- **ID Decoder**: A dedicated decoder with learnable ID vocabulary to produce discriminative object embeddings for multi-object tracking.

 ## Available Checkpoints

@@ -78,36 +67,9 @@ ckpt_path = hf_hub_download(
 )
 ```

-Or manually download from the **Files** tab and place under `./checkpoints/`.
-
-### 2. Inference
-
-```shell
-accelerate launch --num_processes=4 submit_and_evaluate.py \
-    --data-root /path/to/your/datasets/ \
-    --inference-mode evaluate \
-    --config-path ./configs/dancetrack.yaml \
-    --inference-model ./checkpoints/dancetrack.pth \
-    --outputs-dir ./outputs/ \
-    --inference-dataset DanceTrack \
-    --inference-split val
-```
-
-> Add `--inference-dtype FP16` for faster inference with minimal performance loss.
-
 For full training and evaluation instructions, please refer to the [GitHub repository](https://github.com/Spongebobbbbbbbb/FDTA).

-
-
-| Component | Details |
-|-----------|---------|
-| Backbone | ResNet-50 |
-| Detector | Deformable DETR (6 encoder + 6 decoder layers) |
-| Queries | 300 |
-| Feature Dim | 256 |
-| ID Decoder Layers | 6 |
-| ID Vocabulary Size | 50 |
-| Depth Estimation | LID mode, 150 bins |
+

 ## Citation
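As a note on the download step that the last hunk truncates (`ckpt_path = hf_hub_download(` in the hunk header), a minimal sketch of fetching a checkpoint with `huggingface_hub` is shown below. The `repo_id` is a placeholder and the filename is only inferred from the removed inference example; substitute the values listed under **Available Checkpoints** in the full README.

```python
from huggingface_hub import hf_hub_download

# Minimal sketch, not taken verbatim from the README: repo_id is a placeholder
# and the filename is inferred from the removed inference example.
ckpt_path = hf_hub_download(
    repo_id="your-namespace/FDTA",  # placeholder model repo id
    filename="dancetrack.pth",      # checkpoint name used by the removed command
    local_dir="./checkpoints",      # matches the ./checkpoints/ path in the removed text
)
print(ckpt_path)  # local path to the downloaded weights
```

The removed inference command then points `--inference-model` at this file (`./checkpoints/dancetrack.pth`).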
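The removed **Model Description** also mentions an ID Decoder that attends to a learnable ID vocabulary to produce discriminative embeddings. A toy sketch of that idea, using the sizes from the removed component table (feature dim 256, 6 layers, vocabulary size 50) but otherwise assumed for illustration rather than taken from the authors' code:

```python
import torch
import torch.nn as nn

class ToyIDDecoder(nn.Module):
    """Illustrative only: object queries cross-attend to a learnable ID vocabulary."""

    def __init__(self, dim: int = 256, num_layers: int = 6, vocab_size: int = 50, num_heads: int = 8):
        super().__init__()
        # Learnable ID vocabulary serving as the memory for cross-attention.
        self.id_vocab = nn.Parameter(torch.randn(vocab_size, dim) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, object_embeddings: torch.Tensor) -> torch.Tensor:
        # object_embeddings: (batch, num_queries, dim) from the detection decoder.
        memory = self.id_vocab.unsqueeze(0).expand(object_embeddings.size(0), -1, -1)
        # The output would serve as the identity embedding used for association.
        return self.decoder(tgt=object_embeddings, memory=memory)

# 300 queries at feature dim 256, as in the removed table.
ids = ToyIDDecoder()(torch.randn(2, 300, 256))
print(ids.shape)  # torch.Size([2, 300, 256])
```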