---
license: mit
library_name: pytorch
tags:
- computer-vision
- object-tracking
- rgb-event
- event-camera
- cvpr-2024
- sdstrack
datasets:
- krisspy39/visevent
---

# SDSTrack: RGB-E Checkpoint for VisEvent

This repository contains the **RGB-E (RGB-Event)** checkpoint for [SDSTrack](https://github.com/hoqolo/SDSTrack), a tracker that applies self-distillation symmetric adapter learning to multi-modal single-object tracking (CVPR 2024).

## Model Details

| Attribute | Value |
|-----------|-------|
| **Tracker** | SDSTrack (Self-Distillation Symmetric Adapter Learning) |
| **Backbone** | ViT-B (Vision Transformer, base size) with MAE pretraining |
| **Modality** | RGB-E (RGB + Event camera) |
| **Dataset** | VisEvent |
| **Config** | `cvpr2024_rgbe` |
| **Training epochs** | 50 |
| **Batch size** | 16 |
| **Learning rate** | 1e-4 |
| **Expected Success AUC** | ~0.62 |
| **Expected Precision @ 20px** | ~0.74 |

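The two expected metrics in the table follow the usual single-object-tracking conventions. As a rough sketch (not the upstream evaluation toolkit, whose thresholds and tie handling may differ), Success AUC can be taken as the mean success rate over evenly spaced IoU thresholds, and Precision @ 20px as the fraction of frames whose predicted box center lies within 20 pixels of the ground-truth center. The `(x, y, w, h)` box layout, the 21 thresholds, and all function names below are assumptions made for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance in pixels between the two box centers."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def success_auc(preds, gts, num_thresholds=21):
    """Mean fraction of frames with IoU above each threshold in [0, 1]."""
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(preds, gts, pixels=20.0):
    """Fraction of frames whose center error is at most `pixels`."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= pixels for e in errors) / len(errors)
```

Both functions take per-frame lists of predicted and ground-truth boxes for one sequence; a benchmark score would then aggregate over all test sequences, and frames marked absent are typically excluded first.
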
## Architecture

- **Symmetric adapters**: Lightweight parameter-efficient modules for each modality branch
- **Cross-modal fusion**: Symmetric fusion before the transformer head
- **Self-distillation**: Complementary masked patch distillation between RGB and event tokens
- **Foundation model**: OSTrack (One-stream transformer tracker) with ViT-B backbone

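One way to read "complementary masked patch distillation" is that each patch position is hidden in exactly one modality, so the RGB and event branches see disjoint sets of patches that together cover the whole grid. The sketch below illustrates only that masking pattern in plain Python; the mask ratio, the patch count, and the function name are assumptions, and the upstream implementation may build its masks differently.

```python
import random


def complementary_masks(num_patches, mask_ratio=0.5, seed=None):
    """Split patch indices into two disjoint masked sets.

    Returns (rgb_masked, event_masked). Every patch index is masked in
    exactly one modality, so each patch stays visible in the other one.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    cut = round(num_patches * mask_ratio)
    rgb_masked = set(indices[:cut])    # hidden from the RGB branch
    event_masked = set(indices[cut:])  # hidden from the event branch
    return rgb_masked, event_masked


# Illustrative call: a 16 x 16 patch grid, half of it masked per modality.
rgb_masked, event_masked = complementary_masks(16 * 16, mask_ratio=0.5, seed=0)
```

With this pattern, tokens visible in one modality can serve as the distillation target for the positions masked in the other.
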
## Files

| File | Size | Description |
|------|------|-------------|
| `SDSTrack_cvpr2024_rgbe.pth.tar` | ~490 MB | Trained checkpoint for RGB-E evaluation on VisEvent |

## Usage

### Loading the checkpoint in Python

```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint (cached locally after the first call).
checkpoint_path = hf_hub_download(
    repo_id="krisspy39/sdstrack-rgbe",
    filename="SDSTrack_cvpr2024_rgbe.pth.tar",
    repo_type="model",
)

# weights_only=False unpickles arbitrary Python objects,
# so only load checkpoint files you trust.
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
```

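Once loaded, it can help to check what the file contains before passing it to a model. The helper below is a minimal sketch: it assumes the weights may be nested under a wrapper key such as `net`, `state_dict`, or `model` (these key names are guesses, not confirmed for this file) and falls back to the loaded object itself otherwise. Both function names are hypothetical.

```python
from collections.abc import Mapping

# Assumed wrapper keys; inspect `checkpoint.keys()` to see what this file uses.
CANDIDATE_KEYS = ("net", "state_dict", "model")


def extract_state_dict(checkpoint):
    """Return the nested weight mapping if one is found, else the object itself."""
    if isinstance(checkpoint, Mapping):
        for key in CANDIDATE_KEYS:
            inner = checkpoint.get(key)
            if isinstance(inner, Mapping):
                return inner
    return checkpoint


def count_parameters(state_dict, depth=1):
    """Sum element counts per name prefix (first `depth` dot-separated parts)."""
    totals = {}
    for name, value in state_dict.items():
        prefix = ".".join(name.split(".")[:depth])
        # Tensors expose numel(); anything else is counted as zero elements.
        numel = value.numel() if hasattr(value, "numel") else 0
        totals[prefix] = totals.get(prefix, 0) + numel
    return totals
```

Calling `count_parameters(extract_state_dict(checkpoint))` on the object returned by `torch.load` gives a per-module element count, which is a quick way to confirm that the download is complete and that the expected modules are present.
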
### Evaluation with upstream code

1. Clone [SDSTrack](https://github.com/hoqolo/SDSTrack):
   ```bash
   git clone https://github.com/hoqolo/SDSTrack.git
   cd SDSTrack
   ```

2. Download the pretrained OSTrack foundation model and place it at `./pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar`.

3. Symlink or copy this checkpoint to `./models/SDSTrack_cvpr2024_rgbe.pth.tar`.

4. Run the evaluation:
   ```bash
   python ./RGBE_workspace/test_rgbe_mgpus.py \
     --script_name sdstrack \
     --num_gpus 1 \
     --threads 4 \
     --epoch 50 \
     --yaml_name cvpr2024_rgbe
   ```

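Before launching the evaluation, a quick check that the files from the steps above are in place can save a failed run. This is a small sketch using only the paths named in this section; the function name is hypothetical, and the upstream scripts may require additional files (for example, dataset paths in their local configuration) that it does not cover.

```python
from pathlib import Path

# Paths taken from the steps above, relative to the SDSTrack repository root.
EXPECTED_FILES = (
    "RGBE_workspace/test_rgbe_mgpus.py",
    "pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar",
    "models/SDSTrack_cvpr2024_rgbe.pth.tar",
)


def missing_files(repo_root="."):
    """Return the expected paths that do not exist under `repo_root`.

    Path.exists() follows symlinks, so a dangling symlink is reported as missing.
    """
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).exists()]
```

Run `missing_files()` from the repository root; an empty list means all three files were found.
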
## Dataset

This checkpoint is trained and evaluated on **[VisEvent](https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark)**, a large-scale RGB-Event single-object tracking benchmark.

- **Train**: 120 sequences
- **Test**: ~302 sequences
- **Data format**: Each sequence contains `vis_imgs/` (RGB frames), `event_imgs/` (event frames), `groundtruth.txt`, and `absent_label.txt`

The VisEvent dataset is also available in WebDataset format on Hugging Face: [krisspy39/visevent](https://huggingface.co/datasets/krisspy39/visevent)

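The per-sequence layout listed above can be read with a few lines of standard-library Python. The sketch below makes assumptions that should be checked against the actual files: `groundtruth.txt` is taken to hold one `x, y, w, h` box per line (comma or whitespace separated), `absent_label.txt` one 0/1 flag per frame, and frame files are assumed to sort correctly by name. The function name is hypothetical.

```python
from pathlib import Path


def load_sequence(seq_dir):
    """Collect frame paths and annotations for one VisEvent-style sequence."""
    seq = Path(seq_dir)
    # Assumes zero-padded frame names so that name order equals time order.
    rgb = sorted(p for p in (seq / "vis_imgs").iterdir() if p.is_file())
    event = sorted(p for p in (seq / "event_imgs").iterdir() if p.is_file())

    boxes = []
    for line in (seq / "groundtruth.txt").read_text().splitlines():
        if line.strip():
            # Assumed box layout: x, y, w, h.
            boxes.append(tuple(float(v) for v in line.replace(",", " ").split()))

    # Assumed: one flag per frame, nonzero meaning the target is absent.
    absent = [int(float(tok))
              for tok in (seq / "absent_label.txt").read_text().split()]
    return {"rgb": rgb, "event": event, "boxes": boxes, "absent": absent}
```

Comparing `len(data["rgb"])`, `len(data["event"])`, and `len(data["boxes"])` for each sequence is a cheap way to spot incomplete downloads or misaligned modalities.
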
## Citation

If you use this model or the SDSTrack tracker, please cite:

```bibtex
@inproceedings{hou2024sdstrack,
  title={SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking},
  author={Hou, Xiaojun and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
```

## License

This checkpoint is provided for research purposes. Please refer to the original [SDSTrack repository](https://github.com/hoqolo/SDSTrack) for licensing terms.

## Acknowledgments

- Original SDSTrack implementation by [hoqolo](https://github.com/hoqolo)
- VisEvent dataset by [wangxiao5791509](https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark)
- This checkpoint was reproduced as part of a university Pattern Recognition course project (Topic #65: Event-camera-based object tracking)