---
license: mit
library_name: pytorch
tags:
- computer-vision
- object-tracking
- rgb-event
- event-camera
- cvpr-2024
- sdstrack
datasets:
- krisspy39/visevent
---

# SDSTrack: RGB-E Checkpoint for VisEvent

This repository contains the **RGB-E (RGB-Event)** checkpoint for [SDSTrack](https://github.com/hoqolo/SDSTrack), a self-distillation symmetric adapter learning tracker for multi-modal single object tracking (CVPR 2024).

## Model Details

| Attribute | Value |
|-----------|-------|
| **Tracker** | SDSTrack (Self-Distillation Symmetric Adapter Learning) |
| **Backbone** | ViT-B (Vision Transformer, base size) with MAE pretraining |
| **Modality** | RGB-E (RGB + Event camera) |
| **Dataset** | VisEvent |
| **Config** | `cvpr2024_rgbe` |
| **Training epochs** | 50 |
| **Batch size** | 16 |
| **Learning rate** | 1e-4 |
| **Expected Success AUC** | ~0.62 |
| **Expected Precision @ 20px** | ~0.74 |

## Architecture

- **Symmetric adapters**: Lightweight parameter-efficient modules for each modality branch
- **Cross-modal fusion**: Symmetric fusion before the transformer head
- **Self-distillation**: Complementary masked patch distillation between RGB and event tokens
- **Foundation model**: OSTrack (One-stream transformer tracker) with ViT-B backbone

## Files

| File | Size | Description |
|------|------|-------------|
| `SDSTrack_cvpr2024_rgbe.pth.tar` | ~490 MB | Trained checkpoint for RGB-E evaluation on VisEvent |

## Usage

### Loading the checkpoint in Python

```python
from huggingface_hub import hf_hub_download
import torch

checkpoint_path = hf_hub_download(
    repo_id="krisspy39/sdstrack-rgbe",
    filename="SDSTrack_cvpr2024_rgbe.pth.tar",
    repo_type="model"
)
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
```

### Evaluation with upstream code

1. Clone [SDSTrack](https://github.com/hoqolo/SDSTrack):

   ```bash
   git clone https://github.com/hoqolo/SDSTrack.git
   cd SDSTrack
   ```

2. Download the pretrained OSTrack foundation model to `./pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar`
3. Symlink or copy this checkpoint to `./models/SDSTrack_cvpr2024_rgbe.pth.tar`
4. Run evaluation:

   ```bash
   python ./RGBE_workspace/test_rgbe_mgpus.py \
     --script_name sdstrack \
     --num_gpus 1 \
     --threads 4 \
     --epoch 50 \
     --yaml_name cvpr2024_rgbe
   ```

## Dataset

This checkpoint is trained and evaluated on **[VisEvent](https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark)**, a large-scale RGB-Event single object tracking benchmark.

- **Train**: 120 sequences
- **Test**: ~302 sequences
- **Data format**: Each sequence contains `vis_imgs/` (RGB frames), `event_imgs/` (event frames), `groundtruth.txt`, and `absent_label.txt`

The VisEvent dataset is also available as a webdataset on Hugging Face: [krisspy39/visevent](https://huggingface.co/datasets/krisspy39/visevent)

## Citation

If you use this model or the SDSTrack tracker, please cite:

```bibtex
@inproceedings{hou2024sdstrack,
  title={Self-Distillation Symmetric Adapter Learning for Multi-Modal Object Tracking},
  author={Hou, Xiaojun and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
```

## License

This checkpoint is provided for research purposes. Please refer to the original [SDSTrack repository](https://github.com/hoqolo/SDSTrack) for licensing terms.

## Acknowledgments

- Original SDSTrack implementation by [hoqolo](https://github.com/hoqolo)
- VisEvent dataset by [wangxiao5791509](https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark)
- This checkpoint was reproduced as part of a university Pattern Recognition course project (Topic #65: Event-camera-based object tracking)
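
## Appendix: Metric definitions (sketch)

The Model Details table lists an expected Success AUC and an expected Precision @ 20px. The snippet below is a minimal sketch of how these two metrics are commonly defined in OTB-style single object tracking evaluation, so the numbers can be sanity-checked on a handful of frames. It is not the VisEvent toolkit or the SDSTrack evaluation code, which remain authoritative. The function names, the `[x, y, w, h]` box layout, the 21 evenly spaced overlap thresholds, and the strict `>` comparison are all assumptions made for illustration; frames where the target is absent would need to be filtered out beforehand.

```python
import math
from typing import Sequence

Box = Sequence[float]  # assumed layout: [x, y, w, h], top-left corner plus size, in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two [x, y, w, h] boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance in pixels between the two box centers."""
    dx = (a[0] + a[2] / 2) - (b[0] + b[2] / 2)
    dy = (a[1] + a[3] / 2) - (b[1] + b[3] / 2)
    return math.hypot(dx, dy)


def success_auc(preds: Sequence[Box], gts: Sequence[Box], num_thresholds: int = 21) -> float:
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    A frame counts as a success at threshold t when IoU > t (strict, an assumption).
    """
    if not preds or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and the same length")
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(preds: Sequence[Box], gts: Sequence[Box], pixels: float = 20.0) -> float:
    """Fraction of frames whose center error is at most `pixels`."""
    if not preds or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and the same length")
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= pixels for e in errors) / len(errors)


if __name__ == "__main__":
    # Toy data: one perfectly tracked frame and one complete miss.
    gts = [[0, 0, 10, 10], [0, 0, 10, 10]]
    preds = [[0, 0, 10, 10], [100, 100, 10, 10]]
    print("Success AUC:", success_auc(preds, gts))
    print("Precision @ 20px:", precision_at(preds, gts))
```

With the strict comparison, the threshold at exactly 1.0 is never satisfied, so even a perfect tracker scores slightly below 1.0 on this AUC. Switching to `>=` is a reasonable alternative, provided the same convention is used for every tracker being compared.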