SDSTrack — RGB-E Checkpoint for VisEvent
This repository contains the RGB-E (RGB-Event) checkpoint for SDSTrack, a self-distillation symmetric adapter learning tracker for multi-modal single object tracking (CVPR 2024).
Model Details
| Attribute | Value |
|---|---|
| Tracker | SDSTrack (Self-Distillation Symmetric Adapter Learning) |
| Backbone | ViT-B (Vision Transformer, base size) with MAE pretraining |
| Modality | RGB-E (RGB + Event camera) |
| Dataset | VisEvent |
| Config | cvpr2024_rgbe |
| Training epochs | 50 |
| Batch size | 16 |
| Learning rate | 1e-4 |
| Expected Success AUC | ~0.62 |
| Expected Precision @ 20px | ~0.74 |
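For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how Success AUC and Precision @ 20px are commonly computed in single object tracking benchmarks. It assumes boxes in `(x, y, w, h)` format and 21 evenly spaced IoU thresholds. The function names are illustrative and are not taken from the SDSTrack or VisEvent toolkits, whose exact handling (for example of frames where the target is absent) may differ.

```python
import math


def iou_xywh(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def success_auc(preds, gts, num_thresholds=21):
    """Mean success rate over IoU thresholds evenly spaced in [0, 1]."""
    ious = [iou_xywh(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(preds, gts, pixels=20.0):
    """Fraction of frames whose center error is within `pixels`."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= pixels for e in errors) / len(errors)
```

Under this definition, a higher Success AUC reflects better box overlap across all thresholds, while Precision @ 20px only measures how close the predicted center is to the ground-truth center.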
Architecture
- Symmetric adapters: Lightweight parameter-efficient modules for each modality branch
- Cross-modal fusion: Symmetric fusion before the transformer head
- Self-distillation: Complementary masked patch distillation between RGB and event tokens
- Foundation model: OSTrack (One-stream transformer tracker) with ViT-B backbone
Files
| File | Size | Description |
|---|---|---|
| SDSTrack_cvpr2024_rgbe.pth.tar | ~490 MB | Trained checkpoint for RGB-E evaluation on VisEvent |
Usage
Loading the checkpoint in Python
from huggingface_hub import hf_hub_download
import torch
checkpoint_path = hf_hub_download(
repo_id="krisspy39/sdstrack-rgbe",
filename="SDSTrack_cvpr2024_rgbe.pth.tar",
repo_type="model"
)
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
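Once loaded, it can be useful to check what the file actually contains before passing it to the tracker. The helper below is a small sketch for that purpose. It assumes the model weights sit under a top-level `"net"` key, as in OSTrack-style checkpoints; that key is an assumption and has not been verified against this file, so the helper falls back to treating the whole object as a flat state dict. The name `summarize_checkpoint` is hypothetical.

```python
def summarize_checkpoint(checkpoint, state_key="net"):
    """Return (top_level_keys, num_tensors, num_parameters) for a checkpoint dict.

    `state_key` is an assumed location of the weights; if it is missing,
    the checkpoint itself is treated as the state dict.
    """
    if not isinstance(checkpoint, dict):
        return [], 0, 0
    top_level_keys = sorted(str(k) for k in checkpoint.keys())
    state = checkpoint.get(state_key, checkpoint)
    # Keep only tensor-like entries (anything exposing numel()).
    tensors = {k: v for k, v in state.items() if hasattr(v, "numel")}
    num_parameters = sum(v.numel() for v in tensors.values())
    return top_level_keys, len(tensors), num_parameters
```

With the `checkpoint` object loaded above, `keys, n_tensors, n_params = summarize_checkpoint(checkpoint)` reports the top-level keys and a parameter count, which is a quick sanity check that the download is complete.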
Evaluation with upstream code
- Clone SDSTrack:
git clone https://github.com/hoqolo/SDSTrack.git
cd SDSTrack
- Download the pretrained OSTrack foundation model to
./pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar
- Symlink or copy this checkpoint to
./models/SDSTrack_cvpr2024_rgbe.pth.tar
- Run evaluation:
python ./RGBE_workspace/test_rgbe_mgpus.py \
--script_name sdstrack \
--num_gpus 1 \
--threads 4 \
--epoch 50 \
--yaml_name cvpr2024_rgbe
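The file placement steps above can also be scripted. The sketch below links (or, where symlinks are not permitted, copies) the downloaded checkpoint into `./models/` and reports whether the OSTrack foundation model is already at the expected path. The function name `place_checkpoint` is hypothetical; the two relative paths are the ones listed in the steps above.

```python
from pathlib import Path
import shutil

# Relative paths taken from the evaluation steps in this README.
OSTRACK_REL = Path("pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar")
CKPT_REL = Path("models/SDSTrack_cvpr2024_rgbe.pth.tar")


def place_checkpoint(repo_root, checkpoint_path):
    """Link or copy the checkpoint into the cloned repo.

    Returns (target_path, ostrack_present), where ostrack_present tells
    whether the OSTrack foundation model was found at its expected path.
    """
    repo_root = Path(repo_root)
    target = repo_root / CKPT_REL
    target.parent.mkdir(parents=True, exist_ok=True)
    # Leave an existing file or symlink untouched.
    if not (target.exists() or target.is_symlink()):
        try:
            target.symlink_to(Path(checkpoint_path).resolve())
        except OSError:
            # Symlinks may be unavailable (e.g. some Windows setups): copy instead.
            shutil.copy2(checkpoint_path, target)
    return target, (repo_root / OSTRACK_REL).is_file()
```

Calling `place_checkpoint(".", checkpoint_path)` from inside the cloned SDSTrack directory, with `checkpoint_path` from the loading snippet, covers the symlink step; the OSTrack weights still need to be downloaded separately.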
Dataset
This checkpoint is trained and evaluated on VisEvent, a large-scale RGB-Event single object tracking benchmark.
- Train: 120 sequences
- Test: ~302 sequences
- Data format: Each sequence contains vis_imgs/ (RGB frames), event_imgs/ (event frames), groundtruth.txt, and absent_label.txt
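Before running evaluation, the sequence layout described above can be checked with a short script. This is a sketch that only verifies the four entries are present; it does not parse the annotation files, whose exact format is not described here. The function name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Entries each VisEvent sequence is expected to contain, per the list above.
REQUIRED_DIRS = ("vis_imgs", "event_imgs")
REQUIRED_FILES = ("groundtruth.txt", "absent_label.txt")


def missing_entries(sequence_dir):
    """Return the names of required directories and files missing from a sequence."""
    sequence_dir = Path(sequence_dir)
    missing = [d for d in REQUIRED_DIRS if not (sequence_dir / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (sequence_dir / f).is_file()]
    return missing
```

An empty list means the sequence has the expected layout; looping over every sequence directory in the test split gives a quick integrity check of the download.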
The VisEvent dataset is also available as a webdataset on Hugging Face: krisspy39/visevent
Citation
If you use this model or the SDSTrack tracker, please cite:
@inproceedings{hou2024sdstrack,
title={Self-Distillation Symmetric Adapter Learning for Multi-Modal Object Tracking},
author={Hou, Xiaojun and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
License
This checkpoint is provided for research purposes. Please refer to the original SDSTrack repository for licensing terms.
Acknowledgments
- Original SDSTrack implementation by hoqolo
- VisEvent dataset by wangxiao5791509
- This checkpoint was reproduced as part of a university Pattern Recognition course project (Topic #65: Event-camera-based object tracking)