SDSTrack — RGB-E Checkpoint for VisEvent

This repository contains the RGB-E (RGB-Event) checkpoint for SDSTrack, a multi-modal single object tracker based on self-distillation symmetric adapter learning (CVPR 2024).

Model Details

| Attribute | Value |
|---|---|
| Tracker | SDSTrack (Self-Distillation Symmetric Adapter Learning) |
| Backbone | ViT-B (Vision Transformer, base size) with MAE pretraining |
| Modality | RGB-E (RGB + Event camera) |
| Dataset | VisEvent |
| Config | cvpr2024_rgbe |
| Training epochs | 50 |
| Batch size | 16 |
| Learning rate | 1e-4 |
| Expected Success AUC | ~0.62 |
| Expected Precision @ 20px | ~0.74 |
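
The two expected metrics above follow the usual single object tracking conventions. As a rough sketch of how they are typically computed, the code below assumes boxes in [x, y, w, h] pixel format, 21 overlap thresholds spaced evenly from 0 to 1, and a strict `>` comparison; the official VisEvent evaluation toolkit may differ in these details, and frames where the target is absent are not handled here. The function names are illustrative and are not part of the SDSTrack code base.

```python
import math


def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_distance(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def success_auc(preds, gts, num_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(preds, gts, pixels=20.0):
    """Fraction of frames whose center error is within `pixels`."""
    dists = [center_distance(p, g) for p, g in zip(preds, gts)]
    return sum(d <= pixels for d in dists) / len(dists)
```

Both functions expect non-empty, equally long lists of predicted and ground-truth boxes for one sequence; a benchmark score is then usually reported over all test sequences.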

Architecture

  • Symmetric adapters: Lightweight parameter-efficient modules for each modality branch
  • Cross-modal fusion: Symmetric fusion before the transformer head
  • Self-distillation: Complementary masked patch distillation between RGB and event tokens
  • Foundation model: OSTrack (One-stream transformer tracker) with ViT-B backbone

Files

| File | Size | Description |
|---|---|---|
| SDSTrack_cvpr2024_rgbe.pth.tar | ~490 MB | Trained checkpoint for RGB-E evaluation on VisEvent |

Usage

Loading the checkpoint in Python

from huggingface_hub import hf_hub_download
import torch

checkpoint_path = hf_hub_download(
    repo_id="krisspy39/sdstrack-rgbe",
    filename="SDSTrack_cvpr2024_rgbe.pth.tar",
    repo_type="model"
)

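# weights_only=False unpickles arbitrary Python objects,
# so only load checkpoints from a source you trust.
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)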
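
Once loaded, it can help to check what the file actually contains before wiring it into a model. The helper below is a minimal sketch: it assumes the checkpoint is a dictionary and that the weights may sit under a key such as "net" (a convention in OSTrack-style training code, not something verified for this file). `summarize_checkpoint` is a hypothetical helper name, not part of SDSTrack or PyTorch.

```python
def summarize_checkpoint(checkpoint, weights_key="net"):
    """Summarize a loaded checkpoint dictionary.

    `weights_key` is an assumption; if it is missing or not a dict,
    the checkpoint itself is treated as the state dict.
    """
    if not isinstance(checkpoint, dict):
        raise TypeError("expected a dict-like checkpoint")

    state = checkpoint.get(weights_key)
    if not isinstance(state, dict):
        state = checkpoint

    # Count elements of every entry that looks like a tensor (has .numel()).
    num_parameters = 0
    for value in state.values():
        numel = getattr(value, "numel", None)
        if callable(numel):
            num_parameters += numel()

    return {
        "top_level_keys": sorted(map(str, checkpoint)),
        "num_entries": len(state),
        "num_parameters": num_parameters,
    }
```

Calling `print(summarize_checkpoint(checkpoint))` after the `torch.load` line above shows the top-level keys and a rough parameter count, which is a quick way to confirm the download is complete and has the expected layout.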

Evaluation with upstream code

  1. Clone SDSTrack:
git clone https://github.com/hoqolo/SDSTrack.git
cd SDSTrack
  2. Download the pretrained OSTrack foundation model to ./pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar

  3. Symlink or copy this checkpoint to ./models/SDSTrack_cvpr2024_rgbe.pth.tar

  4. Run evaluation:

python ./RGBE_workspace/test_rgbe_mgpus.py \
  --script_name sdstrack \
  --num_gpus 1 \
  --threads 4 \
  --epoch 50 \
  --yaml_name cvpr2024_rgbe
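
The checkpoint placement step above can also be scripted. The sketch below symlinks a downloaded checkpoint into ./models/ inside the cloned repository and falls back to copying when symlinks are unavailable (as can happen on some Windows setups). `place_checkpoint` is an illustrative name, and the default file name simply mirrors the path given in the steps above.

```python
import os
import shutil
from pathlib import Path


def place_checkpoint(source, repo_root=".", name="SDSTrack_cvpr2024_rgbe.pth.tar"):
    """Symlink (or copy) `source` to <repo_root>/models/<name> and return that path."""
    source = Path(source).resolve()
    if not source.is_file():
        raise FileNotFoundError(source)

    target = Path(repo_root) / "models" / name
    target.parent.mkdir(parents=True, exist_ok=True)

    # Replace a stale file or dangling symlink left by a previous run.
    if target.exists() or target.is_symlink():
        target.unlink()

    try:
        os.symlink(source, target)
    except OSError:
        # Symlinks not permitted on this system: copy the file instead.
        shutil.copy2(source, target)
    return target
```

With the loading example from earlier, `place_checkpoint(checkpoint_path, repo_root="SDSTrack")` would link the cached Hugging Face download into the cloned repository without duplicating the ~490 MB file.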

Dataset

This checkpoint was trained and evaluated on VisEvent, a large-scale RGB-Event single object tracking benchmark.

  • Train: 120 sequences
  • Test: ~302 sequences
  • Data format: Each sequence contains vis_imgs/ (RGB frames), event_imgs/ (event frames), groundtruth.txt, and absent_label.txt
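
Before running evaluation, a quick layout check can catch incomplete downloads. The sketch below verifies that each sequence directory contains the four entries listed above. It assumes one sub-directory per sequence under a split directory, which is an assumption about the local layout rather than something this card specifies; the function names are hypothetical.

```python
from pathlib import Path

EXPECTED_DIRS = ("vis_imgs", "event_imgs")
EXPECTED_FILES = ("groundtruth.txt", "absent_label.txt")


def missing_entries(sequence_dir):
    """Return the expected sub-directories and files missing from one sequence."""
    seq = Path(sequence_dir)
    missing = [d for d in EXPECTED_DIRS if not (seq / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (seq / f).is_file()]
    return missing


def check_split(split_dir):
    """Map each incomplete sequence name to its list of missing entries."""
    report = {}
    for seq in sorted(Path(split_dir).iterdir()):
        if seq.is_dir():
            missing = missing_entries(seq)
            if missing:
                report[seq.name] = missing
    return report
```

An empty dictionary from `check_split` means every sequence in that split has the expected layout; it does not validate the contents of the annotation files.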

The VisEvent dataset is also available as a webdataset on Hugging Face: krisspy39/visevent

Citation

If you use this model or the SDSTrack tracker, please cite:

@inproceedings{hou2024sdstrack,
  title={SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking},
  author={Hou, Xiaojun and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}

License

This checkpoint is provided for research purposes. Please refer to the original SDSTrack repository for licensing terms.

Acknowledgments

  • Original SDSTrack implementation by hoqolo
  • VisEvent dataset by wangxiao5791509
  • This checkpoint was reproduced as part of a university Pattern Recognition course project (Topic #65: Event-camera-based object tracking)