---
license: mit
library_name: pytorch
tags:
- computer-vision
- object-tracking
- rgb-event
- event-camera
- cvpr-2024
- sdstrack
datasets:
- krisspy39/visevent
---
# SDSTrack — RGB-E Checkpoint for VisEvent
This repository contains the **RGB-E (RGB + Event)** checkpoint for [SDSTrack](https://github.com/hoqolo/SDSTrack), a multi-modal single object tracker based on self-distillation symmetric adapter learning, published at CVPR 2024.
## Model Details
| Attribute | Value |
|-----------|-------|
| **Tracker** | SDSTrack (Self-Distillation Symmetric Adapter Learning) |
| **Backbone** | ViT-B (Vision Transformer, base size) with MAE pretraining |
| **Modality** | RGB-E (RGB + Event camera) |
| **Dataset** | VisEvent |
| **Config** | `cvpr2024_rgbe` |
| **Training epochs** | 50 |
| **Batch size** | 16 |
| **Learning rate** | 1e-4 |
| **Expected Success AUC** | ~0.62 |
| **Expected Precision @ 20px** | ~0.74 |
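
The two accuracy figures above are the usual one-pass metrics for single object tracking. The sketch below shows how they are typically computed. It is not the evaluation code that produced the table values. Success AUC averages, over a grid of IoU thresholds in [0, 1], the fraction of frames whose predicted box overlaps the ground truth by more than the threshold. Precision @ 20px is the fraction of frames whose predicted box centre lies within 20 pixels of the ground-truth centre. The sketch makes three assumptions: boxes are `(x, y, w, h)` with a top-left origin, the grid has 21 thresholds, and the comparison is strict (`>`). Toolkits differ in these details and in how they handle frames where the target is absent.

```python
import math
from typing import Sequence, Tuple

# (x, y, w, h): top-left corner plus width and height, in pixels (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance between the two box centres, in pixels."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def success_auc(pred: Sequence[Box], gt: Sequence[Box], steps: int = 21) -> float:
    """Mean success rate over IoU thresholds 0, 0.05, ..., 1 (for steps=21)."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(pred: Sequence[Box], gt: Sequence[Box], px: float = 20.0) -> float:
    """Fraction of frames whose centre error is at most `px` pixels."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(center_error(p, g) <= px for p, g in zip(pred, gt)) / len(pred)
```

Under this sketch's strict comparison, a perfect tracker scores 20/21 (about 0.952) rather than 1.0, because no IoU is strictly greater than the final threshold of 1.0.
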
## Architecture
- **Symmetric adapters**: Lightweight parameter-efficient modules for each modality branch
- **Cross-modal fusion**: Symmetric fusion before the transformer head
- **Self-distillation**: Complementary masked patch distillation between RGB and event tokens
- **Foundation model**: OSTrack (One-stream transformer tracker) with ViT-B backbone
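
The components listed above can be illustrated with a toy NumPy sketch. It is **not** the upstream SDSTrack implementation, and every detail is an illustrative assumption:

- the adapter layout (down-projection, ReLU, zero-initialised up-projection, residual add);
- averaging as the symmetric fusion;
- reading "complementary" as "each patch position is masked in exactly one modality";
- MSE as the distillation loss.

Token arrays have shape `(num_patches, dim)`.

```python
import numpy as np


class BottleneckAdapter:
    """Residual bottleneck adapter: x + relu(x @ down) @ up.

    ASSUMPTION: a generic parameter-efficient adapter, not SDSTrack's module.
    """

    def __init__(self, dim: int, hidden: int, rng: np.random.Generator):
        self.down = rng.normal(0.0, 0.02, size=(dim, hidden))
        # Zero-initialised up-projection: the adapter starts as an identity map,
        # so the frozen backbone's output is unchanged before training.
        self.up = np.zeros((hidden, dim))

    def __call__(self, tokens: np.ndarray) -> np.ndarray:
        return tokens + np.maximum(tokens @ self.down, 0.0) @ self.up


def symmetric_step(rgb, event, rgb_adapter, event_adapter):
    """Apply structurally identical adapters to both branches, then fuse.

    Averaging is a placeholder fusion, chosen only because it treats the two
    modalities symmetrically (swapping the inputs gives the same fused result).
    """
    rgb, event = rgb_adapter(rgb), event_adapter(event)
    return rgb, event, (rgb + event) / 2.0


def complementary_masks(num_patches: int, ratio: float, rng: np.random.Generator):
    """Boolean masks (True = masked) where each patch position is masked in
    exactly one modality. ASSUMPTION: one reading of 'complementary'."""
    num_masked = int(round(num_patches * ratio))
    order = rng.permutation(num_patches)
    rgb_mask = np.zeros(num_patches, dtype=bool)
    rgb_mask[order[:num_masked]] = True
    return rgb_mask, ~rgb_mask


def apply_mask(tokens: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out masked patch tokens (rows where mask is True)."""
    return np.where(mask[:, None], 0.0, tokens)


def distill_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """Placeholder MSE between masked-input (student) and full-input (teacher) features."""
    return float(np.mean((student - teacher) ** 2))
```

The zero-initialised up-projection is a common adapter choice. Training starts from the pretrained backbone's exact behaviour, and only the small adapter matrices need to learn the modality-specific adjustment.
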
## Files
| File | Size | Description |
|------|------|-------------|
| `SDSTrack_cvpr2024_rgbe.pth.tar` | ~490 MB | Trained checkpoint for RGB-E evaluation on VisEvent |
## Usage
### Loading the checkpoint in Python
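```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hub (reuses the local cache on later calls)
checkpoint_path = hf_hub_download(
    repo_id="krisspy39/sdstrack-rgbe",
    filename="SDSTrack_cvpr2024_rgbe.pth.tar",
    repo_type="model",
)

# weights_only=False allows arbitrary pickled Python objects, so use it only
# with files you trust; it is needed if the checkpoint stores non-tensor data.
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
```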
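
Before wiring the loaded checkpoint into a model, it can help to check what it actually contains. The helpers below are a hypothetical sketch. They assume the checkpoint is a `dict`, and that nested weights sit under one of the keys `"net"`, `"state_dict"` or `"model"`. The `"net"` key is the convention in OSTrack-derived code, but this card does not confirm it for this file. The helpers avoid importing `torch`, so they work on any mapping of array-like values.

```python
import math
from collections.abc import Mapping

# Keys under which tracking codebases commonly nest the weights.
# ASSUMPTION: "net" follows the OSTrack convention; the others are generic fallbacks.
CANDIDATE_KEYS = ("net", "state_dict", "model")


def extract_state_dict(checkpoint: Mapping) -> Mapping:
    """Return the nested weight dict if one of CANDIDATE_KEYS holds a mapping,
    otherwise assume the checkpoint itself is already a flat state dict."""
    for key in CANDIDATE_KEYS:
        value = checkpoint.get(key)
        if isinstance(value, Mapping):
            return value
    return checkpoint


def count_parameters(state_dict: Mapping) -> int:
    """Sum element counts over every entry exposing a `.shape` attribute
    (torch tensors and NumPy arrays both do); other entries are skipped."""
    total = 0
    for value in state_dict.values():
        shape = getattr(value, "shape", None)
        if shape is not None:
            total += math.prod(shape)
    return total
```

For example, call `extract_state_dict(checkpoint)` on the `checkpoint` loaded above, then pass the result to `count_parameters`. The count covers every stored tensor. It will therefore differ from the number of *trainable* parameters if parts of the backbone were frozen during training.
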
### Evaluation with upstream code
1. Clone [SDSTrack](https://github.com/hoqolo/SDSTrack):
   ```bash
   git clone https://github.com/hoqolo/SDSTrack.git
   cd SDSTrack
   ```
2. Download the pretrained OSTrack foundation model to `./pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar`.
3. Symlink or copy this checkpoint to `./models/SDSTrack_cvpr2024_rgbe.pth.tar`.
4. Run the evaluation:
   ```bash
   python ./RGBE_workspace/test_rgbe_mgpus.py \
       --script_name sdstrack \
       --num_gpus 1 \
       --threads 4 \
       --epoch 50 \
       --yaml_name cvpr2024_rgbe
   ```
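
Steps 2 and 3 can also be scripted. The sketch below is a hypothetical helper: the names `place_checkpoint` and `missing_files` are illustrative and not part of SDSTrack. It links or copies the downloaded checkpoint into `models/`, then reports which of the two expected files are still absent. The paths come from the steps above and are relative to the SDSTrack clone. The helper does not download the OSTrack weights.

```python
import shutil
from pathlib import Path
from typing import List

# Expected locations, relative to the SDSTrack clone (from the steps above).
PRETRAINED = Path("pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar")
CHECKPOINT = Path("models/SDSTrack_cvpr2024_rgbe.pth.tar")


def place_checkpoint(downloaded: Path, repo_root: Path, copy: bool = False) -> Path:
    """Symlink (default) or copy the downloaded checkpoint into <repo_root>/models/."""
    target = Path(repo_root) / CHECKPOINT
    target.parent.mkdir(parents=True, exist_ok=True)
    # exists() is False for a dangling symlink, so check is_symlink() as well
    if target.exists() or target.is_symlink():
        target.unlink()
    if copy:
        shutil.copy2(downloaded, target)
    else:
        target.symlink_to(Path(downloaded).resolve())
    return target


def missing_files(repo_root: Path) -> List[str]:
    """Return the expected files that are not yet present under repo_root."""
    return [str(p) for p in (PRETRAINED, CHECKPOINT) if not (Path(repo_root) / p).exists()]
```

A symlink avoids duplicating the ~490 MB file. On Windows, however, creating symlinks may require extra privileges, so `copy=True` is the simpler option there.
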
## Dataset
This checkpoint is trained and evaluated on **[VisEvent](https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark)**, a large-scale RGB-Event single object tracking benchmark.
- **Train**: 120 sequences
- **Test**: ~302 sequences
- **Data format**: Each sequence contains `vis_imgs/` (RGB frames), `event_imgs/` (event frames), `groundtruth.txt`, and `absent_label.txt`
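
A minimal reader for one sequence in this layout might look as follows. The sketch relies on assumptions that this card does not state:

- each line of `groundtruth.txt` holds one `x, y, w, h` box (commas, tabs or spaces are all accepted);
- `absent_label.txt` holds one integer flag per line;
- frame filenames sort in temporal order;
- every file has one entry per frame.

Check the VisEvent benchmark documentation for the exact meaning of the absent flag before using it as a visibility mask.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


def parse_boxes(text: str) -> List[Box]:
    """One box per non-empty line. ASSUMPTION: fields are x, y, w, h."""
    boxes = []
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if fields:
            x, y, w, h = (float(v) for v in fields[:4])
            boxes.append((x, y, w, h))
    return boxes


def parse_flags(text: str) -> List[int]:
    """One integer flag per non-empty line; the meaning of 0/1 is not interpreted here."""
    return [int(line.strip()) for line in text.splitlines() if line.strip()]


def load_sequence(seq_dir: Path) -> Dict[str, list]:
    """Collect frame paths and annotations for one sequence directory."""
    seq_dir = Path(seq_dir)
    # ASSUMPTION: zero-padded filenames, so lexicographic order is temporal order
    rgb = sorted((seq_dir / "vis_imgs").iterdir())
    event = sorted((seq_dir / "event_imgs").iterdir())
    boxes = parse_boxes((seq_dir / "groundtruth.txt").read_text())
    flags = parse_flags((seq_dir / "absent_label.txt").read_text())
    # ASSUMPTION: one RGB frame, one event frame, one box and one flag per time step
    if not (len(rgb) == len(event) == len(boxes) == len(flags)):
        raise ValueError(f"{seq_dir.name}: frame and annotation counts differ")
    return {"rgb": rgb, "event": event, "boxes": boxes, "flags": flags}
```
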

The VisEvent dataset is also available in WebDataset format on Hugging Face: [krisspy39/visevent](https://huggingface.co/datasets/krisspy39/visevent).
## Citation
If you use this model or the SDSTrack tracker, please cite:
```bibtex
@inproceedings{hou2024sdstrack,
title={{SDSTrack}: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking},
author={Hou, Xiaojun and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
```
## License
This checkpoint is provided for research purposes. Please refer to the original [SDSTrack repository](https://github.com/hoqolo/SDSTrack) for licensing terms.
## Acknowledgments
- Original SDSTrack implementation by [hoqolo](https://github.com/hoqolo)
- VisEvent dataset by [wangxiao5791509](https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark)
- This checkpoint was reproduced as part of a university Pattern Recognition course project (Topic #65: Event-camera-based object tracking)