	---
	license: mit
	library_name: pytorch
	tags:
	- computer-vision
	- object-tracking
	- rgb-event
	- event-camera
	- cvpr-2024
	- sdstrack
	datasets:
	- krisspy39/visevent
	---

	# SDSTrack — RGB-E Checkpoint for VisEvent

	This repository contains the RGB-E (RGB-Event) checkpoint for [SDSTrack](https://github.com/hoqolo/SDSTrack), a multi-modal single-object tracker based on self-distillation symmetric adapter learning (CVPR 2024).

	## Model Details

	| Attribute | Value |
	|-----------|-------|
	| Tracker | SDSTrack (Self-Distillation Symmetric Adapter Learning) |
	| Backbone | ViT-B (Vision Transformer, base size) with MAE pretraining |
	| Modality | RGB-E (RGB + Event camera) |
	| Dataset | VisEvent |
	| Config | `cvpr2024_rgbe` |
	| Training epochs | 50 |
	| Batch size | 16 |
	| Learning rate | 1e-4 |
	| Expected Success AUC | ~0.62 |
	| Expected Precision @ 20px | ~0.74 |
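
The two expected scores above are the usual one-pass-evaluation metrics for single-object tracking. As a rough sketch of how such numbers are typically computed: the function names, the 21 evenly spaced IoU thresholds, and the `(x, y, w, h)` box layout are assumptions made for illustration, and the VisEvent evaluation toolkit may differ in details.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance in pixels between the two box centers."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def success_auc(pred, gt, num_thresholds=21):
    """Mean success rate over IoU thresholds evenly spaced in [0, 1]."""
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(pred, gt, pixels=20.0):
    """Fraction of frames whose center error is at most `pixels`."""
    errors = [center_error(p, g) for p, g in zip(pred, gt)]
    return sum(e <= pixels for e in errors) / len(errors)
```

Under these assumptions, a Success AUC of about 0.62 means the success rate averaged over all IoU thresholds is 62%, and a Precision of about 0.74 means the predicted center lies within 20 pixels of the ground-truth center in 74% of frames.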

	## Architecture

	- Symmetric adapters: Lightweight parameter-efficient modules for each modality branch
	- Cross-modal fusion: Symmetric fusion before the transformer head
	- Self-distillation: Complementary masked patch distillation between RGB and event tokens
	- Foundation model: OSTrack (One-stream transformer tracker) with ViT-B backbone

	## Files

	| File | Size | Description |
	|------|------|-------------|
	| `SDSTrack_cvpr2024_rgbe.pth.tar` | ~490 MB | Trained checkpoint for RGB-E evaluation on VisEvent |

	## Usage

	### Loading the checkpoint in Python

	```python
	from huggingface_hub import hf_hub_download
	import torch

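	# Download the checkpoint from the Hub (cached locally after the first call).
	checkpoint_path = hf_hub_download(
	    repo_id="krisspy39/sdstrack-rgbe",
	    filename="SDSTrack_cvpr2024_rgbe.pth.tar",
	    repo_type="model",
	)

	# weights_only=False unpickles arbitrary Python objects, so only load files you trust.
	checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)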
	```
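
The internal layout of the loaded object is not documented here, so it can help to inspect it before wiring it into a model. A minimal sketch, assuming the weights may be nested under a key such as `net`, `state_dict`, or `model`. These key names are guesses based on common PyTorch training frameworks, not confirmed for this file, and both helper names are hypothetical.

```python
def find_state_dict(checkpoint, candidate_keys=("net", "state_dict", "model")):
    """Return the nested weight mapping if a candidate key holds one.

    The candidate keys are assumptions; if none matches, the checkpoint
    itself is returned unchanged.
    """
    if isinstance(checkpoint, dict):
        for key in candidate_keys:
            if isinstance(checkpoint.get(key), dict):
                return checkpoint[key]
    return checkpoint


def count_parameters(state_dict):
    """Sum numel() over every tensor-like value in the mapping."""
    return sum(v.numel() for v in state_dict.values() if hasattr(v, "numel"))
```

With the `checkpoint` loaded above, `list(checkpoint.keys())` shows the top-level entries, and `count_parameters(find_state_dict(checkpoint))` gives the total number of stored weights.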

	### Evaluation with upstream code

	1. Clone [SDSTrack](https://github.com/hoqolo/SDSTrack):
	```bash
	git clone https://github.com/hoqolo/SDSTrack.git
	cd SDSTrack
	```

	2. Download the pretrained OSTrack foundation model to `./pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar`

	3. Symlink or copy this checkpoint to `./models/SDSTrack_cvpr2024_rgbe.pth.tar`

	4. Run evaluation:
	```bash
	python ./RGBE_workspace/test_rgbe_mgpus.py \
	    --script_name sdstrack \
	    --num_gpus 1 \
	    --threads 4 \
	    --epoch 50 \
	    --yaml_name cvpr2024_rgbe
	```
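
Before launching the evaluation, it may be worth confirming that the files from steps 2 to 4 are where the script expects them. The following is a small pre-flight sketch; the helper name `missing_files` is hypothetical, and the relative paths are copied from the steps above.

```python
from pathlib import Path

# Relative paths taken from steps 2, 3, and 4 above.
EXPECTED_FILES = (
    "pretrained/vitb_256_mae_ce_32x4_ep300/OSTrack_ep0300.pth.tar",
    "models/SDSTrack_cvpr2024_rgbe.pth.tar",
    "RGBE_workspace/test_rgbe_mgpus.py",
)


def missing_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected paths that do not exist under repo_root.

    Path.exists() follows symlinks, so a dangling symlink is reported
    as missing.
    """
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_files(".")` from inside the cloned `SDSTrack` directory should return an empty list once everything is in place.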

	## Dataset

	This checkpoint was trained and evaluated on [VisEvent](https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark), a large-scale RGB-Event single-object tracking benchmark.

	- Train: 120 sequences
	- Test: ~302 sequences
	- Data format: Each sequence contains `vis_imgs/` (RGB frames), `event_imgs/` (event frames), `groundtruth.txt`, and `absent_label.txt`
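
The per-sequence layout listed above can be checked programmatically. A minimal sketch: the helper names are hypothetical, and the annotation format (one box per line, four numbers separated by commas or whitespace, read as `x, y, w, h`) is an assumption that should be verified against the actual files.

```python
from pathlib import Path

# Entries each sequence directory is expected to contain (from the list above).
REQUIRED_ENTRIES = ("vis_imgs", "event_imgs", "groundtruth.txt", "absent_label.txt")


def missing_entries(sequence_dir):
    """Return the required files or folders that are absent from a sequence."""
    seq = Path(sequence_dir)
    return [name for name in REQUIRED_ENTRIES if not (seq / name).exists()]


def read_boxes(groundtruth_path):
    """Parse one box per non-empty line.

    Assumes four numbers per line separated by commas or whitespace;
    this format is an assumption, not taken from the dataset docs.
    """
    boxes = []
    for line in Path(groundtruth_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        values = [float(v) for v in line.replace(",", " ").split()]
        boxes.append(tuple(values[:4]))
    return boxes
```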

	The VisEvent dataset is also available in WebDataset format on Hugging Face: [krisspy39/visevent](https://huggingface.co/datasets/krisspy39/visevent)

	## Citation

	If you use this model or the SDSTrack tracker, please cite:

	```bibtex
	@inproceedings{hou2024sdstrack,
	  title={SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking},
	  author={Hou, Xiaojun and others},
	  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	  year={2024}
	}
	```

	## License

	This checkpoint is provided for research purposes. Please refer to the original [SDSTrack repository](https://github.com/hoqolo/SDSTrack) for licensing terms.

	## Acknowledgments

	- Original SDSTrack implementation by [hoqolo](https://github.com/hoqolo)
	- VisEvent dataset by [wangxiao5791509](https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark)
	- This checkpoint was reproduced as part of a university Pattern Recognition course project (Topic #65: Event-camera-based object tracking)