---
license: cc-by-4.0
library_name: pytorch
tags:
- computer-vision
- object-tracking
- spiking-neural-networks
- visual-streaming-perception
- energy-efficient
- cvpr-2025
pipeline_tag: object-detection
---

# ViStream: Law-of-Charge-Conservation Inspired Spiking Neural Network for Visual Streaming Perception

**ViStream** is a novel energy-efficient framework for Visual Streaming Perception (VSP) that leverages Spiking Neural Networks (SNNs) with Law of Charge Conservation (LoCC) properties.

## Model Details

### Model Description

- **Developed by:** Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He
- **Model type:** Spiking Neural Network for Visual Streaming Perception
- **Language(s):** PyTorch implementation
- **License:** CC-BY-4.0
- **Paper:** [CVPR 2025](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf)
- **Repository:** [GitHub](https://github.com/Intelligent-Computing-Research-Group/ViStream)

### Model Architecture

ViStream introduces two key innovations:

1. **Law of Charge Conservation (LoCC)** property in ST-BIF neurons
2. **Differential Encoding (DiffEncode)** scheme for temporal optimization

The framework achieves significant computational reduction while maintaining accuracy equivalent to ANN counterparts.

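As a rough illustration of how the two ideas above might fit together, the sketch below uses a toy bipolar integrate-and-fire neuron with a soft reset. It is not the ST-BIF neuron from the ViStream repository: the class `BipolarIFNeuron`, the helpers `run_frame` and `compare_encodings`, the threshold, and the sample activations are all assumptions made for this example. It checks that the injected charge equals the charge carried by the emitted spikes plus the residual membrane potential, and it compares spike counts when every frame is re-encoded from scratch against feeding only the frame-to-frame difference.

```python
from dataclasses import dataclass


@dataclass
class BipolarIFNeuron:
    """Toy integrate-and-fire neuron with signed spikes and a soft reset.

    Illustrative assumption only, not the ST-BIF neuron from ViStream.
    """
    threshold: float = 1.0
    potential: float = 0.0  # residual charge held in the membrane
    emitted: int = 0        # running sum of signed output spikes

    def step(self, charge: float) -> int:
        """Integrate one input charge and emit a spike in {-1, 0, +1}."""
        self.potential += charge
        if self.potential >= self.threshold:
            spike = 1
        elif self.potential < 0:
            spike = -1
        else:
            spike = 0
        # Soft reset: remove exactly the charge carried by the spike.
        self.potential -= spike * self.threshold
        self.emitted += spike
        return spike


def run_frame(neuron: BipolarIFNeuron, charge: float, steps: int) -> int:
    """Inject `charge` on the first step, then let the neuron settle.

    Returns the number of spikes (of either sign) emitted for this frame.
    """
    spikes = [neuron.step(charge if t == 0 else 0.0) for t in range(steps)]
    return sum(abs(s) for s in spikes)


def compare_encodings(activations, threshold=0.5, steps=8):
    """Compare per-frame re-encoding with differential encoding."""
    # Direct encoding: a fresh neuron re-integrates the full activation.
    direct_out, direct_spikes = [], 0
    for a in activations:
        fresh = BipolarIFNeuron(threshold)
        direct_spikes += run_frame(fresh, a, steps)
        direct_out.append(fresh.emitted * threshold)

    # Differential encoding: one persistent neuron sees only the change.
    neuron = BipolarIFNeuron(threshold)
    diff_out, diff_spikes, prev = [], 0, 0.0
    for a in activations:
        diff_spikes += run_frame(neuron, a - prev, steps)
        diff_out.append(neuron.emitted * threshold)
        prev = a
        # Charge conservation: total input == spike charge + residual.
        assert a == neuron.emitted * threshold + neuron.potential
    return direct_out, diff_out, direct_spikes, diff_spikes


if __name__ == "__main__":
    # Dyadic sample values keep the floating-point arithmetic exact.
    print(compare_encodings([2.5, 2.75, 2.25, 0.5]))
```

On this toy input both encodings decode to the same per-frame values, while the differential variant emits fewer spikes (9 versus 15). That is presumably the kind of saving the framework targets on slowly changing video streams; the actual mechanism is described in the paper.
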
## Uses

### Direct Use

ViStream can be directly used for:

- **Multiple Object Tracking (MOT)**
- **Single Object Tracking (SOT)**
- **Video Object Segmentation (VOS)**
- **Multiple Object Tracking and Segmentation (MOTS)**
- **Pose Tracking**

### Downstream Use

The model can be fine-tuned for various visual streaming perception tasks in:

- Autonomous driving
- UAV navigation
- AR/VR applications
- Real-time surveillance

## Bias, Risks, and Limitations

### Limitations

- Requires specific hardware optimization for maximum energy benefits
- Performance may vary with different frame rates
- Limited to visual perception tasks

### Recommendations

- Test thoroughly on target hardware before deployment
- Consider computational constraints of edge devices
- Validate performance on domain-specific datasets

## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hugging Face Hub
checkpoint_path = hf_hub_download(
    repo_id="AndyBlocker/ViStream",
    filename="checkpoint-90.pth"
)

# Load the checkpoint on CPU (building the model itself requires the ViStream implementation)
checkpoint = torch.load(checkpoint_path, map_location='cpu')
```
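
Before wiring the checkpoint into the repository code, it can help to look at what it contains. The helpers below are a minimal sketch, not part of ViStream: `extract_state_dict`, `strip_prefix`, and `summarize` are hypothetical names, and the wrapper keys (`model`, `state_dict`, `model_state_dict`) are common PyTorch conventions that have not been confirmed for `checkpoint-90.pth`. The `module.` prefix is the one PyTorch adds when a model is saved while wrapped in `DataParallel` or `DistributedDataParallel`.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple

# Assumed wrapper keys; common in PyTorch training scripts, but not
# confirmed for this particular checkpoint.
CANDIDATE_KEYS = ("model", "state_dict", "model_state_dict")


def extract_state_dict(checkpoint: Any) -> Mapping[str, Any]:
    """Return the weight mapping, unwrapping a training checkpoint if needed."""
    if not isinstance(checkpoint, Mapping):
        raise TypeError(f"expected a mapping, got {type(checkpoint).__name__}")
    for key in CANDIDATE_KEYS:
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            return inner
    # No known wrapper key: assume the object already is the state dict.
    return checkpoint


def strip_prefix(state_dict: Mapping[str, Any], prefix: str = "module.") -> Dict[str, Any]:
    """Drop a leading prefix (e.g. from DataParallel) from parameter names."""
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in state_dict.items()
    }


def summarize(
    state_dict: Mapping[str, Any], limit: int = 5
) -> List[Tuple[str, Optional[tuple]]]:
    """List the first `limit` entries as (name, shape); shape is None for non-tensors."""
    rows = []
    for name, value in list(state_dict.items())[:limit]:
        shape = getattr(value, "shape", None)
        rows.append((name, tuple(shape) if shape is not None else None))
    return rows
```

With the download snippet above, `summarize(strip_prefix(extract_state_dict(checkpoint)))` would list the first few parameter names and shapes, which can be compared against the model built from the repository code before calling `load_state_dict`.
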

For complete usage examples, see the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).

## Training Details

### Training Data

The model was trained on multiple datasets for various visual streaming perception tasks including object tracking, video object segmentation, and pose tracking.

### Training Procedure

**Training Details:**

- Framework: PyTorch
- Optimization: Energy-efficient SNN training with Law of Charge Conservation
- Architecture: ResNet-based backbone with spike quantization layers

## Evaluation

The model demonstrates competitive performance across multiple visual streaming perception tasks while achieving significant energy efficiency improvements compared to traditional ANN-based approaches. Detailed evaluation results are available in the [CVPR 2025 paper](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf).

## Model Card Authors

Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He

## Model Card Contact

For questions about this model, please open an issue in the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).

## Citation

```bibtex
@inproceedings{you2025vistream,
  title={VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network},
  author={You, Kang and Wei, Ziling and Yan, Jing and Zhang, Boning and Guo, Qinghai and Zhang, Yaoyu and He, Zhezhi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8796--8805},
  year={2025}
}
```