---
license: cc-by-4.0
library_name: pytorch
tags:
- computer-vision
- object-tracking
- spiking-neural-networks
- visual-streaming-perception
- energy-efficient
- cvpr-2025
pipeline_tag: object-detection
widget:
- src: >-
https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
example_title: Object Tracking Example
datasets:
- MOT16
- MOT17
- DAVIS2017
- LaSOT
- GOT-10k
metrics:
- accuracy
- energy-efficiency
model-index:
- name: ViStream
results:
- task:
type: object-tracking
name: Multiple Object Tracking
dataset:
type: MOT16
name: MOT16
metrics:
- type: MOTA
value: 65.8
name: Multiple Object Tracking Accuracy
- task:
type: object-tracking
name: Single Object Tracking
dataset:
type: LaSOT
name: LaSOT
metrics:
- type: Success
value: 58.4
name: Success Rate
---

# ViStream: Law-of-Charge-Conservation Inspired Spiking Neural Network for Visual Streaming Perception
ViStream is a novel energy-efficient framework for Visual Streaming Perception (VSP) that leverages Spiking Neural Networks (SNNs) with Law of Charge Conservation (LoCC) properties.
## Model Details

### Model Description

- Developed by: Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He
- Model type: Spiking Neural Network for Visual Streaming Perception
- Framework: PyTorch
- License: CC-BY-4.0
- Paper: CVPR 2025
- Repository: GitHub
### Model Architecture

ViStream introduces two key innovations:

- the Law of Charge Conservation (LoCC) property of ST-BIF neurons, and
- a Differential Encoding (DiffEncode) scheme that exploits temporal redundancy between frames.

Together these substantially reduce computation while matching the accuracy of an equivalent ANN counterpart.
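The intuition behind differential encoding — spend computation only where consecutive frames actually change — can be sketched framework-free. The toy encoder below (function name, threshold, and quantization scheme are our illustrative choices, not the paper's implementation) emits signed, quantized spike counts per pixel:

```python
def diff_encode(frames, threshold=0.1):
    """Toy differential encoder: a pixel spikes only when its value has
    drifted by at least `threshold` since the last emitted spike.
    Static regions produce no events, so they cost no downstream work."""
    prev = [0.0] * len(frames[0])  # reconstructed value per pixel
    events = []
    for frame in frames:
        spikes = {}  # pixel index -> signed spike count
        for i, (p, q) in enumerate(zip(prev, frame)):
            delta = q - p
            if abs(delta) >= threshold:
                steps = round(delta / threshold)  # quantized change
                spikes[i] = steps
                prev[i] = p + steps * threshold   # track what was encoded
        events.append(spikes)
    return events

# Frame 1 repeats frame 0, so it generates zero events.
stream = diff_encode([[0.0, 0.5], [0.0, 0.5], [0.1, 0.9]])
```

The empty event set for the repeated frame is exactly the sparsity that a differential scheme converts into energy savings on spike-driven hardware.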
## Uses

### Direct Use

ViStream can be applied directly to:

- Multiple Object Tracking (MOT)
- Single Object Tracking (SOT)
- Video Object Segmentation (VOS)
- Multiple Object Tracking and Segmentation (MOTS)
- Pose Tracking
### Downstream Use

The model can be fine-tuned for visual streaming perception tasks in:

- Autonomous driving
- UAV navigation
- AR/VR applications
- Real-time surveillance
## Bias, Risks, and Limitations

### Limitations

- Requires hardware-level optimization to realize the full energy benefit
- Performance may vary with frame rate
- Limited to visual perception tasks

### Recommendations

- Test thoroughly on target hardware before deployment
- Consider the computational constraints of edge devices
- Validate performance on domain-specific datasets
## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hub
checkpoint_path = hf_hub_download(
    repo_id="AndyBlocker/ViStream",
    filename="checkpoint-90.pth",
)

# Load the weights (building the model requires the ViStream implementation)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
```

For complete usage examples, see the GitHub repository.
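A checkpoint produced by `torch.save` is a plain Python dict, so the generic restore pattern applies once you have the ViStream model class. The sketch below uses a stand-in module and key names of our own choosing — inspect the real checkpoint's keys before relying on them:

```python
import io
import torch

# Stand-in for a checkpoint file: torch.save/torch.load round-trip a dict.
buffer = io.BytesIO()
torch.save({"epoch": 90, "model": torch.nn.Linear(4, 2).state_dict()}, buffer)
buffer.seek(0)

ckpt = torch.load(buffer, map_location="cpu")
model = torch.nn.Linear(4, 2)         # in practice: the ViStream model class
model.load_state_dict(ckpt["model"])  # restore weights into the module
```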
## Training Details

### Training Data

The model was trained on multiple datasets:

- MOT: MOT16 and MOT17 for multiple object tracking
- SOT: LaSOT and GOT-10k for single object tracking
- VOS: DAVIS2017 for video object segmentation
- Pose: PoseTrack for human pose tracking

### Training Procedure

- Framework: PyTorch
- Optimization: energy-efficient SNN training
- Architecture: ResNet-based backbone with spike quantization
## Evaluation

### Testing Data, Factors & Metrics

Datasets:

- MOT16/MOT17 for multiple object tracking
- LaSOT and GOT-10k for single object tracking
- DAVIS2017 for video object segmentation

Metrics:

- Tracking accuracy: MOTA, MOTP, Success Rate
- Energy efficiency: SOPs (synaptic operations), power consumption
- Speed: FPS, latency
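Of the tracking metrics above, MOTA folds three error types into a single score (the CLEAR MOT definition of Bernardin & Stiefelhagen). The function below is a minimal reference implementation for orientation, not project code:

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 minus the total error rate,
    with all counts summed over every frame of the sequence. It can go
    negative when a tracker makes more errors than there are objects."""
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt

score = mota(false_negatives=10, false_positives=5, id_switches=1, num_gt=100)
```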
### Results
| Task | Dataset | Metric | ViStream | ANN Baseline |
|---|---|---|---|---|
| MOT | MOT16 | MOTA | 65.8% | 66.1% |
| SOT | LaSOT | Success | 58.4% | 58.7% |
| VOS | DAVIS17 | J&F | 72.3% | 72.8% |
Energy efficiency:

- 3.2x reduction in synaptic operations
- 2.8x improvement in energy efficiency
- Minimal accuracy degradation (<1%)
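Energy figures like these are typically derived analytically in the SNN literature: count operations and multiply by per-operation energies from 45 nm CMOS measurements (Horowitz, ISSCC 2014). A sketch of that accounting follows; the per-op constants are the standard literature values, while the operation counts are made up for illustration and are not ViStream's:

```python
# Conventional per-operation energies at 45 nm (Horowitz, ISSCC 2014),
# the figures most SNN papers use for analytic energy estimates.
E_MAC = 4.6e-12  # joules per multiply-accumulate (dense ANN op)
E_AC = 0.9e-12   # joules per accumulate (spike-triggered SNN synaptic op)

def estimated_energy_j(num_ops, per_op_energy):
    """Analytic energy model: operation count times per-operation cost."""
    return num_ops * per_op_energy

# Made-up counts for one forward pass, for illustration only.
ann_j = estimated_energy_j(1.0e9, E_MAC)  # 1e9 MACs in an ANN baseline
snn_j = estimated_energy_j(3.1e8, E_AC)   # ~3.2x fewer, cheaper SNN ops
```

End-to-end gains measured on real systems are usually smaller than the raw op-count arithmetic suggests, since memory traffic and non-spiking layers add overhead that this simple model ignores.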
## Model Card Authors

Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He

## Model Card Contact

For questions about this model, please open an issue in the GitHub repository.
## Citation

```bibtex
@inproceedings{you2025vistream,
  title={VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network},
  author={You, Kang and Wei, Ziling and Yan, Jing and Zhang, Boning and Guo, Qinghai and Zhang, Yaoyu and He, Zhezhi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8796--8805},
  year={2025}
}
```