---
license: cc-by-4.0
library_name: pytorch
tags:
- computer-vision
- object-tracking
- spiking-neural-networks
- visual-streaming-perception
- energy-efficient
- cvpr-2025
pipeline_tag: object-detection
widget:
- src: >-
https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
example_title: Object Tracking Example
datasets:
- MOT16
- MOT17
- DAVIS2017
- LaSOT
- GOT-10k
metrics:
- accuracy
- energy-efficiency
model-index:
- name: ViStream
results:
- task:
type: object-tracking
name: Multiple Object Tracking
dataset:
type: MOT16
name: MOT16
metrics:
- type: MOTA
value: 65.8
name: Multiple Object Tracking Accuracy
- task:
type: object-tracking
name: Single Object Tracking
dataset:
type: LaSOT
name: LaSOT
metrics:
- type: Success
value: 58.4
name: Success Rate
---

# ViStream: Law-of-Charge-Conservation Inspired Spiking Neural Network for Visual Streaming Perception
ViStream is a novel energy-efficient framework for Visual Streaming Perception (VSP) that leverages Spiking Neural Networks (SNNs) with Law of Charge Conservation (LoCC) properties.
## Model Details

### Model Description

- Developed by: Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He
- Model type: Spiking Neural Network for Visual Streaming Perception
- Framework: PyTorch
- License: CC-BY-4.0
- Paper: CVPR 2025
- Repository: GitHub
### Model Architecture

ViStream introduces two key innovations:

- the Law of Charge Conservation (LoCC) property of ST-BIF neurons, and
- a Differential Encoding (DiffEncode) scheme that exploits temporal redundancy between frames.

Together these substantially reduce computation while matching the accuracy of an equivalent ANN counterpart.
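The intuition behind differential encoding — spend computation only where consecutive frames actually change — can be sketched framework-free. The toy encoder below (function name, threshold, and quantization scheme are our illustrative choices, not the paper's implementation) emits signed, quantized spike counts per pixel:

```python
def diff_encode(frames, threshold=0.1):
    """Toy differential encoder: a pixel spikes only when its value has
    drifted by at least `threshold` since the last emitted spike.
    Static regions produce no events, so they cost no downstream work."""
    prev = [0.0] * len(frames[0])  # reconstructed value per pixel
    events = []
    for frame in frames:
        spikes = {}  # pixel index -> signed spike count
        for i, (p, q) in enumerate(zip(prev, frame)):
            delta = q - p
            if abs(delta) >= threshold:
                steps = round(delta / threshold)  # quantized change
                spikes[i] = steps
                prev[i] = p + steps * threshold   # track what was encoded
        events.append(spikes)
    return events

# Frame 1 repeats frame 0, so it generates zero events.
stream = diff_encode([[0.0, 0.5], [0.0, 0.5], [0.1, 0.9]])
```

The empty event set for the repeated frame is exactly the sparsity that a differential scheme converts into energy savings on spike-driven hardware.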
## Uses

### Direct Use

ViStream can be applied directly to:

- Multiple Object Tracking (MOT)
- Single Object Tracking (SOT)
- Video Object Segmentation (VOS)
- Multiple Object Tracking and Segmentation (MOTS)
- Pose Tracking
### Downstream Use

The model can be fine-tuned for visual streaming perception tasks in:

- Autonomous driving
- UAV navigation
- AR/VR applications
- Real-time surveillance
## Bias, Risks, and Limitations

### Limitations

- Requires hardware-level optimization to realize the full energy benefit
- Performance may vary with frame rate
- Limited to visual perception tasks

### Recommendations

- Test thoroughly on target hardware before deployment
- Consider the computational constraints of edge devices
- Validate performance on domain-specific datasets
## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hub
checkpoint_path = hf_hub_download(
    repo_id="AndyBlocker/ViStream",
    filename="checkpoint-90.pth",
)

# Load the weights (building the model requires the ViStream implementation)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
```

For complete usage examples, see the GitHub repository.
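A checkpoint produced by `torch.save` is a plain Python dict, so the generic restore pattern applies once you have the ViStream model class. The sketch below uses a stand-in module and key names of our own choosing — inspect the real checkpoint's keys before relying on them:

```python
import io
import torch

# Stand-in for a checkpoint file: torch.save/torch.load round-trip a dict.
buffer = io.BytesIO()
torch.save({"epoch": 90, "model": torch.nn.Linear(4, 2).state_dict()}, buffer)
buffer.seek(0)

ckpt = torch.load(buffer, map_location="cpu")
model = torch.nn.Linear(4, 2)         # in practice: the ViStream model class
model.load_state_dict(ckpt["model"])  # restore weights into the module
```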
## Training Details

### Training Data

The model was trained on multiple datasets:

- MOT: MOT16 and MOT17 for multiple object tracking
- SOT: LaSOT and GOT-10k for single object tracking
- VOS: DAVIS2017 for video object segmentation
- Pose: PoseTrack for human pose tracking

### Training Procedure

- Framework: PyTorch
- Optimization: energy-efficient SNN training
- Architecture: ResNet-based backbone with spike quantization
## Evaluation

### Testing Data, Factors & Metrics

Datasets:

- MOT16/MOT17 for multiple object tracking
- LaSOT and GOT-10k for single object tracking
- DAVIS2017 for video object segmentation

Metrics:

- Tracking accuracy: MOTA, MOTP, Success Rate
- Energy efficiency: SOPs (synaptic operations), power consumption
- Speed: FPS, latency
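Of the tracking metrics above, MOTA folds three error types into a single score (the CLEAR MOT definition of Bernardin & Stiefelhagen). The function below is a minimal reference implementation for orientation, not project code:

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 minus the total error rate,
    with all counts summed over every frame of the sequence. It can go
    negative when a tracker makes more errors than there are objects."""
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt

score = mota(false_negatives=10, false_positives=5, id_switches=1, num_gt=100)
```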
### Results
| Task | Dataset | Metric | ViStream | ANN Baseline |
|---|---|---|---|---|
| MOT | MOT16 | MOTA | 65.8% | 66.1% |
| SOT | LaSOT | Success | 58.4% | 58.7% |
| VOS | DAVIS17 | J&F | 72.3% | 72.8% |
Energy efficiency:

- 3.2x reduction in synaptic operations
- 2.8x improvement in energy efficiency
- Minimal accuracy degradation (<1%)
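Energy figures like these are typically derived analytically in the SNN literature: count operations and multiply by per-operation energies from 45 nm CMOS measurements (Horowitz, ISSCC 2014). A sketch of that accounting follows; the per-op constants are the standard literature values, while the operation counts are made up for illustration and are not ViStream's:

```python
# Conventional per-operation energies at 45 nm (Horowitz, ISSCC 2014),
# the figures most SNN papers use for analytic energy estimates.
E_MAC = 4.6e-12  # joules per multiply-accumulate (dense ANN op)
E_AC = 0.9e-12   # joules per accumulate (spike-triggered SNN synaptic op)

def estimated_energy_j(num_ops, per_op_energy):
    """Analytic energy model: operation count times per-operation cost."""
    return num_ops * per_op_energy

# Made-up counts for one forward pass, for illustration only.
ann_j = estimated_energy_j(1.0e9, E_MAC)  # 1e9 MACs in an ANN baseline
snn_j = estimated_energy_j(3.1e8, E_AC)   # ~3.2x fewer, cheaper SNN ops
```

End-to-end gains measured on real systems are usually smaller than the raw op-count arithmetic suggests, since memory traffic and non-spiking layers add overhead that this simple model ignores.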
## Model Card Authors

Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He

## Model Card Contact

For questions about this model, please open an issue in the GitHub repository.
## Citation

```bibtex
@inproceedings{you2025vistream,
  title={VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network},
  author={You, Kang and Wei, Ziling and Yan, Jing and Zhang, Boning and Guo, Qinghai and Zhang, Yaoyu and He, Zhezhi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8796--8805},
  year={2025}
}
```