---
license: apache-2.0
pipeline_tag: image-classification
library_name: pytorch
---

# Vim Model Card

This repository contains the model weights referenced in the paper [PTQ4VM: Post-Training Quantization for Visual Mamba](https://arxiv.org/abs/2412.20386).

## Model Details

Vision Mamba (Vim) is a generic backbone trained on the ImageNet-1K dataset for vision tasks.

- **Developed by:** HUST, Horizon Robotics, BAAI
- **Model type:** A generic vision backbone based on the bidirectional state space model (SSM) architecture.
- **License:** Non-commercial license

### Model Sources

- GitHub repository: https://github.com/YoungHyun197/ptq4vm

## Uses

The primary use of Vim is research on vision tasks, e.g., classification, segmentation, detection, and instance segmentation, with an SSM-based backbone. The primary intended users of the model are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.

## How to Get Started with the Model
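
Below is a minimal sketch of loading the pretrained weights with plain PyTorch, assuming the Vim model definition from the GitHub repository linked above. The repository id and checkpoint filename are assumptions (check this repo's file list), and `build_vim_tiny` is a hypothetical placeholder for the Vim-tiny builder provided by that codebase.

```python
# Minimal sketch: fetch the checkpoint and load the weights with plain PyTorch.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="hustvl/Vim-tiny-midclstok",   # assumed repository id
    filename="vim_tiny_midclstok.pth",     # hypothetical checkpoint filename
)
state = torch.load(ckpt_path, map_location="cpu")

# Build Vim-tiny from the model definitions in the GitHub repository above,
# then load the weights. `build_vim_tiny` is a placeholder, not a real API.
# model = build_vim_tiny(num_classes=1000)
# model.load_state_dict(state.get("model", state))
# model.eval()
```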

## Training Details

Vim is pretrained on ImageNet-1K with classification supervision. The training data consists of approximately 1.3M images from the ImageNet-1K dataset. See the Vim paper (arXiv:2401.09417) for more details.

## Evaluation

Vim-tiny is evaluated on the ImageNet-1K validation set and achieves 76.1% top-1 accuracy. With further fine-tuning at finer granularity, Vim-tiny reaches 78.3% top-1 accuracy. See the Vim paper (arXiv:2401.09417) for more details.
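
For reference, here is a hedged sketch of the standard single-crop top-1 evaluation on the ImageNet-1K validation set. The 224x224 resize/crop and normalization values are common ImageNet defaults rather than settings confirmed by this card, and the dataset path is a placeholder.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Common ImageNet-1K eval transform (assumed defaults, not confirmed by this card).
eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=eval_tf)
loader = DataLoader(val_set, batch_size=128, num_workers=8, pin_memory=True)

@torch.no_grad()
def top1_accuracy(model, loader, device="cuda"):
    """Compute single-crop top-1 accuracy over the given loader."""
    model.eval().to(device)
    correct = total = 0
    for images, labels in loader:
        logits = model(images.to(device))
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
        total += labels.numel()
    return correct / total

# acc = top1_accuracy(model, loader)   # expect roughly 0.761 for Vim-tiny per this card
```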

## Additional Information

### Citation Information

@article{vim,
  title={Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model},
  author={Lianghui Zhu and Bencheng Liao and Qian Zhang and Xinlong Wang and Wenyu Liu and Xinggang Wang},
  journal={arXiv preprint arXiv:2401.09417},
  year={2024}
}

@article{cho2024ptq4vm,
  title={PTQ4VM: Post-Training Quantization for Visual Mamba},
  author={Cho, Younghyun and Lee, Changhun and Kim, Seonggon and Park, Eunhyeok},
  journal={arXiv preprint arXiv:2412.20386},
  year={2024}
}