---
license: bsd-3-clause
language:
- en
- zh
base_model:
- HuggingFaceTB/SmolVLM2-500M-Video-Instruct
pipeline_tag: visual-question-answering
tags:
- HuggingFaceTB
- SmolVLM2-500M-Video-Instruct
---

# SmolVLM2-500M-Video-Instruct-Int8

This version of SmolVLM2-500M-Video-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.

It is compatible with Pulsar2 version 4.0.

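
The label w8a16 conventionally denotes 8-bit weights with 16-bit activations. As a rough illustration of the weight side only, the sketch below applies symmetric per-tensor int8 quantization to a small list of floats and then restores them. It is not the scheme Pulsar2 uses internally (which is not described here); the function names and sample values are hypothetical.

```python
# Illustrative sketch only: symmetric per-tensor int8 weight quantization.
# This is NOT Pulsar2's implementation; names and values are hypothetical.

def quantize_weights_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


weights = [0.42, -1.27, 0.05, 0.9]
quantized, scale = quantize_weights_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("int8 values:", quantized)
print("max round-trip error:", max_error)
```

In a w8a16 setup the activations keep a wider 16-bit representation, so only the stored weights take the 8-bit rounding error illustrated here.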
## Conversion tool links

If you are interested in model conversion, you can export the axmodel yourself, starting from the original repo:
- https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct

- [Github for SmolVLM2-500M-Video-Instruct.axera](https://github.com/AXERA-TECH/SmolVLM2-500M-Video-Instruct.axera)

- [Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)

## Support Platform
|
|
|
- AX650
|
|
|
- [M4N-Dock(η±θ―ζ΄ΎPro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
|
|
|
|
|
|
<!-- ## TODO Model infer time -->
|
|
|
|
|
|
## How to use

Download all files from this repository to the device.

**Using AX650 Board**

```bash
ai@ai-bj ~/yongqiang/SmolVLM2-500M-Video-Instruct $ tree -L 1
.
├── assets
├── embeds
├── infer_axmodel.py
├── README.md
├── smolvlm2_axmodel
├── smolvlm2_tokenizer
└── vit_mdoel

5 directories, 2 files
```
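
Before running inference, it can help to confirm that the download is complete. The following is a minimal sketch that checks a directory against the entries in the `tree -L 1` listing above. The helper name `missing_entries` is hypothetical and not part of this repository; the entry names are copied verbatim from the listing.

```python
from pathlib import Path

# Entry names copied from the `tree -L 1` listing (README.md is not checked).
EXPECTED_DIRS = [
    "assets",
    "embeds",
    "smolvlm2_axmodel",
    "smolvlm2_tokenizer",
    "vit_mdoel",
]
EXPECTED_FILES = ["infer_axmodel.py"]


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


# Check the current working directory, e.g. the downloaded repository root.
missing = missing_entries(".")
print("missing entries:", missing if missing else "none")
```

An empty result means every entry from the listing is present; it does not verify the contents of the model files.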
|
|
|
|
|
|
#### Inference with AX650 Host, such as M4N-Dock(η±θ―ζ΄ΎPro) or AX650N DEMO Board
|
|
|
|
|
|
**Multimodal Understanding**
|
|
|
|
|
|
input image
|
|
|
|
|
|

|
|
|
|
|
|
input text:
|
|
|
|
|
|
```
|
|
|
Can you describe this image?
|
|
|
```
|
|
|
|
|
|
log information:
|
|
|
|
|
|
```bash
ai@ai-bj ~/yongqiang/SmolVLM2-500M-Video-Instruct $ python3 infer_axmodel.py

input prompt: Can you describe this image?

answer >> The image depicts a close-up view of a pink flower with a bee on it. The bee, which appears to be a bumblebee, is perched on the flower's center, which is surrounded by a cluster of other flowers. The bee is in the process of collecting nectar from the flower, which is a common behavior for bees. The flower itself has a yellow center with a cluster of yellow stamens surrounding it. The petals of the flower are a vibrant shade of pink, and the bee is positioned very close to the camera, making it the focal point of the image. The background of the image is slightly blurred, but it appears to be a garden or a field with other flowers and plants, contributing to the overall natural setting of the image.
```
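
To use the answer in another program, the script's output can be captured and the answer line extracted. The sketch below assumes the script is run with no arguments and prints the answer on a single line prefixed with `answer >>`, as in the log above. The helper names `extract_answer` and `run_inference` are hypothetical, and the script may behave differently from what this log suggests.

```python
import subprocess
import sys

# Prefix copied from the log above; assumed to mark the answer line.
ANSWER_PREFIX = "answer >>"


def extract_answer(log_text):
    """Return the text after the first 'answer >>' marker, or None if absent."""
    for line in log_text.splitlines():
        if line.startswith(ANSWER_PREFIX):
            return line[len(ANSWER_PREFIX):].strip()
    return None


def run_inference(script="infer_axmodel.py"):
    """Run the script with no arguments, as in the log, and parse its stdout."""
    result = subprocess.run(
        [sys.executable, script],
        capture_output=True,
        text=True,
        check=True,
    )
    return extract_answer(result.stdout)
```

On the device, `print(run_inference())` from the repository directory would print only the answer text, provided the output format matches the assumption above.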