---
library_name: transformers
license: bsd-3-clause
base_model:
- OpenGVLab/InternVL2_5-1B
tags:
- InternVL2_5
- InternVL2_5-1B
- InternVL2_5-1B-MPO
- Int8
- VLM
pipeline_tag: image-text-to-text
---

# InternVL2_5-1B-MPO

This version of InternVL2_5-1B-MPO has been converted to run on the Axera NPU using **w8a16** quantization.

Compatible with Pulsar2 version: 4.1

## Conversion tool links

If you are interested in model conversion, you can export the axmodel yourself from the original repo:
https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO

[How to Convert LLM from Huggingface to axmodel](https://github.com/AXERA-TECH/InternVL2_5-1B-MPO.axera/tree/master/model_convert) 

[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl) 

[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)

## Supported Platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
 
|Chip|Image encoder (448)|TTFT|Decode (w8a16)|
|--|--|--|--|
|AX650|350 ms|420 ms|32 tokens/sec|

- AX630C
  - [爱芯派2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
 
|Chip|Image encoder (364)|TTFT|Decode (w8a16)|
|--|--|--|--|
|AX630C|1200 ms|1123 ms|10 tokens/sec|
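
A rough end-to-end latency estimate follows from the tables above: total response time is roughly TTFT plus decode time (token count divided by throughput). A small illustrative calculation using the table numbers:

```python
def estimate_response_ms(ttft_ms: float, tokens: int, tokens_per_sec: float) -> float:
    # Total latency ~= time-to-first-token + time to decode the reply tokens.
    return ttft_ms + (tokens / tokens_per_sec) * 1000.0

# AX650 (w8a16): 420 ms TTFT, 32 tokens/sec -> ~3.5 s for a 100-token reply
ax650_ms = estimate_response_ms(420, 100, 32)    # 3545.0
# AX630C (w8a16): 1123 ms TTFT, 10 tokens/sec -> ~11.1 s for a 100-token reply
ax630c_ms = estimate_response_ms(1123, 100, 10)  # 11123.0
```

This ignores image-encoder time for follow-up turns that reuse the same image, so treat it as a lower bound for the first turn.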

## How to use

Download all files from this repository to the device.

```
root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# tree -L 1
.
|-- README.md
|-- config.json
|-- image1.jpg
|-- internvl2_5_1b_364_ax630c
|-- internvl2_5_1b_448_ax650
|-- internvl2_5_tokenizer
|-- internvl2_5_tokenizer_364.py
|-- internvl2_5_tokenizer_448.py
|-- main
|-- main_ax650
|-- post_config.json
|-- run_internvl2_5_364_ax630c.sh
`-- run_internvl2_5_448_ax650.sh

3 directories, 10 files
```

#### Install transformers

```
pip install transformers==4.41.1
```
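
The tokenizer scripts are sensitive to the transformers version, so it can be worth confirming the pin before starting the service. A minimal sketch (the package name and pin come from the install step above; the helper names are ours):

```python
# Hedged sketch: compare the installed transformers version against the pin.
from importlib import metadata

def version_tuple(v: str) -> tuple:
    """'4.41.1' -> (4, 41, 1), so versions compare numerically."""
    return tuple(int(p) for p in v.split("."))

def matches_pin(installed: str, pinned: str = "4.41.1") -> bool:
    return version_tuple(installed) == version_tuple(pinned)

# At runtime: matches_pin(metadata.version("transformers"))
```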

#### Start the Tokenizer service

```
root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# python3 internvl2_5_tokenizer_448.py
None None 151645 <|im_end|> 151665 151667
context_len is  256
prompt is <|im_start|>system
你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|>
.......
http://0.0.0.0:12345
```

#### Inference on an AX650 host, such as the M4N-Dock (爱芯派Pro) or the AX650 DEMO Board

- input text

```
Describe the picture
```

- input image

![](./image1.jpg)

Open another terminal and run `./run_internvl2_5_448_ax650.sh`:

```
root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# ./run_internvl2_5_448_ax650.sh
[I][                            Init][ 134]: LLM init start
[I][                            Init][  34]: connect http://0.0.0.0:12345 ok
bos_id: -1, eos_id: 151645
img_start_token: 151665
img_context_token: 151667
  3% | ██                                |   1 /  27 [0.01s<0.30s, 90.91 count/s] tokenizer init ok
[I][                            Init][  45]: LLaMaEmbedSelector use mmap
  7% | ███                               |   2 /  27 [0.01s<0.19s, 142.86 count/s] embed_selector init ok
100% | ████████████████████████████████ |  27 /  27 [4.31s<4.31s, 6.26 count/s] init post axmodel ok,remain_cmm(3881 MB)
[I][                            Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
[I][                            Init][ 251]: image encoder input nchw@float32
[I][                            Init][ 281]: image encoder output float32

[I][                            Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
[I][                            Init][ 293]: max_token_len : 2559
[I][                            Init][ 296]: kv_cache_size : 128, kv_cache_num: 2559
[I][                            Init][ 304]: prefill_token_num : 128
[I][                            Init][ 308]: grp: 1, prefill_max_token_num : 1
[I][                            Init][ 308]: grp: 2, prefill_max_token_num : 128
[I][                            Init][ 308]: grp: 3, prefill_max_token_num : 256
[I][                            Init][ 308]: grp: 4, prefill_max_token_num : 384
[I][                            Init][ 308]: grp: 5, prefill_max_token_num : 512
[I][                            Init][ 308]: grp: 6, prefill_max_token_num : 640
[I][                            Init][ 308]: grp: 7, prefill_max_token_num : 768
[I][                            Init][ 308]: grp: 8, prefill_max_token_num : 896
[I][                            Init][ 308]: grp: 9, prefill_max_token_num : 1024
[I][                            Init][ 312]: prefill_max_token_num : 1024
[I][                     load_config][ 282]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][                            Init][ 321]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Describe the picture
image >> image1.jpg
[I][                          Encode][ 415]: image encode time : 395.42 ms, size : 229376
[I][                          Encode][ 524]: idx:0 offset : 48 out_embed.size() : 277760
[I][                             Run][ 551]: input token num : 310, prefill_split_num : 3
[I][                             Run][ 566]: prefill grpid 4
[I][                             Run][ 593]: input_num_token:128
[I][                             Run][ 593]: input_num_token:128
[I][                             Run][ 593]: input_num_token:54
[I][                             Run][ 717]: ttft: 625.86 ms

: The image features a red panda sitting in a tree with a blurred green background indicating foliage.
The red panda has a distinctive reddish-brown head and back, white underparts, and black patches around its eyes,
nose, and mouth. It appears to be resting or lounging comfortably on a wooden platform.

[N][                             Run][ 826]: hit eos,avg 27.37 token/s

prompt >> q

```
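
The `post_config.json` shown in the log above enables temperature scaling and top-k sampling (`top_k: 10`, `temperature: 0.9`) while leaving top-p and repetition penalty off. A hedged sketch of what that sampling step typically looks like; this illustrates the technique, not the runtime's actual implementation:

```python
import math
import random

def sample_top_k(logits, top_k=10, temperature=0.9, rng=random):
    # Keep only the top_k highest logits (enable_top_k_sampling).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Scale by temperature (enable_temperature), then softmax the survivors.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id from the truncated, renormalized distribution.
    return rng.choices(top, weights=probs, k=1)[0]
```

Lower temperatures sharpen the distribution toward the argmax token; `top_k=1` degenerates to greedy decoding.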