InternVL3_5-2B
This version of InternVL3_5-2B has been converted to run on the Axera NPU using w8a16 quantization.
This model has been optimized with the following LoRA:
Compatible with Pulsar2 version: 5.1-patch1.
Note that the model's context length is 2k tokens and the maximum prefill length is 1k tokens.
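The two limits above can be turned into a quick budget check before sending a prompt. The following is only a sketch: it reads "2k" and "1k" as 2048 and 1024 tokens, and it assumes the whole prompt (text plus any image tokens) has to fit in the prefill window. Neither is stated explicitly here, and `max_new_tokens` is a hypothetical helper, not part of this repository.

```python
# Assumed readings of the limits quoted above ("2k" context, "1k" prefill).
CONTEXT_LEN = 2048
MAX_PREFILL_LEN = 1024


def max_new_tokens(prompt_tokens: int,
                   context_len: int = CONTEXT_LEN,
                   max_prefill_len: int = MAX_PREFILL_LEN) -> int:
    """Return how many tokens can still be generated for a prompt of this size.

    Raises ValueError if the prompt is negative or exceeds the prefill limit.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > max_prefill_len:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, prefill limit is {max_prefill_len}"
        )
    # Whatever the prompt does not use of the context is left for generation.
    return context_len - prompt_tokens
```

For example, a 1000-token prompt would leave 1048 tokens for the answer under these assumptions, while a 1025-token prompt would be rejected.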
Conversion tool links:
If you are interested in model conversion, you can try exporting an axmodel from the original repo:
https://huggingface.co/OpenGVLab/InternVL3_5-2B
How to Convert LLM from Huggingface to axmodel
Supported Platforms
- AX650
- AX650N DEMO Board
- M4N-Dock(爱芯派Pro)
- M.2 Accelerator card
| Chip | Image encoder (448×448 input) | TTFT | Decoding speed (w8a16) |
|---|---|---|---|
| AX650 | 364.412 ms | 5844 ms | 9.52 tokens/sec |
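As a rough illustration of what these numbers mean for a full reply, the sketch below combines the time to first token with the decoding speed. It is an estimate only: the table does not say which prompt length the TTFT was measured at, nor whether TTFT already includes the image encoder, so the encoder time is an optional extra term. `estimate_latency_s` is a hypothetical helper.

```python
def estimate_latency_s(new_tokens: int,
                       ttft_ms: float = 5844.0,
                       decode_tps: float = 9.52,
                       image_encoder_ms: float = 0.0) -> float:
    """Estimate end-to-end seconds for a reply of `new_tokens` tokens.

    The first token arrives after TTFT (plus the optional image encoder time);
    each remaining token is assumed to take 1 / decode_tps seconds.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    first_token_s = (image_encoder_ms + ttft_ms) / 1000.0
    remaining_s = (new_tokens - 1) / decode_tps
    return first_token_s + remaining_s


# Example: a 101-token reply, text only, using the AX650 figures above.
print(f"{estimate_latency_s(101):.1f} s")
```

Pass `image_encoder_ms=364.412` if you assume the encoder time is not already part of the TTFT figure.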
How to use
Download all files from this repository to the device
$ tree -L 1
.
├── assets
├── config.json
├── examples
├── gradio_demo.py
├── infer_axmodel.py
├── infer_torch.py
├── internvl3-5_axmodel
├── internvl3-5_tokenizer
├── README.md
├── utils
└── vit-models
6 directories, 5 files
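To confirm that the download is complete, the listing above can be checked programmatically. This is a minimal sketch that only looks at the top-level names shown by `tree -L 1`; it does not verify file contents, and `missing_entries` is a hypothetical helper rather than a script shipped in this repository.

```python
from pathlib import Path

# Top-level entries taken from the `tree -L 1` listing above.
EXPECTED_DIRS = [
    "assets",
    "examples",
    "internvl3-5_axmodel",
    "internvl3-5_tokenizer",
    "utils",
    "vit-models",
]
EXPECTED_FILES = [
    "config.json",
    "gradio_demo.py",
    "infer_axmodel.py",
    "infer_torch.py",
    "README.md",
]


def missing_entries(root: str = ".") -> list[str]:
    """Return the expected top-level entries that are absent under `root`.

    Directories are reported with a trailing slash.
    """
    base = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (base / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (base / f).is_file()]
    return missing
```

Calling `missing_entries(".")` from the repository root should return an empty list when everything has been downloaded.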
Install transformers
pip install transformers==4.57.1
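If the scripts later fail with import or tokenizer errors, it can help to confirm that the pinned version is the one actually installed. The sketch below uses only the standard library; `transformers_matches_pin` is a hypothetical helper, and whether other versions also work is not stated here.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pinned by the install command above.
PINNED_VERSION = "4.57.1"


def transformers_matches_pin(pinned: str = PINNED_VERSION,
                             package: str = "transformers"):
    """Return True/False for a version match, or None if the package is absent."""
    try:
        return version(package) == pinned
    except PackageNotFoundError:
        return None
```

A result of `None` means the package is not installed in the current environment; `False` means a different version is installed.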
Inference on an AX650 host, such as the M4N-Dock(爱芯派Pro) or the AX650 DEMO Board
Interactive conversations using the Gradio API:
$ python3 gradio_demo.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --vit_model vit-models/internvl_vit_model_1x3x448x448.axmodel
Plain text dialogue:
Image understanding:
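Run the following command on the Axera board to start a text-only chat. The example question is in Chinese and asks the model to compute the derivative of y = 2x^2 + 2 and to give its reasoning in markdown format: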
$ python3 infer_axmodel.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --question "请计算函数[y=2x^2+2]的导数, 并提供 markdown 格式的推理过程"
output:
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-dirty 0fdbfe15-dirty
Model loaded successfully!
slice_indices: [0]
Slice prefill done: 0
answer >> 函数 \( y = 2x^2 + 2 \) 的导数可以通过求导法则来计算。首先,我们对函数中的每一项分别求导:
1. 对于 \( 2x^2 \),使用幂法则求导:
\[
\frac{d}{dx}(2x^2) = 2 \cdot 2x = 4x
\]
2. 对于常数项 \( 2 \),其导数为 0,因为常数的导数为 0。
将这两部分的结果相加,得到函数 \( y \) 的导数:
\[
y' = 4x
\]
因此,函数 \( y = 2x^2 + 2 \) 的导数为 \( y' = 4x \)。
Run the following command to perform a single-image understanding task. The example question is in Chinese and asks the model to describe the image:
$ python3 infer_axmodel.py --hf_model internvl3-5_tokenizer/ --axmodel_path internvl3-5_axmodel/ --question "请描述这幅图" -i examples/image_0.jpg --vit_model vit-models/internvl_vit_model_1x3x448x448.axmodel
output:
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-dirty 0fdbfe15-dirty
Model loaded successfully!
slice_indices: [0, 1, 2]
Slice prefill done: 0
Slice prefill done: 1
Slice prefill done: 2
answer >> 这是一张红熊猫的照片。红熊猫是一种红棕色的哺乳动物,通常生活在亚洲的森林中。它们以捕食昆虫和小型无脊椎动物为生。图片中,红熊猫正坐在一个木制的平台上,背景是绿色的树木和植被,显得非常自然和生动。红熊猫的表情看起来很友好,似乎在观察或等待什么。
Model tree for AXERA-TECH/InternVL3_5-2B
Base model
OpenGVLab/InternVL3_5-2B-Pretrained

