---
tags:
- rk3588
- rockchip
- rknpu
- vlm
- vision-language-model
- internVL3.5
- edge-ai
- embedded
library_name: rkllm
pipeline_tag: image-text-to-text
inference: false
model_type: internVL3.5
architecture: Vision-Language Transformer
quantization: W8A8 (LLM), FP16 (Vision)
hardware: Rockchip RK3588 NPU
runtime: RKLLM + RKNN
---
# InternVL3.5-4B for RK3588 NPU

This repository provides a **hardware-accelerated port of InternVL3.5-4B**, optimized for the **Rockchip RK3588 NPU**.

**User**: \<image\> Describe the image.

**Answer**: The image depicts an astronaut relaxing on the moon, holding a beer bottle and sitting next to a cooler. The background shows Earth in space, with stars visible above.
|
---
|
## Model Files

| Component | File | Precision |
|-----------|------|-----------|
| LLM | `internvl3_5-4b-instruct_w8a8_rk3588.rkllm` | W8A8 |
| Vision Encoder | `internvl3_5-4b_vision_rk3588.rknn` | FP16 |
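W8A8 means both weights and activations are stored as 8-bit integers. As a minimal, illustrative numpy sketch of symmetric per-tensor int8 quantization — not the actual RKLLM toolchain, which may use per-channel scales and calibration:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: the largest |x| maps to 127."""
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()  # rounding error is bounded by scale / 2
```

The point of the sketch: reconstruction error per element stays below half the quantization step, which is why w8a8 retains most of the model's quality while quartering memory versus fp32.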
|
## Hardware Requirements

- Rockchip **RK3588 / RK3588S**
- RKNPU2 driver
- Tested on:
  - Rock 5C
  - Ubuntu 22.04 / 24.04 (Joshua Riek's Ubuntu Rockchip builds)
|
## Runtime Requirements

- RKLLM runtime
- RKNN runtime (rknpu2)
- OpenCV (for image preprocessing)
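OpenCV handles the image preprocessing in the C++ examples. As a rough, numpy-only sketch of what that stage typically does for a 448 × 448 vision encoder — the ImageNet mean/std values are an assumption; check the example repository for the exact pipeline:

```python
import numpy as np

# Assumed ImageNet statistics -- verify against the example repo's preprocessing.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img: np.ndarray, size: int = 448) -> np.ndarray:
    """Resize (nearest neighbour here; use cv2.resize in practice),
    scale to [0, 1], normalize, and convert HWC -> NCHW."""
    h, w, _ = img.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = img[ys][:, xs].astype(np.float32) / 255.0
    return ((resized - MEAN) / STD).transpose(2, 0, 1)[None]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a decoded image
tensor = preprocess(frame)                        # shape (1, 3, 448, 448)
```

The resulting NCHW float tensor is what the vision encoder (the `.rknn` file) consumes; the LLM then attends over the encoder's output embeddings.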
|
## Model performance benchmark (tokens/s)

All models, with C++ examples, can be found on the [Q-engineering GitHub](https://github.com/Qengineering).

All LLM models are quantized to **w8a8**, while the VLM vision encoders use **fp16**.
|
| Model | RAM (GB)<sup>1</sup> | LLM cold (s)<sup>2</sup> | LLM warm (s)<sup>3</sup> | VLM cold (s)<sup>2</sup> | VLM warm (s)<sup>3</sup> | Resolution | Tokens/s |
| --- | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| [Qwen3-2B](https://github.com/Qengineering/Qwen3-VL-2B-NPU) | 3.1 | 21.9 | 2.6 | 10.0 | 0.9 | 448 x 448 | 11.5 |
| [Qwen3-4B](https://github.com/Qengineering/Qwen3-VL-4B-NPU) | 8.7 | 49.6 | 5.6 | 10.6 | 1.1 | 448 x 448 | 5.7 |
| [InternVL3.5-1B](https://github.com/Qengineering/InternVL3.5-1B-NPU) | 1.9 | 8.3 | 8.0 | 1.5 | 0.8 | 448 x 448 | 24 |
| [InternVL3.5-2B](https://github.com/Qengineering/InternVL3.5-2B-NPU) | 3.0 | 22 | 8.0 | 2.7 | 0.8 | 448 x 448 | 11.2 |
| [InternVL3.5-4B](https://github.com/Qengineering/InternVL3.5-4B-NPU) | 5.4 | 50 | 8.0 | 5.9 | 0.8 | 448 x 448 | 5 |
| [InternVL3.5-8B](https://github.com/Qengineering/InternVL3.5-8B-NPU) | 8.8 | 92 | 8.0 | 50.5 | 5.8 | 448 x 448 | 3.5 |
| [Qwen2.5-3B](https://github.com/Qengineering/Qwen2.5-VL-3B-NPU) | 4.8 | 48.3 | 4.0 | 17.9 | 1.8 | 392 x 392 | 7.0 |
| [Qwen2-7B](https://github.com/Qengineering/Qwen2-VL-7B-NPU) | 8.7 | 86.6 | 34.5 | 37.1 | 20.7 | 392 x 392 | 3.7 |
| [Qwen2-2.2B](https://github.com/Qengineering/Qwen2-VL-2B-NPU) | 3.3 | 29.1 | 2.5 | 17.1 | 1.7 | 392 x 392 | 12.5 |
| [InternVL3-1B](https://github.com/Qengineering/InternVL3-NPU) | 1.3 | 6.8 | 1.1 | 7.8 | 0.75 | 448 x 448 | 30 |
| [SmolVLM2-2.2B](https://github.com/Qengineering/SmolVLM2-2B-NPU) | 3.4 | 21.2 | 2.6 | 10.5 | 0.9 | 384 x 384 | 11 |
| [SmolVLM2-500M](https://github.com/Qengineering/SmolVLM2-500M-NPU) | 0.8 | 4.8 | 0.7 | 2.5 | 0.25 | 384 x 384 | 31 |
| [SmolVLM2-256M](https://github.com/Qengineering/SmolVLM2-256M-NPU) | 0.5 | 1.1 | 0.4 | 2.5 | 0.25 | 384 x 384 | 54 |
|
<sup>1</sup> Total memory used by the LLM plus the VLM.<br>
<sup>2</sup> A cold start is the first load of a model from disk into RAM and the NPU; its duration depends on your OS, I/O transfer rate, and memory mapping.<br>
<sup>3</sup> Subsequent loads (warm starts) reuse the data already mapped in RAM; mostly, only a few pointers need to be restored.<br><br>
<img width="600" height="450" alt="Plot_1" src="https://github.com/user-attachments/assets/2dde8d27-c8ae-474c-b845-4ed52bdc0785" /><br>
<img width="600" height="450" alt="Plot_2" src="https://github.com/user-attachments/assets/0cf946d5-5458-4166-bc2b-fa1592ae4d6b" />
|
## Example Usage

See the C++ example at https://github.com/Qengineering/InternVL3.5-4B-NPU
|
### Notes

- This is **not** a Transformers-compatible model.
- This repository provides precompiled NPU binaries.
- CPU fallback is not supported.