qengineering committed
Commit 07bcdac
Parent(s): 711efa9
Initial release: InternVL3.5 RK3588 NPU port

Files changed:
- .gitattributes +2 -0
- README.md +92 -3
- internvl3_5-1b-instruct_w8a8_rk3588.rkllm +3 -0
- internvl3_5-1b_vision_rk3588.rknn +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.rknn filter=lfs diff=lfs merge=lfs -text
+*.rkllm filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,92 @@
----
-
-
---
tags:
- rk3588
- rockchip
- rknpu
- vlm
- vision-language-model
- internVL3.5
- edge-ai
- embedded
library_name: rkllm
pipeline_tag: image-text-to-text
inference: false

model_type: internVL3.5
architecture: Vision-Language Transformer
quantization: W8A8 (LLM), FP16 (Vision)
hardware: Rockchip RK3588 NPU
runtime: RKLLM + RKNN
---
# InternVL3.5-1B for RK3588 NPU

This repository provides a **hardware-accelerated port of InternVL3.5-1B**, optimized for the **Rockchip RK3588 NPU**.

**User**: \<image\>Describe the image.

**Answer**: The image depicts an astronaut on the moon, holding a green bottle of beer and sitting next to a green cooler with some writing on it. The background shows Earth from space, highlighting the contrast between the moon's barren surface and the planet below.

------------
## Model Files

| Component      | File                                        | Precision |
|----------------|---------------------------------------------|-----------|
| LLM            | `internvl3_5-1b-instruct_w8a8_rk3588.rkllm` | W8A8      |
| Vision Encoder | `internvl3_5-1b_vision_rk3588.rknn`         | FP16      |
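The LFS pointer files added in this commit (shown at the bottom of the diff) record the expected sha256 oids of both binaries. After downloading, the files can be checked against those oids; a minimal sketch:

```python
import hashlib
import os

# sha256 oids copied from the LFS pointer files in this commit
EXPECTED = {
    "internvl3_5-1b-instruct_w8a8_rk3588.rkllm":
        "b6aee283a9029751a714813c1ff64b5da0cad72bb17eb943321f53ddb6e2ac68",
    "internvl3_5-1b_vision_rk3588.rknn":
        "c52be0b277c56096eb499975614f72ff497dc5e86250b26dd8b3b7ff232a7b93",
}

def sha256_of(path, chunk=1 << 20):
    """Hash a file in 1 MiB chunks to avoid loading a GB-sized model at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

for name, oid in EXPECTED.items():
    if os.path.exists(name):  # only check files that were actually downloaded
        print(name, "OK" if sha256_of(name) == oid else "MISMATCH")
```

A mismatch usually means `git lfs pull` was skipped and only the small pointer file is on disk.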
## Hardware Requirements

- Rockchip **RK3588 / RK3588S**
- RKNPU2 driver
- Tested on:
  - Rock 5C
  - Ubuntu 22.04 / 24.04 (Joshua Riek)
## Runtime Requirements

- RKLLM runtime
- RKNN runtime (rknpu2)
- OpenCV (for image preprocessing)
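The preprocessing step resizes the camera frame to the vision encoder's input resolution (448 x 448 for this model, per the table below) and normalizes it. A minimal sketch, assuming ImageNet normalization statistics and NCHW layout (the actual C++ examples use OpenCV; numpy with a nearest-neighbor resize stands in here):

```python
import numpy as np

# Assumption: InternVL-style ImageNet mean/std normalization
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image: np.ndarray, size: int = 448) -> np.ndarray:
    """Resize an HxWx3 uint8 RGB image (nearest-neighbor) and normalize it."""
    h, w, _ = image.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    resized = image[rows][:, cols].astype(np.float32) / 255.0
    normalized = (resized - IMAGENET_MEAN) / IMAGENET_STD
    # Channel-first (NCHW) with a batch dimension, a common RKNN input layout
    return normalized.transpose(2, 0, 1)[np.newaxis]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # dummy camera frame
tensor = preprocess(frame)
print(tensor.shape)  # (1, 3, 448, 448)
```

In the real pipeline an OpenCV bilinear resize (`cv2.resize`) would replace the nearest-neighbor indexing, but the normalization and layout steps are the same.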
## Model performance benchmark

All models, with C++ examples, can be found on the Q-engineering GitHub.<br>
All LLM models are quantized to **w8a8**, while the VLM vision encoders use **fp16**.

| Model | RAM (GB)<sup>1</sup> | LLM cold (s)<sup>2</sup> | LLM warm (s)<sup>3</sup> | VLM cold (s)<sup>2</sup> | VLM warm (s)<sup>3</sup> | Resolution | Tokens/s |
| --------------| :--: | :-----: | :-----: | :--------: | :-----: | :--------: | :--------: |
| [Qwen3-2B](https://github.com/Qengineering/Qwen3-VL-2B-NPU) | 3.1 | 21.9 | 2.6 | 10.0 | 0.9 | 448 x 448 | 11.5 |
| [Qwen3-4B](https://github.com/Qengineering/Qwen3-VL-4B-NPU) | 8.7 | 49.6 | 5.6 | 10.6 | 1.1 | 448 x 448 | 5.7 |
| [InternVL3.5-1B](https://github.com/Qengineering/InternVL3.5-1B-NPU) | 8.8 | 92 | 8.0 | 50.5 | 5.8 | 448 x 448 | 3.5 |
| [InternVL3.5-2B](https://github.com/Qengineering/InternVL3.5-2B-NPU) | 5.4 | 50 | 8.0 | 5.9 | 0.8 | 448 x 448 | 5 |
| [InternVL3.5-4B](https://github.com/Qengineering/InternVL3.5-4B-NPU) | 3.0 | 22 | 8.0 | 2.7 | 0.8 | 448 x 448 | 11.2 |
| [InternVL3.5-8B](https://github.com/Qengineering/InternVL3.5-8B-NPU) | 1.9 | 8.3 | 8.0 | 1.5 | 0.8 | 448 x 448 | 24 |
| [Qwen2.5-3B](https://github.com/Qengineering/Qwen2.5-VL-3B-NPU) | 4.8 | 48.3 | 4.0 | 17.9 | 1.8 | 392 x 392 | 7.0 |
| [Qwen2-7B](https://github.com/Qengineering/Qwen2-VL-7B-NPU) | 8.7 | 86.6 | 34.5 | 37.1 | 20.7 | 392 x 392 | 3.7 |
| [Qwen2-2.2B](https://github.com/Qengineering/Qwen2-VL-2B-NPU) | 3.3 | 29.1 | 2.5 | 17.1 | 1.7 | 392 x 392 | 12.5 |
| [InternVL3-1B](https://github.com/Qengineering/InternVL3-NPU) | 1.3 | 6.8 | 1.1 | 7.8 | 0.75 | 448 x 448 | 30 |
| [SmolVLM2-2.2B](https://github.com/Qengineering/SmolVLM2-2B-NPU) | 3.4 | 21.2 | 2.6 | 10.5 | 0.9 | 384 x 384 | 11 |
| [SmolVLM2-500M](https://github.com/Qengineering/SmolVLM2-500M-NPU) | 0.8 | 4.8 | 0.7 | 2.5 | 0.25 | 384 x 384 | 31 |
| [SmolVLM2-256M](https://github.com/Qengineering/SmolVLM2-256M-NPU) | 0.5 | 1.1 | 0.4 | 2.5 | 0.25 | 384 x 384 | 54 |

<sup>1</sup> Total memory used: the LLM plus the VLM.<br>
<sup>2</sup> A cold start is the first load of an LLM/VLM model from disk into RAM or the NPU. Its duration depends on your OS, I/O transfer rate, and memory mapping.<br>
<sup>3</sup> Subsequent loads (warm starts) take advantage of the data already mapped in RAM; mostly, only a few pointers need to be restored.

<img width="600" height="450" alt="Plot_1" src="https://github.com/user-attachments/assets/2dde8d27-c8ae-474c-b845-4ed52bdc0785" /><br>
<img width="600" height="450" alt="Plot_2" src="https://github.com/user-attachments/assets/0cf946d5-5458-4166-bc2b-fa1592ae4d6b" />
|
| 81 |
+
|
| 82 |
+
|
| 83 |
+
## Example Usage
|
| 84 |
+
|
| 85 |
+
- see: https://github.com/Qengineering/InternVL3.5-1B-NPU
|
| 86 |
+
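The linked repository's examples are C++. For orientation only, the vision-encoder half of the pipeline can be sketched in Python with the rknn-toolkit-lite2 API; the file name and input layout below come from this repository, everything else is an illustrative assumption:

```python
def encode_image(tensor):
    """Run the FP16 vision encoder on the NPU; returns image embeddings
    that feed the W8A8 LLM. Sketch only, not the repo's actual loader."""
    from rknnlite.api import RKNNLite  # provided by rknn-toolkit-lite2

    rknn = RKNNLite()
    if rknn.load_rknn("internvl3_5-1b_vision_rk3588.rknn") != 0:
        raise RuntimeError("failed to load the vision model")
    if rknn.init_runtime() != 0:
        raise RuntimeError("failed to initialize the NPU runtime")
    outputs = rknn.inference(inputs=[tensor])  # tensor: preprocessed image
    rknn.release()
    return outputs
```

This only runs on an RK3588 board with the RKNPU2 driver installed; there is no CPU fallback (see Notes below).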
### Notes

- This is not a Transformers-compatible model.
- This repository provides precompiled NPU binaries.
- CPU fallback is not supported.
internvl3_5-1b-instruct_w8a8_rk3588.rkllm
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b6aee283a9029751a714813c1ff64b5da0cad72bb17eb943321f53ddb6e2ac68
+size 936057308
internvl3_5-1b_vision_rk3588.rknn
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c52be0b277c56096eb499975614f72ff497dc5e86250b26dd8b3b7ff232a7b93
+size 650595251