qengineering committed on
Commit 07bcdac · 1 Parent(s): 711efa9

Initial release: InternVL3.5 RK3588 NPU port

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.rknn filter=lfs diff=lfs merge=lfs -text
+ *.rkllm filter=lfs diff=lfs merge=lfs -text
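
The two added patterns route the NPU binaries through Git LFS. As a rough illustration, the sketch below checks which file names the listed patterns would catch. It assumes that Python's `fnmatchcase` is a close enough stand-in for gitattributes matching on simple basename globs like these; `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names, not part of the repository. For an authoritative answer, `git check-attr filter -- <path>` is the tool to use.

```python
from fnmatch import fnmatchcase

# Patterns copied from the .gitattributes hunk above (context lines plus the two additions).
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.rknn", "*.rkllm"]


def is_lfs_tracked(filename: str) -> bool:
    """Approximate check: does the file's basename match any listed LFS pattern?"""
    basename = filename.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for name in (
        "internvl3_5-1b-instruct_w8a8_rk3588.rkllm",
        "internvl3_5-1b_vision_rk3588.rknn",
        "README.md",
    ):
        print(name, is_lfs_tracked(name))
```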
README.md CHANGED
@@ -1,3 +1,92 @@
- ---
- license: apache-2.0
- ---
+ ---
+ tags:
+ - rk3588
+ - rockchip
+ - rknpu
+ - vlm
+ - vision-language-model
+ - internVL3.5
+ - edge-ai
+ - embedded
+ library_name: rkllm
+ pipeline_tag: image-text-to-text
+ inference: false
+
+ model_type: internVL3.5
+ architecture: Vision-Language Transformer
+ quantization: W8A8 (LLM), FP16 (Vision)
+ hardware: Rockchip RK3588 NPU
+ runtime: RKLLM + RKNN
+ ---
+
+ # InternVL3.5-1B for RK3588 NPU
+
+ This repository provides a **hardware-accelerated port of InternVL3.5-1B**
+ optimized for **Rockchip RK3588 NPU**.
+
+ ![Alt text](https://github.com/user-attachments/assets/6d297a34-c516-4cb1-be4a-bca471d40fa6)<br>**User**:\<image\>Describe the image.<br><br>
+ **Answer**: The image depicts an astronaut on the moon, holding a green bottle of beer and sitting next to a green cooler with some writing on it. The background shows Earth from space, highlighting the contrast between the moon's barren surface and the planet below.
+
+
+ ------------
+
+ ## Model Files
+
+ | Component | File | Precision |
+ |---------|------|-----------|
+ | LLM | `internvl3_5-1b-instruct_w8a8_rk3588.rkllm` | W8A8 |
+ | Vision Encoder | `internvl3_5-1b_vision_rk3588.rknn` | FP16 |
+
+ ## Hardware Requirements
+
+ - Rockchip **RK3588 / RK3588S**
+ - RKNPU2 driver
+ - Tested on:
+   - Rock 5C
+   - Ubuntu 22.04 / 24.04 (Joshua Riek)
+
+ ## Runtime Requirements
+
+ - RKLLM runtime
+ - RKNN runtime (rknpu2)
+ - OpenCV (for image preprocessing)
+
+ ## Model performance benchmark (FPS)
+
+ All models, with C++ examples, can be found on the Q-engineering GitHub.<br><br>
+ All LLM models are quantized to **w8a8**, while the VLM vision encoders use **fp16**.<br>
+
+ | model | RAM (GB)<sup>1</sup> | llm cold sec<sup>2</sup> | llm warm sec<sup>3</sup> | vlm cold sec<sup>2</sup> | vlm warm sec<sup>3</sup> | Resolution | Tokens/s |
+ | --------------| :--: | :-----: | :-----: | :--------: | :-----: | :--------: | :--------: |
+ | [Qwen3-2B](https://github.com/Qengineering/Qwen3-VL-2B-NPU) | 3.1 | 21.9 | 2.6 | 10.0 | 0.9 | 448 x 448 | 11.5 |
+ | [Qwen3-4B](https://github.com/Qengineering/Qwen3-VL-4B-NPU) | 8.7 | 49.6 | 5.6 | 10.6 | 1.1 | 448 x 448 | 5.7 |
+ | [InternVL3.5-1B](https://github.com/Qengineering/InternVL3.5-1B-NPU) | 8.8 | 92 | 8.0 | 50.5 | 5.8 | 448 x 448 | 3.5 |
+ | [InternVL3.5-2B](https://github.com/Qengineering/InternVL3.5-2B-NPU) | 5.4 | 50 | 8.0 | 5.9 | 0.8 | 448 x 448 | 5 |
+ | [InternVL3.5-4B](https://github.com/Qengineering/InternVL3.5-4B-NPU) | 3.0 | 22 | 8.0 | 2.7 | 0.8 | 448 x 448 | 11.2 |
+ | [InternVL3.5-8B](https://github.com/Qengineering/InternVL3.5-8B-NPU) | 1.9 | 8.3 | 8.0 | 1.5 | 0.8 | 448 x 448 | 24 |
+ | [Qwen2.5-3B](https://github.com/Qengineering/Qwen2.5-VL-3B-NPU) | 4.8 | 48.3 | 4.0 | 17.9 | 1.8 | 392 x 392 | 7.0 |
+ | [Qwen2-7B](https://github.com/Qengineering/Qwen2-VL-7B-NPU) | 8.7 | 86.6 | 34.5 | 37.1 | 20.7 | 392 x 392 | 3.7 |
+ | [Qwen2-2.2B](https://github.com/Qengineering/Qwen2-VL-2B-NPU) | 3.3 | 29.1 | 2.5 | 17.1 | 1.7 | 392 x 392 | 12.5 |
+ | [InternVL3-1B](https://github.com/Qengineering/InternVL3-NPU) | 1.3 | 6.8 | 1.1 | 7.8 | 0.75 | 448 x 448 | 30 |
+ | [SmolVLM2-2.2B](https://github.com/Qengineering/SmolVLM2-2B-NPU) | 3.4 | 21.2 | 2.6 | 10.5 | 0.9 | 384 x 384 | 11 |
+ | [SmolVLM2-500M](https://github.com/Qengineering/SmolVLM2-500M-NPU) | 0.8 | 4.8 | 0.7 | 2.5 | 0.25 | 384 x 384 | 31 |
+ | [SmolVLM2-256M](https://github.com/Qengineering/SmolVLM2-256M-NPU) | 0.5 | 1.1 | 0.4 | 2.5 | 0.25 | 384 x 384 | 54 |
+
+ <sup>1</sup> The total used memory; LLM plus the VLM. <br>
+ <sup>2</sup> When an llm/vlm model is loaded for the first time from your disk to RAM or NPU, it is called a cold start.<br>
+ The duration depends on your OS, I/O transfer rate, and memory mapping.<br>
+ <sup>3</sup> Subsequent loading (warm start) takes advantage of the already mapped data in RAM. Mostly, only a few pointers need to be restored.<br><br>
+ <img width="600" height="450" alt="Plot_1" src="https://github.com/user-attachments/assets/2dde8d27-c8ae-474c-b845-4ed52bdc0785" /><br>
+ <img width="600" height="450" alt="Plot_2" src="https://github.com/user-attachments/assets/0cf946d5-5458-4166-bc2b-fa1592ae4d6b" />
+
+
+ ## Example Usage
+
+ - see: https://github.com/Qengineering/InternVL3.5-1B-NPU
+
+
+ ### Notes
+
+ - This is not a Transformers-compatible model
+ - This repository provides precompiled NPU binaries
+ - CPU fallback is not supported
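
The README's footnotes distinguish a cold start (first load from disk) from a warm start (data already mapped in RAM). The sketch below is one way to observe the file I/O part of that difference by timing two consecutive reads of the same file. It is only an illustration under stated assumptions: `timed_read` and `first_and_second_read` are hypothetical helpers, nothing here touches the RKLLM or RKNN runtimes, and the first read is only truly cold if the file is not already in the operating system's page cache.

```python
import time


def timed_read(path: str, chunk_size: int = 1 << 20) -> tuple[float, int]:
    """Read the whole file once and return (elapsed seconds, bytes read)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return time.perf_counter() - start, total


def first_and_second_read(path: str) -> dict:
    """Time two back-to-back reads; the second usually benefits from the page cache."""
    first_s, size = timed_read(path)
    second_s, _ = timed_read(path)
    return {"bytes": size, "first_read_s": first_s, "second_read_s": second_s}


if __name__ == "__main__":
    import sys

    # Usage: python read_timing.py internvl3_5-1b-instruct_w8a8_rk3588.rkllm
    print(first_and_second_read(sys.argv[1]))
```

The numbers this prints are not comparable to the benchmark table, which measures full model loading on the NPU rather than raw file reads.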
internvl3_5-1b-instruct_w8a8_rk3588.rkllm ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6aee283a9029751a714813c1ff64b5da0cad72bb17eb943321f53ddb6e2ac68
+ size 936057308
internvl3_5-1b_vision_rk3588.rknn ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c52be0b277c56096eb499975614f72ff497dc5e86250b26dd8b3b7ff232a7b93
+ size 650595251
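
Both binaries are stored as Git LFS pointers, which record a SHA-256 digest (`oid`) and a byte count (`size`) for the real file. A downloaded model can be checked against those two values. The following is a minimal sketch, assuming the three-line pointer layout shown above; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of Git LFS or of this repository.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of an LFS pointer into version, digest and size."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    return {"version": fields["version"], "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


# Pointer text copied from the vision encoder entry in this commit.
VISION_POINTER = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:c52be0b277c56096eb499975614f72ff497dc5e86250b26dd8b3b7ff232a7b93\n"
    "size 650595251\n"
)
```

With the model downloaded, `matches_pointer("internvl3_5-1b_vision_rk3588.rknn", VISION_POINTER)` would report whether the local copy is intact.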