qengineering committed on
Commit 43106d7 · 1 parent: aa3b089

Add SmolVLM2-256M RK3588 NPU binaries

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.rkllm filter=lfs diff=lfs merge=lfs -text
+ *.rknn filter=lfs diff=lfs merge=lfs -text
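The two added patterns route the new `.rkllm` and `.rknn` binaries through Git LFS. As a rough illustration of which files this hunk affects, the Python sketch below approximates gitattributes matching by applying `fnmatch.fnmatchcase` to each file's basename. It is a simplification under stated assumptions: only the patterns visible in this hunk are listed, and `**` or directory-anchored patterns are not handled. Inside a real clone, `git check-attr filter -- <path>` is the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the patterns visible in this hunk; the real .gitattributes has more lines.
LFS_PATTERNS = ("*.zip", "*.zst", "*tfevents*", "*.rkllm", "*.rknn")


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does any slash-free pattern match the basename?

    Git applies a pattern without a '/' to the basename at any directory depth,
    which matching on PurePosixPath(path).name mimics here. '**' and
    directory-anchored patterns are deliberately not handled in this sketch.
    """
    name = PurePosixPath(path).name
    # fnmatchcase keeps matching case-sensitive on every OS, like Git's default.
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for f in (
        "smolvlm2-256m-instruct_w8a8_rk3588.rkllm",
        "smolvlm2_256m_vision_fp16_rk3588.rknn",
        "README.md",
    ):
        print(f"{f}: {'Git LFS' if is_lfs_tracked(f) else 'regular Git object'}")
```

Matching on the basename keeps the sketch short; it is enough to show why both model files in this commit appear below as LFS pointers while `README.md` appears as plain text.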
README.md CHANGED
@@ -1,3 +1,83 @@
- ---
- license: apache-2.0
- ---
+ ---
+ tags:
+ - rk3588
+ - rockchip
+ - rknpu
+ - vision-language-model
+ - vlm
+ - smolvlm
+ - edge-ai
+ library_name: rkllm
+ ---
+
+ # SmolVLM2-256M for RK3588 NPU
+
+ This repository provides a **hardware-accelerated port of SmolVLM2-256M**
+ optimized for the **Rockchip RK3588 NPU**.
17
+
18
+ ![Alt text](https://github.com/user-attachments/assets/6d297a34-c516-4cb1-be4a-bca471d40fa6)<br>**User**:\<image\>Describe the image.<br><br>
19
+ **Answer**: The image depicts a scene from space, specifically looking at the moon's surface. The moon is in the process of being tidied up and has been cleaned to remove any debris or stains. There are two large white objects on the right side of the image, which could be astronauts or other crew members. They appear to be working together to clean the area around the moon's surface.
20
+
21
+ The background shows a view of Earth from space, showing the planet with its atmosphere and oceans. The sky is dark, indicating that it might be either early morning or late afternoon. There are no visible clouds in the sky, which suggests that the sun is not at its peak position. The image also includes some text, but it's unclear what it says.
22
+
23
+ Overall, this image captures a moment of cleanliness and organization on the moon, with both humans and astronauts working together to clean up the area around the moon's surface.
+
+
+ ------------
+
+ ## Model Files
+
+ | Component | File | Precision |
+ |---------|------|-----------|
+ | LLM | `smolvlm2-256m-instruct_w8a8_rk3588.rkllm` | W8A8 |
+ | Vision Encoder | `smolvlm2_256m_vision_fp16_rk3588.rknn` | FP16 |
+
+ ## Hardware Requirements
+
+ - Rockchip **RK3588 / RK3588S**
+ - RKNPU2 driver
+ - Tested on:
+   - Rock 5C
+   - Ubuntu 22.04 / 24.04 images by Joshua Riek
+
+ ## Runtime Requirements
+
+ - RKLLM runtime
+ - RKNN runtime (rknpu2)
+ - OpenCV (for image preprocessing)
+
+ ## Model performance benchmark
+
+ All models, with C++ examples, can be found on the Q-engineering GitHub.<br><br>
+ All LLMs are quantized to **w8a8**, while the VLM vision encoders use **fp16**.<br>
+
+ | Model | RAM (GB)<sup>1</sup> | LLM cold (s)<sup>2</sup> | LLM warm (s)<sup>3</sup> | VLM cold (s)<sup>2</sup> | VLM warm (s)<sup>3</sup> | Resolution | Tokens/s |
+ | --------------| :--: | :-----: | :-----: | :--------: | :-----: | :--------: | :--------: |
+ | [Qwen3-2B](https://github.com/Qengineering/Qwen3-VL-2B-NPU) | 3.1 | 21.9 | 2.6 | 10.0 | 0.9 | 448 x 448 | 11.5 |
+ | [Qwen3-4B](https://github.com/Qengineering/Qwen3-VL-4B-NPU) | 8.7 | 49.6 | 5.6 | 10.6 | 1.1 | 448 x 448 | 5.7 |
+ | [Qwen2.5-3B](https://github.com/Qengineering/Qwen2.5-VL-3B-NPU) | 4.8 | 48.3 | 4.0 | 17.9 | 1.8 | 392 x 392 | 7.0 |
+ | [Qwen2-7B](https://github.com/Qengineering/Qwen2-VL-7B-NPU) | 8.7 | 86.6 | 34.5 | 37.1 | 20.7 | 392 x 392 | 3.7 |
+ | [Qwen2-2.2B](https://github.com/Qengineering/Qwen2-VL-2B-NPU) | 3.3 | 29.1 | 2.5 | 17.1 | 1.7 | 392 x 392 | 12.5 |
+ | [InternVL3-1B](https://github.com/Qengineering/InternVL3-NPU) | 1.3 | 6.8 | 1.1 | 7.8 | 0.75 | 448 x 448 | 30 |
+ | [SmolVLM2-2.2B](https://github.com/Qengineering/SmolVLM2-2B-NPU) | 3.4 | 21.2 | 2.6 | 10.5 | 0.9 | 384 x 384 | 11 |
+ | [SmolVLM2-500M](https://github.com/Qengineering/SmolVLM2-500M-NPU) | 0.8 | 4.8 | 0.7 | 2.5 | 0.25 | 384 x 384 | 31 |
+ | [SmolVLM2-256M](https://github.com/Qengineering/SmolVLM2-256M-NPU) | 0.5 | 1.1 | 0.4 | 2.5 | 0.25 | 384 x 384 | 54 |
+
+ <sup>1</sup> Total memory used by the LLM plus the VLM.<br>
+ <sup>2</sup> A cold start is the first time an LLM/VLM model is loaded from disk into RAM or onto the NPU.<br>
+ Its duration depends on your OS, I/O transfer rate, and memory mapping.<br>
+ <sup>3</sup> Subsequent loads (warm starts) take advantage of the data already mapped in RAM. In most cases, only a few pointers need to be restored.<br><br>
+ <img width="600" height="450" alt="Plot_1" src="https://github.com/user-attachments/assets/6dbb7d8a-c55b-49b4-ae5a-ac272aade60b" /><br>
+ <img width="600" height="450" alt="Plot_2" src="https://github.com/user-attachments/assets/c4dd409b-2533-4ae0-942c-58fb3e8f5834" />
+
+
+ ## Example Usage
+
+ - See https://github.com/Qengineering/SmolVLM2-256M-NPU for C++ examples.
+
+
+ ### Notes
+
+ - This is not a Transformers-compatible model.
+ - This repository provides precompiled NPU binaries.
+ - CPU fallback is not supported.
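One way to read the SmolVLM2-256M row of the benchmark table is to combine its columns. Footnotes 2 and 3 describe the cold and warm columns as model loading times, so the Python sketch below adds the LLM and VLM columns to get a start-up time and turns the Tokens/s column into per-token latency and decoding time. `BenchRow` and its methods are hypothetical helpers written for this illustration, with values copied from the table; the decode estimate deliberately ignores image encoding and prompt prefill, which the table does not report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchRow:
    """One row of the benchmark table (hypothetical helper; values copied from it)."""
    name: str
    llm_cold_s: float
    llm_warm_s: float
    vlm_cold_s: float
    vlm_warm_s: float
    tokens_per_s: float

    def startup_s(self, warm: bool) -> float:
        # Footnotes 2 and 3: these columns are load times, so LLM + VLM = total start-up.
        if warm:
            return self.llm_warm_s + self.vlm_warm_s
        return self.llm_cold_s + self.vlm_cold_s

    def ms_per_token(self) -> float:
        # Average time per generated token, from the Tokens/s column.
        return 1000.0 / self.tokens_per_s

    def decode_s(self, n_tokens: int) -> float:
        # Generation time only; image encoding and prompt prefill are not in the table.
        return n_tokens / self.tokens_per_s


ROWS = (
    BenchRow("SmolVLM2-256M", 1.1, 0.4, 2.5, 0.25, 54.0),
    BenchRow("SmolVLM2-500M", 4.8, 0.7, 2.5, 0.25, 31.0),
)

if __name__ == "__main__":
    for row in ROWS:
        print(
            f"{row.name}: start-up {row.startup_s(warm=False):.2f} s cold / "
            f"{row.startup_s(warm=True):.2f} s warm, "
            f"{row.ms_per_token():.1f} ms/token, "
            f"200 tokens in ~{row.decode_s(200):.1f} s"
        )
```

Placing the 500M row next to the 256M row makes the trade-off visible: both share the same vision-encoder load times in the table, so the difference comes from the LLM columns and the decoding rate.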
smolvlm2-256m-instruct_w8a8_rk3588.rkllm ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a61aa930033ad3934e6b0a445fb12e87219a7ae7aa79458fc8f367a4ea2e8858
+ size 216490684
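The `.rkllm` entry above is a Git LFS pointer, not the binary itself: `oid` is the SHA-256 of the real file and `size` is its length in bytes. A minimal Python sketch, assuming the binary has already been downloaded into the working directory, parses such a pointer and checks a local file against it. `parse_lfs_pointer` and `verify_download` are illustrative names, not part of any Git LFS tool; inside a clone, `git lfs fsck` performs a comparable integrity check.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the .rkllm entry above.
RKLLM_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a61aa930033ad3934e6b0a445fb12e87219a7ae7aa79458fc8f367a4ea2e8858
size 216490684
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of an LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def verify_download(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's byte size and SHA-256."""
    file = Path(path)
    if file.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    digest = hashlib.sha256()
    with file.open("rb") as f:
        # Hash in 1 MiB chunks so a ~200 MB model never sits in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(RKLLM_POINTER)
    local = Path("smolvlm2-256m-instruct_w8a8_rk3588.rkllm")  # assumed download path
    if local.exists():
        print("OK" if verify_download(local, pointer) else "MISMATCH")
    else:
        print(f"{local} not found (expected {pointer['size']} bytes)")
```

The same check applies unchanged to the `.rknn` pointer; only the pointer text and the local file name differ.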
smolvlm2_256m_vision_fp16_rk3588.rknn ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:979aaccdd12466c5252f0cf0f5b49d1ca386011ea5ac4690780a818c5a719edb
+ size 197015354
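The `size` fields of the two pointers give the download footprint of this commit. The short Python sketch below sums them, about 0.41 GB in total, for comparison with the 0.5 GB RAM figure the README lists for SmolVLM2-256M. The RAM column measures memory in use at run time, which presumably also covers runtime allocations, so the two numbers are only expected to be in the same range, not identical.

```python
# Byte counts copied from the `size` lines of the two LFS pointers above.
BINARY_SIZES = {
    "smolvlm2-256m-instruct_w8a8_rk3588.rkllm": 216_490_684,
    "smolvlm2_256m_vision_fp16_rk3588.rknn": 197_015_354,
}


def total_bytes(sizes: dict) -> int:
    """Sum of the on-disk sizes of all listed binaries."""
    return sum(sizes.values())


if __name__ == "__main__":
    total = total_bytes(BINARY_SIZES)
    for name, size in BINARY_SIZES.items():
        print(f"{name}: {size / 1e6:.1f} MB")
    # Report both decimal GB (as in the README table) and binary GiB.
    print(f"total: {total} bytes = {total / 1e9:.2f} GB ({total / 2**30:.2f} GiB)")
```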