Initial release: InternVL3.5 RK3588 NPU port

Files changed (4)

.gitattributes +2 -0
README.md +92 -3
internvl3_5-1b-instruct_w8a8_rk3588.rkllm +3 -0
internvl3_5-1b_vision_rk3588.rknn +3 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+*.rknn filter=lfs diff=lfs merge=lfs -text
+*.rkllm filter=lfs diff=lfs merge=lfs -text
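
The two added patterns route the NPU binaries through Git LFS. As a rough illustration only, the sketch below checks which file names the new patterns would catch. It assumes the simplified rule that a slash-free pattern is matched against the file's basename; real gitattributes matching has more rules than this, and `is_lfs_tracked` is a hypothetical helper, not part of Git or this repository.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns added to .gitattributes by this commit.
NEW_LFS_PATTERNS = ["*.rknn", "*.rkllm"]


def is_lfs_tracked(path, patterns=NEW_LFS_PATTERNS):
    """Return True if the basename of `path` matches one of the patterns.

    Simplification (assumption): only the basename is compared, which is how
    slash-free gitattributes patterns behave; other pattern forms are ignored.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# The four files touched by this commit.
for changed in [
    ".gitattributes",
    "README.md",
    "internvl3_5-1b-instruct_w8a8_rk3588.rkllm",
    "internvl3_5-1b_vision_rk3588.rknn",
]:
    print(f"{changed}: {'LFS' if is_lfs_tracked(changed) else 'regular git'}")
```

In a real checkout, `git check-attr filter -- <file>` reports the attribute Git actually applies.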

README.md CHANGED

@@ -1,3 +1,92 @@
----
-license: apache-2.0
----

+---
+tags:
+- rk3588
+- rockchip
+- rknpu
+- vlm
+- vision-language-model
+- internVL3.5
+- edge-ai
+- embedded
+library_name: rkllm
+pipeline_tag: image-text-to-text
+inference: false
+model_type: internVL3.5
+architecture: Vision-Language Transformer
+quantization: W8A8 (LLM), FP16 (Vision)
+hardware: Rockchip RK3588 NPU
+runtime: RKLLM + RKNN
+---
+# InternVL3.5-1B for RK3588 NPU
+This repository provides a **hardware-accelerated port of InternVL3.5-1B**
+optimized for **Rockchip RK3588 NPU**.
+![Alt text](https://github.com/user-attachments/assets/6d297a34-c516-4cb1-be4a-bca471d40fa6)<br>**User**:\<image\>Describe the image.<br><br>
+**Answer**: The image depicts an astronaut on the moon, holding a green bottle of beer and sitting next to a green cooler with some writing on it. The background shows Earth from space, highlighting the contrast between the moon's barren surface and the planet below.
+------------
+## Model Files
+| Component | File | Precision |
+|---------|------|-----------|
+| LLM | `internvl3_5-1b-instruct_w8a8_rk3588.rkllm` | W8A8 |
+| Vision Encoder | `internvl3_5-1b_vision_rk3588.rknn` | FP16 |
+## Hardware Requirements
+- Rockchip **RK3588 / RK3588S**
+- RKNPU2 driver
+- Tested on:
+  - Rock 5C
+  - Ubuntu 22.04 / 24.04 (Joshua Riek)
+## Runtime Requirements
+- RKLLM runtime
+- RKNN runtime (rknpu2)
+- OpenCV (for image preprocessing)
+## Model performance benchmark
+All models, with C++ examples, can be found on the Q-engineering GitHub.<br><br>
+All LLMs are quantized to **W8A8**, while the VLM vision encoders use **FP16**.<br>
+| Model | RAM (GB)<sup>1</sup> | LLM cold start (s)<sup>2</sup> | LLM warm start (s)<sup>3</sup> | VLM cold start (s)<sup>2</sup> | VLM warm start (s)<sup>3</sup> | Resolution | Tokens/s |
+| --------------| :--: | :-----: | :-----: | :--------: | :-----: | :--------:  | :--------: |
+| [Qwen3-2B](https://github.com/Qengineering/Qwen3-VL-2B-NPU) | 3.1 | 21.9 | 2.6 | 10.0  | 0.9 | 448 x 448 | 11.5 |
+| [Qwen3-4B](https://github.com/Qengineering/Qwen3-VL-4B-NPU) | 8.7 | 49.6 | 5.6 | 10.6  | 1.1 | 448 x 448 | 5.7 |
+| [InternVL3.5-1B](https://github.com/Qengineering/InternVL3.5-1B-NPU) | 1.9 |  8.3 |   8.0 | 1.5    | 0.8 | 448 x 448 | 24 |
+| [InternVL3.5-2B](https://github.com/Qengineering/InternVL3.5-2B-NPU) | 3.0 |  22 |   8.0 | 2.7    | 0.8 | 448 x 448 | 11.2 |
+| [InternVL3.5-4B](https://github.com/Qengineering/InternVL3.5-4B-NPU) | 5.4 |  50 |   8.0 | 5.9    | 0.8 | 448 x 448 | 5 |
+| [InternVL3.5-8B](https://github.com/Qengineering/InternVL3.5-8B-NPU) | 8.8 |  92 |   8.0 | 50.5    | 5.8 | 448 x 448 | 3.5 |
+| [Qwen2.5-3B](https://github.com/Qengineering/Qwen2.5-VL-3B-NPU) | 4.8 | 48.3 |  4.0 | 17.9  | 1.8 | 392 x 392 | 7.0 |
+| [Qwen2-7B](https://github.com/Qengineering/Qwen2-VL-7B-NPU) | 8.7 | 86.6 |   34.5 | 37.1  | 20.7 | 392 x 392 | 3.7 |
+| [Qwen2-2.2B](https://github.com/Qengineering/Qwen2-VL-2B-NPU) | 3.3 | 29.1 |   2.5 | 17.1  | 1.7 | 392 x 392 | 12.5 |
+| [InternVL3-1B](https://github.com/Qengineering/InternVL3-NPU) | 1.3 |  6.8 |   1.1 | 7.8    | 0.75 | 448 x 448 | 30 |
+| [SmolVLM2-2.2B](https://github.com/Qengineering/SmolVLM2-2B-NPU) | 3.4 | 21.2 |   2.6 | 10.5   | 0.9  | 384 x 384 | 11 |
+| [SmolVLM2-500M](https://github.com/Qengineering/SmolVLM2-500M-NPU) | 0.8 |  4.8 |   0.7 | 2.5    | 0.25 | 384 x 384 | 31 |
+| [SmolVLM2-256M](https://github.com/Qengineering/SmolVLM2-256M-NPU) | 0.5 |  1.1 |   0.4 | 2.5    | 0.25 | 384 x 384 | 54 |
+<sup>1</sup> Total memory in use: the LLM plus the VLM.<br>
+<sup>2</sup> A cold start is the first load of an LLM/VLM model from disk into RAM or the NPU.<br>
+Its duration depends on your OS, I/O transfer rate, and memory mapping.<br>
+<sup>3</sup> A subsequent load (warm start) reuses the data already mapped in RAM; in most cases only a few pointers need to be restored.<br><br>
+<img width="600" height="450" alt="Plot_1" src="https://github.com/user-attachments/assets/2dde8d27-c8ae-474c-b845-4ed52bdc0785" /><br>
+<img width="600" height="450" alt="Plot_2" src="https://github.com/user-attachments/assets/0cf946d5-5458-4166-bc2b-fa1592ae4d6b" />
+## Example Usage
+- see: https://github.com/Qengineering/InternVL3.5-1B-NPU
+### Notes
+- This is not a Transformers-compatible model
+- This repository provides precompiled NPU binaries
+- CPU fallback is not supported
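
The cold/warm distinction in the README footnotes can be observed with plain file reads. The following is a minimal sketch, not the RKLLM or RKNN loader: `timed_read` and `compare_cold_warm` are hypothetical helpers that only time how long the operating system takes to deliver a file's bytes. The first read is "cold" only if the file is not already in the page cache (for example, right after boot), so treat the numbers as indicative.

```python
import time
from pathlib import Path


def timed_read(path, chunk_size=1 << 20):
    """Read a file end to end and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with Path(path).open("rb") as handle:
        while chunk := handle.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_cold_warm(path):
    """Read the same file twice; the second pass usually hits the page cache."""
    size, first_s = timed_read(path)
    _, second_s = timed_read(path)
    return {"bytes": size, "first_s": first_s, "second_s": second_s}
```

Calling `compare_cold_warm("internvl3_5-1b-instruct_w8a8_rk3588.rkllm")` on the board would show whether the second pass is faster on that system; the actual model load times in the table also include NPU setup that this sketch does not cover.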

internvl3_5-1b-instruct_w8a8_rk3588.rkllm ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b6aee283a9029751a714813c1ff64b5da0cad72bb17eb943321f53ddb6e2ac68
+size 936057308

internvl3_5-1b_vision_rk3588.rknn ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c52be0b277c56096eb499975614f72ff497dc5e86250b26dd8b3b7ff232a7b93
+size 650595251
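
Each of the two added files is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 of the real binary, and its size in bytes. A downloaded model can be checked against its pointer. The sketch below is one way to do that with the Python standard library; `parse_lfs_pointer` and `verify_file` are hypothetical helper names, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_file(path, pointer, chunk_size=1 << 20):
    """Return True if the file's size and hash both match the pointer."""
    target = Path(path)
    if target.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with target.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
RKLLM_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b6aee283a9029751a714813c1ff64b5da0cad72bb17eb943321f53ddb6e2ac68
size 936057308
"""

pointer = parse_lfs_pointer(RKLLM_POINTER)
print(pointer["algo"], pointer["size"])
```

With the model downloaded next to the script, `verify_file("internvl3_5-1b-instruct_w8a8_rk3588.rkllm", pointer)` should return `True` for an intact copy.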