---
tags:
- rk3588
- rockchip
- rknpu
- vlm
- vision-language-model
- internVL3.5
- edge-ai
- embedded
library_name: rkllm
pipeline_tag: image-text-to-text
inference: false
model_type: internVL3.5
architecture: Vision-Language Transformer
quantization: W8A8 (LLM), FP16 (Vision)
hardware: Rockchip RK3588 NPU
runtime: RKLLM + RKNN
---
# InternVL3.5-4B for RK3588 NPU

This repository provides a **hardware-accelerated port of InternVL3.5-4B**, optimized for the **Rockchip RK3588 NPU**.

**User**: \<image\> Describe the image.

**Answer**: The image depicts an astronaut relaxing on the moon, holding a beer bottle and sitting next to a cooler. The background shows Earth in space, with stars visible above.
|
---
|
## Model Files

| Component | File | Precision |
|-----------|------|-----------|
| LLM | `internvl3_5-4b-instruct_w8a8_rk3588.rkllm` | W8A8 |
| Vision Encoder | `internvl3_5-4b_vision_rk3588.rknn` | FP16 |
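W8A8 means both weights and activations are stored as 8-bit integers. As a minimal, illustrative numpy sketch of symmetric per-tensor int8 quantization — not the actual RKLLM toolchain, which may use per-channel scales and calibration:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: the largest |x| maps to 127."""
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()  # rounding error is bounded by scale / 2
```

The point of the sketch: reconstruction error per element stays below half the quantization step, which is why w8a8 retains most of the model's quality while quartering memory versus fp32.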
|
## Hardware Requirements

- Rockchip **RK3588 / RK3588S**
- RKNPU2 driver
- Tested on:
  - Rock 5C
  - Ubuntu 22.04 / 24.04 (Joshua Riek's Ubuntu Rockchip builds)
|
## Runtime Requirements

- RKLLM runtime
- RKNN runtime (rknpu2)
- OpenCV (for image preprocessing)
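OpenCV handles the image preprocessing in the C++ examples. As a rough, numpy-only sketch of what that stage typically does for a 448 × 448 vision encoder — the ImageNet mean/std values are an assumption; check the example repository for the exact pipeline:

```python
import numpy as np

# Assumed ImageNet statistics -- verify against the example repo's preprocessing.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img: np.ndarray, size: int = 448) -> np.ndarray:
    """Resize (nearest neighbour here; use cv2.resize in practice),
    scale to [0, 1], normalize, and convert HWC -> NCHW."""
    h, w, _ = img.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = img[ys][:, xs].astype(np.float32) / 255.0
    return ((resized - MEAN) / STD).transpose(2, 0, 1)[None]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a decoded image
tensor = preprocess(frame)                        # shape (1, 3, 448, 448)
```

The resulting NCHW float tensor is what the vision encoder (the `.rknn` file) consumes; the LLM then attends over the encoder's output embeddings.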
|
## Model performance benchmark (tokens/s)

All models, with C++ examples, can be found on the [Q-engineering GitHub](https://github.com/Qengineering).

All LLM models are quantized to **w8a8**, while the VLM vision encoders use **fp16**.
|
| Model | RAM (GB)<sup>1</sup> | LLM cold (s)<sup>2</sup> | LLM warm (s)<sup>3</sup> | VLM cold (s)<sup>2</sup> | VLM warm (s)<sup>3</sup> | Resolution | Tokens/s |
| --- | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| [Qwen3-2B](https://github.com/Qengineering/Qwen3-VL-2B-NPU) | 3.1 | 21.9 | 2.6 | 10.0 | 0.9 | 448 x 448 | 11.5 |
| [Qwen3-4B](https://github.com/Qengineering/Qwen3-VL-4B-NPU) | 8.7 | 49.6 | 5.6 | 10.6 | 1.1 | 448 x 448 | 5.7 |
| [InternVL3.5-1B](https://github.com/Qengineering/InternVL3.5-1B-NPU) | 1.9 | 8.3 | 8.0 | 1.5 | 0.8 | 448 x 448 | 24 |
| [InternVL3.5-2B](https://github.com/Qengineering/InternVL3.5-2B-NPU) | 3.0 | 22 | 8.0 | 2.7 | 0.8 | 448 x 448 | 11.2 |
| [InternVL3.5-4B](https://github.com/Qengineering/InternVL3.5-4B-NPU) | 5.4 | 50 | 8.0 | 5.9 | 0.8 | 448 x 448 | 5 |
| [InternVL3.5-8B](https://github.com/Qengineering/InternVL3.5-8B-NPU) | 8.8 | 92 | 8.0 | 50.5 | 5.8 | 448 x 448 | 3.5 |
| [Qwen2.5-3B](https://github.com/Qengineering/Qwen2.5-VL-3B-NPU) | 4.8 | 48.3 | 4.0 | 17.9 | 1.8 | 392 x 392 | 7.0 |
| [Qwen2-7B](https://github.com/Qengineering/Qwen2-VL-7B-NPU) | 8.7 | 86.6 | 34.5 | 37.1 | 20.7 | 392 x 392 | 3.7 |
| [Qwen2-2.2B](https://github.com/Qengineering/Qwen2-VL-2B-NPU) | 3.3 | 29.1 | 2.5 | 17.1 | 1.7 | 392 x 392 | 12.5 |
| [InternVL3-1B](https://github.com/Qengineering/InternVL3-NPU) | 1.3 | 6.8 | 1.1 | 7.8 | 0.75 | 448 x 448 | 30 |
| [SmolVLM2-2.2B](https://github.com/Qengineering/SmolVLM2-2B-NPU) | 3.4 | 21.2 | 2.6 | 10.5 | 0.9 | 384 x 384 | 11 |
| [SmolVLM2-500M](https://github.com/Qengineering/SmolVLM2-500M-NPU) | 0.8 | 4.8 | 0.7 | 2.5 | 0.25 | 384 x 384 | 31 |
| [SmolVLM2-256M](https://github.com/Qengineering/SmolVLM2-256M-NPU) | 0.5 | 1.1 | 0.4 | 2.5 | 0.25 | 384 x 384 | 54 |
|
<sup>1</sup> Total memory used by the LLM plus the VLM.<br>
<sup>2</sup> A cold start is the first load of a model from disk into RAM and the NPU; its duration depends on your OS, I/O transfer rate, and memory mapping.<br>
<sup>3</sup> Subsequent loads (warm starts) reuse the data already mapped in RAM; mostly, only a few pointers need to be restored.<br><br>
<img width="600" height="450" alt="Plot_1" src="https://github.com/user-attachments/assets/2dde8d27-c8ae-474c-b845-4ed52bdc0785" /><br>
<img width="600" height="450" alt="Plot_2" src="https://github.com/user-attachments/assets/0cf946d5-5458-4166-bc2b-fa1592ae4d6b" />
|
## Example Usage

See the C++ example at https://github.com/Qengineering/InternVL3.5-4B-NPU
|
### Notes

- This is **not** a Transformers-compatible model.
- This repository provides precompiled NPU binaries.
- CPU fallback is not supported.