SmolVLM2-256M for RK3588 NPU

This repository provides a hardware-accelerated port of SmolVLM2-256M, optimized for the Rockchip RK3588 NPU.

User:<image>Describe the image.

Answer: The image depicts a scene from space, specifically looking at the moon's surface. The moon is in the process of being tidied up and has been cleaned to remove any debris or stains. There are two large white objects on the right side of the image, which could be astronauts or other crew members. They appear to be working together to clean the area around the moon's surface.

The background shows a view of Earth from space, showing the planet with its atmosphere and oceans. The sky is dark, indicating that it might be either early morning or late afternoon. There are no visible clouds in the sky, which suggests that the sun is not at its peak position. The image also includes some text, but it's unclear what it says.

Overall, this image captures a moment of cleanliness and organization on the moon, with both humans and astronauts working together to clean up the area around the moon's surface.

Model Files

Component	File	Precision
LLM	`smolvlm2-256m-instruct_w8a8_rk3588.rkllm`	W8A8
Vision Encoder	`smolvlm2_256m_vision_fp16_rk3588.rknn`	FP16
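
Both files need to be present before the runtimes can load them. A minimal sketch of a pre-flight check, assuming the two files sit in one directory; `missingModelFiles` and `kModelFiles` are hypothetical names introduced here, not part of this repository:

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// File names exactly as listed in the Model Files table.
const std::vector<std::string> kModelFiles = {
    "smolvlm2-256m-instruct_w8a8_rk3588.rkllm",  // LLM, W8A8
    "smolvlm2_256m_vision_fp16_rk3588.rknn",     // vision encoder, FP16
};

// Hypothetical helper: returns the names of the model files that are
// not regular files inside `dir` (wherever you downloaded them).
std::vector<std::string> missingModelFiles(const fs::path& dir) {
    std::vector<std::string> missing;
    for (const auto& name : kModelFiles) {
        std::error_code ec;  // non-throwing overload; errors count as "missing"
        if (!fs::is_regular_file(dir / name, ec)) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

Calling `missingModelFiles(".")` from the download directory should return an empty vector when both files are in place.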

Hardware Requirements

Rockchip RK3588 / RK3588S
RKNPU2 driver
Tested on:
- Rock 5C
- Ubuntu 22.04 / 24.04 (Joshua Riek)
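
Since CPU fallback is not supported, it can help to confirm the SoC before loading anything. The sketch below assumes the board's device tree exposes a `rockchip,rk3588` (or `rockchip,rk3588s`) entry in `/proc/device-tree/compatible`; this is an assumption about the board image, and both function names are hypothetical:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// True if a device-tree "compatible" blob (NUL-separated strings) names an
// RK3588 or RK3588S. The substring also matches "rockchip,rk3588s".
// Assumption: the vendor string is spelled "rockchip,rk3588".
bool isRk3588Compatible(const std::string& compatible) {
    return compatible.find("rockchip,rk3588") != std::string::npos;
}

// Reads a whole file as bytes; returns an empty string if it cannot be opened.
std::string readFileOrEmpty(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};
    }
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

A typical call would be `isRk3588Compatible(readFileOrEmpty("/proc/device-tree/compatible"))`. This only identifies the SoC; it does not verify that the RKNPU2 driver is installed.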

Runtime Requirements

RKLLM runtime
RKNN runtime (rknpu2)
OpenCV (for image preprocessing)

Model performance benchmark (tokens/s)

All models, with C++ examples, can be found on the Q-engineering GitHub.

All LLM models are quantized to W8A8, while the VLM vision encoders use FP16.

Model	RAM (GB)¹	LLM cold (s)²	LLM warm (s)³	VLM cold (s)²	VLM warm (s)³	Resolution	Tokens/s
Qwen3-2B	3.1	21.9	2.6	10.0	0.9	448 x 448	11.5
Qwen3-4B	8.7	49.6	5.6	10.6	1.1	448 x 448	5.7
Qwen2.5-3B	4.8	48.3	4.0	17.9	1.8	392 x 392	7.0
Qwen2-7B	8.7	86.6	34.5	37.1	20.7	392 x 392	3.7
Qwen2-2.2B	3.3	29.1	2.5	17.1	1.7	392 x 392	12.5
InternVL3-1B	1.3	6.8	1.1	7.8	0.75	448 x 448	30
SmolVLM2-2.2B	3.4	21.2	2.6	10.5	0.9	384 x 384	11
SmolVLM2-500M	0.8	4.8	0.7	2.5	0.25	384 x 384	31
SmolVLM2-256M	0.5	1.1	0.4	2.5	0.25	384 x 384	54

¹ The total used memory; LLM plus the VLM.
² A cold start is the first time an LLM/VLM model is loaded from disk into RAM or the NPU. Its duration depends on your OS, I/O transfer rate, and memory mapping.
³ A warm start is any subsequent load. It takes advantage of the data already mapped in RAM, so in most cases only a few pointers need to be restored.
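
The cold and warm timings, and the tokens/s column, can be reproduced with a small timing harness. This is a sketch and not the benchmark code used for the table; `timeSeconds` and `tokensPerSecond` are hypothetical helpers, and the load call itself is left to the caller:

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>

// Runs `fn` once and returns the elapsed wall-clock time in seconds.
double timeSeconds(const std::function<void()>& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Throughput = generated tokens / decode time in seconds.
double tokensPerSecond(std::size_t tokens, double seconds) {
    if (seconds <= 0.0) {
        throw std::invalid_argument("seconds must be positive");
    }
    return static_cast<double>(tokens) / seconds;
}

// Usage sketch (loadModel is a hypothetical stand-in for your own load call):
//   double cold = timeSeconds([] { loadModel(); });  // first load from disk
//   double warm = timeSeconds([] { loadModel(); });  // data already mapped
```

For example, 108 tokens generated in 2.0 s gives `tokensPerSecond(108, 2.0) == 54.0`, the figure listed for SmolVLM2-256M. Note that the first call only counts as a cold start if the model file has not been read since boot.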

Example Usage

See: https://github.com/Qengineering/SmolVLM2-256M-NPU

Notes

This is not a Transformers-compatible model
This repository provides precompiled NPU binaries
CPU fallback is not supported
