# SmolVLM2-2.2B for RK3588 NPU

This repository provides a hardware-accelerated port of SmolVLM2-2.2B, optimized for the Rockchip RK3588 NPU.
**User:** `<image>` Describe the image.

**Answer:** In this captivating digital painting, an astronaut is comfortably seated on the surface of the moon, with their feet propped up on a green cooler. The astronaut, clad in a pristine white spacesuit, has a relaxed posture, suggesting they are enjoying a leisurely moment away from the hustle and bustle of space exploration.
The backdrop to this tranquil scene is a breathtaking view of Earth, which is visible in the distance. The planet's blue oceans contrast beautifully with its brown landmasses, while its white clouds add depth to the atmosphere. The moon itself forms a stark, cratered landscape that serves as an excellent contrast to the vibrant colors of Earth.
The astronaut's gaze is directed towards the Earth, indicating their fascination with our home planet. Their relaxed demeanor and the presence of the cooler suggest they might be taking a break from their spacewalk or perhaps enjoying a well-deserved moment on the lunar surface. The painting beautifully captures this serene moment in space exploration, highlighting both the astronaut's peaceful solitude and the awe-inspiring beauty of Earth.
## Model Files
| Component | File | Precision |
|---|---|---|
| LLM | `smolvlm2-2.2b-instruct_w8a8_rk3588.rkllm` | W8A8 |
| Vision Encoder | `smolvlm2-2.2b_vision_fp16_rk3588.rknn` | FP16 |
## Hardware Requirements
- Rockchip RK3588 / RK3588S
- RKNPU2 driver
- Tested on:
  - Rock 5C
  - Ubuntu 22.04 / 24.04 (Joshua Riek's images)
## Runtime Requirements
- RKLLM runtime
- RKNN runtime (rknpu2)
- OpenCV (for image preprocessing)
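Before running, you can sanity-check that the NPU driver and runtime libraries are present. A minimal sketch (the debugfs path assumes a stock Rockchip BSP kernel, and `/usr/lib` is only the default install location; adjust for your image):

```shell
# Print the RKNPU kernel driver version via debugfs (usually requires root).
cat /sys/kernel/debug/rknpu/version 2>/dev/null || echo "RKNPU debugfs entry not found"

# Look for the RKLLM and RKNN runtime libraries (install paths vary per image).
ls /usr/lib/librkllmrt.so /usr/lib/librknnrt.so 2>/dev/null || echo "runtime libraries not in /usr/lib"
```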
## Model Performance Benchmarks

All models, with C++ examples, can be found on the Q-engineering GitHub.
All LLM models are quantized to W8A8, while the VLM vision encoders run in FP16.

| Model | RAM (GB) ¹ | LLM cold (s) ² | LLM warm (s) ³ | VLM cold (s) ² | VLM warm (s) ³ | Resolution | Tokens/s |
|---|---|---|---|---|---|---|---|
| Qwen3-2B | 3.1 | 21.9 | 2.6 | 10.0 | 0.9 | 448 x 448 | 11.5 |
| Qwen3-4B | 8.7 | 49.6 | 5.6 | 10.6 | 1.1 | 448 x 448 | 5.7 |
| Qwen2.5-3B | 4.8 | 48.3 | 4.0 | 17.9 | 1.8 | 392 x 392 | 7.0 |
| Qwen2-7B | 8.7 | 86.6 | 34.5 | 37.1 | 20.7 | 392 x 392 | 3.7 |
| Qwen2-2.2B | 3.3 | 29.1 | 2.5 | 17.1 | 1.7 | 392 x 392 | 12.5 |
| InternVL3-1B | 1.3 | 6.8 | 1.1 | 7.8 | 0.75 | 448 x 448 | 30 |
| SmolVLM2-2.2B | 3.4 | 21.2 | 2.6 | 10.5 | 0.9 | 384 x 384 | 11 |
| SmolVLM2-500M | 0.8 | 4.8 | 0.7 | 2.5 | 0.25 | 384 x 384 | 31 |
| SmolVLM2-256M | 0.5 | 1.1 | 0.4 | 2.5 | 0.25 | 384 x 384 | 54 |
¹ Total memory used: LLM plus vision encoder.

² A cold start is the first load of an LLM/VLM model from disk into RAM or the NPU; its duration depends on your OS, I/O transfer rate, and memory mapping.

³ Subsequent loads (warm starts) reuse the data already mapped in RAM; mostly, only a few pointers need to be restored.
## Example Usage
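The vision encoder expects a 384 × 384 normalized input tensor. Below is a minimal preprocessing sketch in plain NumPy: the nearest-neighbor resize is a stand-in for `cv2.resize`, and the SigLIP-style mean = std = 0.5 normalization is an assumption — verify both against the values used when the encoder was exported.

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 384) -> np.ndarray:
    """Resize and normalize an HxWx3 uint8 RGB image for the FP16 vision
    encoder. Nearest-neighbor resize stands in for cv2.resize; the
    mean=std=0.5 (SigLIP-style) constants are an assumption."""
    h, w, _ = image.shape
    # Nearest-neighbor resize via index sampling.
    rows = (np.arange(size) * h / size).astype(np.int64)
    cols = (np.arange(size) * w / size).astype(np.int64)
    resized = image[rows][:, cols].astype(np.float32) / 255.0
    normalized = (resized - 0.5) / 0.5          # scale to [-1, 1]
    return normalized.transpose(2, 0, 1)[None]  # NCHW: (1, 3, size, size)
```

The resulting tensor can then be fed to the `.rknn` encoder via the RKNN runtime (for example the `rknnlite` Python API) and the image embeddings handed to the RKLLM runtime; check the exported model's expected input layout (NCHW vs. NHWC) before running.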
## Notes
- This is not a Transformers-compatible model
- This repository provides precompiled NPU binaries
- CPU fallback is not supported