Phi-3-mini-4k-instruct: RKLLM (w8a8) for RK3588
Pre-quantized w8a8 build of microsoft/Phi-3-mini-4k-instruct ready to run on Rockchip RK3588 / RK3588S NPU via the RKLLM runtime v1.2.3.
| Detail | Value |
|---|---|
| Parameters | 3.82 B |
| Quantization | w8a8 (8-bit weights, 8-bit activations) |
| Context length | 4 096 tokens |
| NPU cores | 3 |
| File size | ~3.7 GB |
| Inference speed | 6.82 tokens/sec (measured on RK3588, 3 cores) |
| Prefill speed | 49.22 tokens/sec |
| RAM usage | ~3.7 GB |
| Toolkit version | rkllm-toolkit 1.2.3 |
| Runtime version | rkllm runtime 1.2.3 |
| Calibration | 20 diverse prompts (general knowledge, code, math, science, creative writing) |
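To put the two throughput rows in perspective, the sketch below estimates how long a single request might take by dividing the prompt length by the prefill speed and the reply length by the decode speed. The function name and the example token counts are hypothetical, and the result is only a rough figure: real latency also depends on model load time, context length, and system load.

```python
# Throughput figures copied from the table above (RK3588, 3 NPU cores).
PREFILL_TOKENS_PER_SEC = 49.22
DECODE_TOKENS_PER_SEC = 6.82


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end time: prefill the prompt, then decode the reply."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prefill = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode = output_tokens / DECODE_TOKENS_PER_SEC
    return prefill + decode


if __name__ == "__main__":
    # Hypothetical workload: a 200-token prompt and a 150-token answer.
    print(f"{estimate_latency_seconds(200, 150):.1f} s")
```

For this hypothetical workload, decoding accounts for most of the time, so shortening the reply is likely to help more than trimming the prompt.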
What is Phi-3?
Phi-3-Mini-4K-Instruct is a 3.8B parameter lightweight model from Microsoft, trained on 4.9 trillion tokens. It excels at reasoning, math, and code generation while being small enough for edge deployments. MIT licensed.
How to use
1. Download
Place the .rkllm file and config.json in a model directory on your RK3588 board:
mkdir -p ~/models/Phi-3-mini-4k-instruct
# Copy both files into this directory
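Before launching the runtime, it can help to confirm that both files actually landed in the directory. The following is a minimal sketch that assumes the directory layout from step 1; the helper name `missing_model_files` is hypothetical and not part of any RKLLM tooling.

```python
from pathlib import Path


def missing_model_files(model_dir: Path) -> list[str]:
    """Return a description of each required file that is absent from model_dir."""
    missing = []
    # Any file ending in .rkllm counts as the model weights.
    if not any(model_dir.glob("*.rkllm")):
        missing.append("*.rkllm")
    if not (model_dir / "config.json").is_file():
        missing.append("config.json")
    return missing


if __name__ == "__main__":
    model_dir = Path.home() / "models" / "Phi-3-mini-4k-instruct"
    problems = missing_model_files(model_dir)
    print("ready" if not problems else f"missing: {', '.join(problems)}")
```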
2. Run with the rkllm binary
cd ~/models/Phi-3-mini-4k-instruct
rkllm Phi-3-mini-4k-instruct-w8a8.rkllm
The RKLLM runtime applies Phi-3's chat template (<|system|>...<|end|><|user|>...<|end|><|assistant|>) internally at the token level. Send plain text only.
3. Chat template
Phi-3 uses a distinctive chat format:
<|system|>
You are a helpful assistant.<|end|>
<|user|>
How does photosynthesis work?<|end|>
<|assistant|>
The runtime handles this automatically, so no manual template is needed.
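For reference only, the layout above can be reproduced with a few lines of string formatting. This is an illustrative sketch of the format, useful for inspecting or preparing text offline; it is not the runtime's implementation, and its output should not be sent to the runtime, which expects plain text. The function name `render_phi3_prompt` is hypothetical.

```python
from typing import Optional


def render_phi3_prompt(user: str, system: Optional[str] = None) -> str:
    """Lay out a single-turn prompt in the Phi-3 format shown above."""
    parts = []
    if system is not None:
        parts.append(f"<|system|>\n{system}<|end|>\n")
    parts.append(f"<|user|>\n{user}<|end|>\n")
    # The trailing assistant tag marks where generation begins.
    parts.append("<|assistant|>\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_phi3_prompt("How does photosynthesis work?",
                             system="You are a helpful assistant."))
```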
Conversion details
Toolkit: rkllm-toolkit 1.2.3 (Python 3.10, x86_64)
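from rkllm.api import RKLLM

llm = RKLLM()

# Load the original Hugging Face checkpoint from ./model (no LoRA adapter).
llm.load_huggingface(model="./model", model_lora=None, device="cpu")

# Quantize to w8a8 with the calibration prompts and target the RK3588 NPU.
llm.build(
    do_quantization=True,
    optimization_level=1,
    quantized_dtype="w8a8",
    quantized_algorithm="normal",
    target_platform="rk3588",
    num_npu_core=3,
    extra_qparams=None,
    dataset="./data_quant.json",  # calibration dataset
    max_context=4096,
)

# Write the converted model to disk.
llm.export_rkllm("./Phi-3-mini-4k-instruct-w8a8.rkllm")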
Calibration dataset: 20 prompts covering general knowledge, code, math, science, and creative writing, formatted with Phi-3's native chat template via tokenizer.apply_chat_template().
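A calibration file along these lines could be assembled as sketched below. Several things here are assumptions rather than the author's actual procedure: the `"input"`/`"target"` record keys should be checked against the toolkit version in use, the sample prompts and answers are made up (the original 20 prompts are not listed here), and a hand-written formatter stands in for `tokenizer.apply_chat_template()` so the example runs without extra dependencies.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def build_calibration_records(
    samples: Iterable[tuple[str, str]],
    format_prompt: Callable[[str], str],
) -> list[dict[str, str]]:
    """Turn (prompt, reference answer) pairs into calibration records."""
    # ASSUMPTION: "input"/"target" keys; verify against your toolkit version.
    return [
        {"input": format_prompt(prompt), "target": answer}
        for prompt, answer in samples
    ]


def write_calibration_file(records: list[dict[str, str]], path: Path) -> None:
    """Serialize the records as a JSON array."""
    path.write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    samples = [
        ("What is the capital of France?", "Paris."),
        ("Write a Python expression that reverses a string s.", "s[::-1]"),
    ]

    # Stand-in for tokenizer.apply_chat_template(): user turn only.
    def phi3_user_turn(text: str) -> str:
        return f"<|user|>\n{text}<|end|>\n<|assistant|>\n"

    records = build_calibration_records(samples, phi3_user_turn)
    write_calibration_file(records, Path("data_quant.json"))
```

Passing the formatter as a callable keeps the record-building step independent of how prompts are templated, so a wrapper around the real tokenizer can be swapped in.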
Notes
- No <think> tags: Phi-3 is not a reasoning model, so it does not produce chain-of-thought in <think>...</think> tags. For thinking mode, use Qwen3 models.
- Strong at math/code: Despite its size, Phi-3 scores 85.7% on GSM8K and 57.3% on HumanEval.
- English-primary: Best performance in English. Other languages have reduced quality.
Hardware tested
- Orange Pi 5 Plus (RK3588, 16 GB RAM)
- RKNPU driver 0.9.8
- RKLLM runtime 1.2.3
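To compare a board against the tested driver version, a check like the following may be useful. It is a sketch under assumptions: the debugfs path and the `RKNPU driver: v0.9.8` string format are what Rockchip vendor kernels commonly expose, but they may differ on your image, and reading the file usually requires root.

```python
import re
from pathlib import Path
from typing import Optional

# ASSUMPTION: debugfs path used by Rockchip vendor kernels; may need root.
RKNPU_VERSION_FILE = Path("/sys/kernel/debug/rknpu/version")
TESTED_DRIVER = (0, 9, 8)  # driver version listed above


def parse_driver_version(text: str) -> Optional[tuple[int, int, int]]:
    """Extract (major, minor, patch) from text such as 'RKNPU driver: v0.9.8'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    try:
        found = parse_driver_version(RKNPU_VERSION_FILE.read_text())
    except OSError as exc:
        print(f"could not read {RKNPU_VERSION_FILE}: {exc}")
    else:
        if found is None:
            print("unrecognized version string")
        elif found < TESTED_DRIVER:
            print(f"driver {found} is older than the tested {TESTED_DRIVER}")
        else:
            print(f"driver {found} is at or above the tested {TESTED_DRIVER}")
```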
License
MIT, same as the base model.