Phi-3-mini-4k-instruct — RKLLM (w8a8) for RK3588

Pre-quantized w8a8 build of microsoft/Phi-3-mini-4k-instruct, ready to run on the Rockchip RK3588 / RK3588S NPU via RKLLM runtime v1.2.3.

Detail	Value
Parameters	3.82 B
Quantization	w8a8 (8-bit weights, 8-bit activations)
Context length	4 096 tokens
NPU cores	3
File size	~3.7 GB
Inference speed	6.82 tokens/sec (measured on RK3588, 3 cores)
Prefill speed	49.22 tokens/sec
RAM usage	~3.7 GB
Toolkit version	rkllm-toolkit 1.2.3
Runtime version	rkllm runtime 1.2.3
Calibration	20 diverse prompts (general knowledge, code, math, science, creative writing)
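
As a rough guide to what the throughput figures in the table mean in practice, the sketch below estimates response latency from the measured prefill and inference (decode) speeds. The function name and the example token counts are illustrative assumptions. The estimate ignores model load time and treats both rates as constant, which is a simplification.

```python
# Measured figures from the table above (RK3588, 3 NPU cores).
PREFILL_TOKENS_PER_SEC = 49.22
DECODE_TOKENS_PER_SEC = 6.82


def estimate_latency_seconds(prompt_tokens: int, new_tokens: int) -> float:
    """Rough latency estimate: prefill time plus decode time.

    Assumes both rates stay constant regardless of context length.
    """
    prefill_time = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode_time = new_tokens / DECODE_TOKENS_PER_SEC
    return prefill_time + decode_time


if __name__ == "__main__":
    # Hypothetical workload: 512-token prompt, 256-token reply.
    print(f"{estimate_latency_seconds(512, 256):.1f} s")
```

For this hypothetical workload the estimate comes to roughly 48 seconds, most of it spent generating the reply rather than processing the prompt.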

What is Phi-3?

Phi-3-Mini-4K-Instruct is a lightweight 3.8B-parameter model from Microsoft, trained on 4.9 trillion tokens. It excels at reasoning, math, and code generation while remaining small enough for edge deployment. It is MIT licensed.

How to use

1. Download

Place the .rkllm file and config.json in a model directory on your RK3588 board:

mkdir -p ~/models/Phi-3-mini-4k-instruct
# Copy both files into this directory

2. Run with the rkllm binary

cd ~/models/Phi-3-mini-4k-instruct
rkllm Phi-3-mini-4k-instruct-w8a8.rkllm

The RKLLM runtime applies Phi-3's chat template (<|system|>...<|end|><|user|>...<|end|><|assistant|>) internally at the token level. Send plain text only.

3. Chat template

Phi-3 uses a distinctive chat format:

<|system|>
You are a helpful assistant.<|end|>
<|user|>
How does photosynthesis work?<|end|>
<|assistant|>

The runtime applies this format automatically, so you do not need to build the template yourself.
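
For reference only, the layout shown above can be reproduced in a few lines of Python. This is a minimal sketch based on the example format: `format_phi3_prompt` is a hypothetical helper, not part of the RKLLM API, and the exact whitespace is assumed from the example rather than taken from the tokenizer.

```python
def format_phi3_prompt(messages: list[dict]) -> str:
    """Render chat messages in the Phi-3 layout shown above.

    Each message is a dict with "role" ("system", "user", or "assistant")
    and "content". The prompt ends with an open assistant turn.
    """
    parts = []
    for message in messages:
        # Assumed layout: role tag, newline, content, end tag, newline.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    parts.append("<|assistant|>\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = format_phi3_prompt([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How does photosynthesis work?"},
    ])
    print(prompt)
```

This is useful mainly for inspecting what the model sees. When sending input to the runtime, plain text is still the right choice.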

Conversion details

Toolkit: rkllm-toolkit 1.2.3 (Python 3.10, x86_64)

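from rkllm.api import RKLLM

llm = RKLLM()

# Load the Hugging Face checkpoint from ./model on the CPU (no LoRA adapter).
llm.load_huggingface(model="./model", model_lora=None, device="cpu")

# Quantize to w8a8 with the calibration set, targeting all 3 RK3588 NPU cores.
llm.build(
    do_quantization=True,
    optimization_level=1,
    quantized_dtype="w8a8",
    quantized_algorithm="normal",
    target_platform="rk3588",
    num_npu_core=3,
    extra_qparams=None,
    dataset="./data_quant.json",
    max_context=4096,
)

# Write the converted model to disk.
llm.export_rkllm("./Phi-3-mini-4k-instruct-w8a8.rkllm")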

Calibration dataset: 20 prompts covering general knowledge, code, math, science, and creative writing, formatted with Phi-3's native chat template via tokenizer.apply_chat_template().
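
The sketch below shows one way a calibration file of this kind might be assembled. It rests on several assumptions: the entry layout (`input` / `target` keys in a JSON list) follows what Rockchip's example scripts appear to use and should be checked against your toolkit version, `build_calibration_file` is a hypothetical helper, and `phi3_user_turn` is a hand-written stand-in for `tokenizer.apply_chat_template()` so the example runs without extra dependencies.

```python
import json
from typing import Callable


def build_calibration_file(
    samples: list[tuple[str, str]],
    format_prompt: Callable[[str], str],
    path: str = "data_quant.json",
) -> list[dict]:
    """Write (prompt, target) pairs as a JSON list and return the entries.

    ASSUMPTION: each entry is {"input": ..., "target": ...}; verify this
    against the rkllm-toolkit version you are using.
    """
    entries = [
        {"input": format_prompt(prompt), "target": target}
        for prompt, target in samples
    ]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, ensure_ascii=False, indent=2)
    return entries


def phi3_user_turn(prompt: str) -> str:
    # Stand-in for tokenizer.apply_chat_template(); layout assumed from the
    # chat format shown earlier in this card.
    return f"<|user|>\n{prompt}<|end|>\n<|assistant|>\n"


if __name__ == "__main__":
    # Hypothetical sample; the real set had 20 prompts across several domains.
    build_calibration_file([("What is 2 + 2?", "4")], phi3_user_turn)
```

With the transformers library installed, the stand-in formatter could be replaced by a function that calls `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.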

Notes

No <think> tags: Phi-3 is not a reasoning model — it does not produce chain-of-thought in <think>...</think> tags. For thinking mode, use Qwen3 models.
Strong at math/code: Despite its size, Phi-3 scores 85.7% on GSM8K and 57.3% on HumanEval.
English-primary: Best performance in English. Other languages have reduced quality.

Hardware tested

Orange Pi 5 Plus (RK3588, 16 GB RAM)
RKNPU driver 0.9.8
RKLLM runtime 1.2.3

License

MIT — same as the base model.
