---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
tags:
  - rkllm
  - rk3588
  - npu
  - phi3
  - quantized
  - w8a8
language:
  - en
pipeline_tag: text-generation
---

Phi-3-mini-4k-instruct — RKLLM (w8a8) for RK3588

Pre-quantized w8a8 build of microsoft/Phi-3-mini-4k-instruct ready to run on Rockchip RK3588 / RK3588S NPU via the RKLLM runtime v1.2.3.

| Detail | Value |
|---|---|
| Parameters | 3.82 B |
| Quantization | w8a8 (8-bit weights, 8-bit activations) |
| Context length | 4096 tokens |
| NPU cores | 3 |
| File size | ~3.7 GB |
| Generation (decode) speed | 6.82 tokens/s (measured on RK3588, 3 NPU cores) |
| Prefill speed | 49.22 tokens/s |
| RAM usage | ~3.7 GB |
| Toolkit version | rkllm-toolkit 1.2.3 |
| Runtime version | RKLLM runtime 1.2.3 |
| Calibration | 20 diverse prompts (general knowledge, code, math, science, creative writing) |
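The two measured speeds can be combined into a rough per-request latency estimate: the prompt is processed at the prefill rate, then each new token at the decode rate. The sketch below assumes both rates stay constant for the whole request, which real runs only approximate, and the helper names are illustrative. It also compares the file size with a back-of-envelope figure of one byte per weight.

```python
# Back-of-envelope numbers derived from the table above.
# Assumes constant prefill/decode rates; real speeds vary with context length.

PREFILL_TOK_PER_S = 49.22  # measured prefill speed
DECODE_TOK_PER_S = 6.82    # measured generation (decode) speed


def estimate_latency(prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (approx. seconds to first token, total seconds) for one request."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    time_to_first_token = prompt_tokens / PREFILL_TOK_PER_S
    total = time_to_first_token + output_tokens / DECODE_TOK_PER_S
    return time_to_first_token, total


def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Storage for the weights alone, assuming every parameter uses the same width."""
    return num_params * bits_per_weight / 8


if __name__ == "__main__":
    ttft, total = estimate_latency(prompt_tokens=500, output_tokens=200)
    print(f"time to first token ~{ttft:.1f} s, full answer ~{total:.1f} s")
    print(f"8-bit weights ~{weight_bytes(3.82e9, 8) / 1e9:.2f} GB")
```

For a 500-token prompt and a 200-token answer this gives roughly 10.2 s before the first token and about 39.5 s in total. At 8 bits per weight, 3.82 B parameters come to about 3.82 GB, in the same range as the ~3.7 GB file.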

What is Phi-3?

Phi-3-Mini-4K-Instruct is a lightweight 3.8B-parameter model from Microsoft, trained on 4.9 trillion tokens. It excels at reasoning, math, and code generation while remaining small enough for edge deployment. It is MIT licensed.

How to use

1. Download

Download the .rkllm file and config.json from this repository, then place both in a model directory on your RK3588 board:

mkdir -p ~/models/Phi-3-mini-4k-instruct
# Copy both files into this directory
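If you prefer to script this step, the Python sketch below does the equivalent of `mkdir -p` plus the copy, and refuses to proceed unless both files are present. `install_model` is a hypothetical helper, the source directory is wherever you saved the downloaded files, and the file names are the ones used elsewhere in this card.

```python
import shutil
from pathlib import Path

# File names as used in this model card.
REQUIRED_FILES = ("Phi-3-mini-4k-instruct-w8a8.rkllm", "config.json")


def install_model(source_dir: Path, model_dir: Path) -> list[Path]:
    """Copy the required files into model_dir, creating it if needed.

    Raises FileNotFoundError (before touching model_dir) if any file is missing.
    """
    missing = [name for name in REQUIRED_FILES if not (source_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {source_dir}: {', '.join(missing)}")
    model_dir.mkdir(parents=True, exist_ok=True)  # same as mkdir -p
    # copy2 also preserves timestamps; it returns the destination path.
    return [Path(shutil.copy2(source_dir / name, model_dir / name)) for name in REQUIRED_FILES]
```

For example, `install_model(Path.home() / "Downloads", Path.home() / "models" / "Phi-3-mini-4k-instruct")` would populate the directory created above.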

2. Run with the rkllm binary

cd ~/models/Phi-3-mini-4k-instruct
rkllm Phi-3-mini-4k-instruct-w8a8.rkllm

The RKLLM runtime applies Phi-3's chat template (<|system|>...<|end|><|user|>...<|end|><|assistant|>) internally at the token level, so send plain text only; do not add the template tokens yourself.
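One way to enforce the plain-text rule in your own client is to reject prompts that already contain Phi-3 control tokens, since the runtime would otherwise wrap them in the template a second time. `check_plain_text` is a hypothetical helper, not part of the RKLLM API; the token list is taken from the template string quoted above.

```python
# Phi-3 control tokens, as they appear in the chat template.
PHI3_CONTROL_TOKENS = ("<|system|>", "<|user|>", "<|assistant|>", "<|end|>")


def check_plain_text(prompt: str) -> str:
    """Return the prompt with surrounding whitespace stripped.

    Raises ValueError if the prompt is already templated, because the
    runtime adds the template itself and would wrap it twice.
    """
    found = [tok for tok in PHI3_CONTROL_TOKENS if tok in prompt]
    if found:
        raise ValueError(f"prompt contains template tokens: {found}")
    return prompt.strip()
```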

3. Chat template

Phi-3 uses a distinctive chat format:

<|system|>
You are a helpful assistant.<|end|>
<|user|>
How does photosynthesis work?<|end|>
<|assistant|>

The runtime handles this automatically — no manual template needed.
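For reference, the layout above can be reproduced in a few lines of Python, which can help when debugging prompts or when targeting a runtime that does not template for you. This is an illustrative sketch that mirrors the example shown here; it may differ from the official tokenizer template in details such as special-token handling.

```python
def format_phi3_prompt(messages: list[dict[str, str]]) -> str:
    """Render chat messages in the Phi-3 layout shown above.

    Each turn becomes "<|role|>\n{content}<|end|>\n"; a trailing
    "<|assistant|>\n" asks the model to write the next reply.
    """
    parts = []
    for msg in messages:
        role = msg["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{msg['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # generation prompt
    return "".join(parts)
```

Passing a system message "You are a helpful assistant." and the user message "How does photosynthesis work?" yields exactly the example above, followed by a newline after <|assistant|>.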

Conversion details

Toolkit: rkllm-toolkit 1.2.3 (Python 3.10, x86_64)

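from rkllm.api import RKLLM

# Load a local copy of microsoft/Phi-3-mini-4k-instruct on the CPU
llm = RKLLM()
ret = llm.load_huggingface(model="./model", model_lora=None, device="cpu")
if ret != 0:
    raise SystemExit(f"load_huggingface failed: {ret}")

# Quantize to w8a8 for the RK3588 NPU, calibrating on data_quant.json
ret = llm.build(
    do_quantization=True,
    optimization_level=1,
    quantized_dtype="w8a8",
    quantized_algorithm="normal",
    target_platform="rk3588",
    num_npu_core=3,
    extra_qparams=None,
    dataset="./data_quant.json",
    max_context=4096,
)
if ret != 0:
    raise SystemExit(f"build failed: {ret}")

# Write the converted model for the RKLLM runtime
ret = llm.export_rkllm("./Phi-3-mini-4k-instruct-w8a8.rkllm")
if ret != 0:
    raise SystemExit(f"export_rkllm failed: {ret}")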

Calibration dataset: 20 prompts covering general knowledge, code, math, science, and creative writing, formatted with Phi-3's native chat template via tokenizer.apply_chat_template().
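A calibration file along these lines can be produced with a short script. The sketch keeps the chat formatting pluggable so it can call `tokenizer.apply_chat_template()` as described above. Two assumptions to check against your toolkit version: the `input`/`target` record layout is based on the rkllm-toolkit examples, and `write_calibration_set` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Callable, Optional

Messages = list[dict[str, str]]


def write_calibration_set(
    prompts: list[str],
    format_fn: Callable[[Messages], str],
    out_path: Path,
    targets: Optional[list[str]] = None,
    system: Optional[str] = None,
) -> int:
    """Format each prompt as a chat and write a JSON list of records.

    ASSUMPTION: records use {"input": ..., "target": ...}, mirroring the
    data_quant.json files in the rkllm-toolkit examples. "target" is an
    empty string unless reference answers are supplied.
    """
    if targets is not None and len(targets) != len(prompts):
        raise ValueError("targets must match prompts one-to-one")
    records = []
    for i, prompt in enumerate(prompts):
        messages = [{"role": "system", "content": system}] if system else []
        messages.append({"role": "user", "content": prompt})
        records.append({
            "input": format_fn(messages),
            "target": targets[i] if targets is not None else "",
        })
    out_path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(records)


# Example wiring with Hugging Face transformers (not executed here):
#   from functools import partial
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("./model")
#   fmt = partial(tok.apply_chat_template, tokenize=False, add_generation_prompt=True)
#   write_calibration_set(prompts, fmt, Path("./data_quant.json"))
```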

Notes

  • No <think> tags: Phi-3 is not a reasoning model — it does not produce chain-of-thought in <think>...</think> tags. For thinking mode, use Qwen3 models.
  • Strong at math/code: Despite its size, Phi-3 scores 85.7% on GSM8K and 57.3% on HumanEval.
  • English-primary: Performs best in English; output quality is lower in other languages.

Hardware tested

  • Orange Pi 5 Plus (RK3588, 16 GB RAM)
  • RKNPU driver 0.9.8
  • RKLLM runtime 1.2.3
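To check whether a board matches the driver version listed above, you can parse the version string the NPU driver reports and compare it with the tested one. The debugfs path and the "RKNPU driver: v0.9.8" line format are assumptions based on Rockchip BSP kernels (reading debugfs usually requires root), and the helpers are hypothetical. The sketch only tells you whether the board is at or above the tested version, not whether other versions work.

```python
import re
from pathlib import Path

# ASSUMPTION: Rockchip BSP kernels expose a line like "RKNPU driver: v0.9.8" here.
RKNPU_VERSION_FILE = Path("/sys/kernel/debug/rknpu/version")
TESTED_DRIVER = (0, 9, 8)  # version listed under "Hardware tested"


def parse_rknpu_version(text: str) -> tuple[int, int, int]:
    """Extract the first major.minor.patch number from the driver's report."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def driver_at_least_tested(text: str) -> bool:
    """True if the reported version is >= the tested one (tuple comparison)."""
    return parse_rknpu_version(text) >= TESTED_DRIVER


def read_rknpu_version(path: Path = RKNPU_VERSION_FILE) -> tuple[int, int, int]:
    """Read and parse the version on the board itself (usually needs root)."""
    return parse_rknpu_version(path.read_text())
```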

License

MIT — same as the base model.