---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
tags:
- rkllm
- rk3588
- npu
- phi3
- quantized
- w8a8
language:
- en
pipeline_tag: text-generation
---

# Phi-3-mini-4k-instruct: RKLLM (w8a8) for RK3588

Pre-quantized **w8a8** build of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), ready to run on the Rockchip **RK3588 / RK3588S** NPU via the [RKLLM runtime v1.2.3](https://github.com/airockchip/rknn-llm).

| Detail | Value |
|---|---|
| Parameters | 3.82 B |
| Quantization | w8a8 (8-bit weights, 8-bit activations) |
| Context length | 4,096 tokens |
| NPU cores | 3 |
| File size | ~3.7 GB |
| Inference speed | **6.82 tokens/sec** (measured on RK3588, 3 cores) |
| Prefill speed | 49.22 tokens/sec |
| RAM usage | ~3.7 GB |
| Toolkit version | rkllm-toolkit 1.2.3 |
| Runtime version | rkllm runtime 1.2.3 |
| Calibration | 20 diverse prompts (general knowledge, code, math, science) |

## What is Phi-3?

Phi-3-Mini-4K-Instruct is a lightweight 3.8B-parameter model from Microsoft, trained on 4.9 trillion tokens. It performs well on reasoning, math, and code generation while remaining small enough for edge deployment. It is released under the MIT license.

## How to use

### 1. Download

Place the `.rkllm` file and `config.json` in a model directory on your RK3588 board:

```bash
mkdir -p ~/models/Phi-3-mini-4k-instruct
# Copy both files into this directory
```

### 2. Run with rkllm binary

```bash
cd ~/models/Phi-3-mini-4k-instruct
rkllm Phi-3-mini-4k-instruct-w8a8.rkllm
```

The RKLLM runtime applies Phi-3's chat template (`<|system|>...<|end|><|user|>...<|end|><|assistant|>`) internally at the token level. Send plain text only.

### 3. Chat template

Phi-3 uses a distinctive chat format:

```
<|system|>
You are a helpful assistant.<|end|>
<|user|>
How does photosynthesis work?<|end|>
<|assistant|>
```

The runtime handles this automatically, so no manual template is needed.

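For readers who want to see what that layout looks like as a string, the following is a minimal illustrative sketch. `format_phi3_prompt` is a hypothetical helper written for this card, not part of the RKLLM runtime or toolkit, and the exact whitespace the runtime uses internally is an assumption based on the format shown above. It is for inspection only: the runtime still expects plain text.

```python
def format_phi3_prompt(messages):
    """Render {"role", "content"} dicts in the Phi-3 chat layout shown above.

    Hypothetical helper for illustration; the RKLLM runtime does this itself.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn: role tag, newline, content, then the <|end|> marker.
        parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    # A trailing assistant tag cues the model to generate its reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = format_phi3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How does photosynthesis work?"},
])
print(prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template()` from the base model's tokenizer is a reasonable way to check the layout before preparing calibration prompts.
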
## Conversion details

**Toolkit:** rkllm-toolkit 1.2.3 (Python 3.10, x86_64)

```python
from rkllm.api import RKLLM

llm = RKLLM()

# Load the Hugging Face checkpoint on CPU, without a LoRA adapter.
llm.load_huggingface(model="./model", model_lora=None, device="cpu")

# Quantize to w8a8 with the calibration set, targeting all three RK3588 NPU cores.
llm.build(
    do_quantization=True,
    optimization_level=1,
    quantized_dtype="w8a8",
    quantized_algorithm="normal",
    target_platform="rk3588",
    num_npu_core=3,
    extra_qparams=None,
    dataset="./data_quant.json",
    max_context=4096,
)

# Write the converted model to disk.
llm.export_rkllm("./Phi-3-mini-4k-instruct-w8a8.rkllm")
```

**Calibration dataset:** 20 prompts covering general knowledge, code, math, science, and creative writing, formatted with Phi-3's native chat template via `tokenizer.apply_chat_template()`.

## Notes

- **No `<think>` tags:** Phi-3 is not a reasoning model, so it does not produce chain-of-thought in `<think>...</think>` tags. For thinking mode, use Qwen3 models.
- **Strong at math/code:** Despite its size, Phi-3 scores 85.7% on GSM8K and 57.3% on HumanEval.
- **English-primary:** Performance is best in English; output quality is lower in other languages.

## Hardware tested

- Orange Pi 5 Plus (RK3588, 16 GB RAM)
- RKNPU driver 0.9.8
- RKLLM runtime 1.2.3

## License

MIT, same as the base model.