---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
tags:
- rkllm
- rk3588
- npu
- phi3
- quantized
- w8a8
language:
- en
pipeline_tag: text-generation
---
# Phi-3-mini-4k-instruct — RKLLM (w8a8) for RK3588
Pre-quantized **w8a8** build of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), ready to run on the Rockchip **RK3588 / RK3588S** NPU via the [RKLLM runtime v1.2.3](https://github.com/airockchip/rknn-llm).
| Detail | Value |
|---|---|
| Parameters | 3.82 B |
| Quantization | w8a8 (8-bit weights, 8-bit activations) |
| Context length | 4 096 tokens |
| NPU cores | 3 |
| File size | ~3.7 GB |
| Inference speed | **6.82 tokens/sec** (measured on RK3588, 3 cores) |
| Prefill speed | 49.22 tokens/sec |
| RAM usage | ~3.7 GB |
| Toolkit version | rkllm-toolkit 1.2.3 |
| Runtime version | rkllm runtime 1.2.3 |
| Calibration | 20 diverse prompts (general knowledge, code, math, science, creative writing) |
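
As a rough guide to what these throughput figures mean in practice, the sketch below estimates end-to-end response time from the prefill and decode speeds in the table. It assumes both rates stay constant regardless of prompt length and that prompt and output share the 4,096-token context, which is a simplification. The function name and the example token counts are illustrative only.

```python
# Throughput figures taken from the table above (RK3588, 3 NPU cores).
PREFILL_TOKENS_PER_SEC = 49.22
DECODE_TOKENS_PER_SEC = 6.82
MAX_CONTEXT_TOKENS = 4096


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate total response time, assuming constant prefill/decode rates."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: prompt and generated tokens share one context window.
    if prompt_tokens + output_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("prompt plus output exceeds the context length")
    prefill_time = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode_time = output_tokens / DECODE_TOKENS_PER_SEC
    return prefill_time + decode_time


if __name__ == "__main__":
    # Hypothetical workload: a 500-token prompt and a 200-token reply.
    print(f"{estimate_latency_seconds(500, 200):.1f} s")
```

Because decoding is several times slower than prefill, the length of the reply usually dominates the total time.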
## What is Phi-3?
Phi-3-Mini-4K-Instruct is a lightweight 3.8B-parameter model from Microsoft, trained on 4.9 trillion tokens. It excels at reasoning, math, and code generation while being small enough for edge deployments, and it is released under the MIT license.
## How to use
### 1. Download
Place the `.rkllm` file and `config.json` in a model directory on your RK3588 board:
```bash
mkdir -p ~/models/Phi-3-mini-4k-instruct
# Copy both files into this directory
```
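Before launching the runtime, it can help to confirm that both files are actually in place. The following is a minimal sketch in Python. The helper name `missing_model_files` is hypothetical, and it only assumes what this step states: one `.rkllm` file plus `config.json` in the model directory.

```python
from pathlib import Path
from typing import List


def missing_model_files(model_dir: str) -> List[str]:
    """Return descriptions of required files that are absent from model_dir."""
    root = Path(model_dir).expanduser()
    missing = []
    # Any file ending in .rkllm counts as the model weights.
    if not any(root.glob("*.rkllm")):
        missing.append("*.rkllm")
    if not (root / "config.json").is_file():
        missing.append("config.json")
    return missing


if __name__ == "__main__":
    problems = missing_model_files("~/models/Phi-3-mini-4k-instruct")
    print("ready" if not problems else f"missing: {', '.join(problems)}")
```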
### 2. Run with the rkllm binary
```bash
cd ~/models/Phi-3-mini-4k-instruct
rkllm Phi-3-mini-4k-instruct-w8a8.rkllm
```
The RKLLM runtime applies Phi-3's chat template (`<|system|>...<|end|><|user|>...<|end|><|assistant|>`) internally at the token level. Send plain text only.
### 3. Chat template
Phi-3 uses a distinctive chat format:
```
<|system|>
You are a helpful assistant.<|end|>
<|user|>
How does photosynthesis work?<|end|>
<|assistant|>
```
The runtime handles this automatically — no manual template needed.
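For reference, the format above can be reproduced in a few lines of Python. This is an illustrative sketch only: the runtime already applies the template for you, `render_phi3_prompt` is a hypothetical helper, and the exact newline placement is assumed from the example shown above.

```python
from typing import Dict, List


def render_phi3_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages in the Phi-3 format and open the assistant turn."""
    allowed_roles = {"system", "user", "assistant"}
    parts = []
    for message in messages:
        role = message["role"]
        if role not in allowed_roles:
            raise ValueError(f"unsupported role: {role}")
        # Each turn is "<|role|>\n" + content + "<|end|>\n".
        parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    # The trailing assistant tag cues the model to generate its reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_phi3_prompt([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How does photosynthesis work?"},
    ]))
```

A helper like this is mainly useful for inspecting prompts or preparing offline data. Text sent to the runtime should remain plain, or the template would be applied twice.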
## Conversion details
**Toolkit:** rkllm-toolkit 1.2.3 (Python 3.10, x86_64)
```python
from rkllm.api import RKLLM

llm = RKLLM()

# Load the original Hugging Face checkpoint from a local directory.
llm.load_huggingface(model="./model", model_lora=None, device="cpu")

# Quantize to w8a8 for the RK3588 NPU, calibrating on data_quant.json.
llm.build(
    do_quantization=True,
    optimization_level=1,
    quantized_dtype="w8a8",
    quantized_algorithm="normal",
    target_platform="rk3588",
    num_npu_core=3,
    extra_qparams=None,
    dataset="./data_quant.json",
    max_context=4096,
)

# Write the converted model to disk.
llm.export_rkllm("./Phi-3-mini-4k-instruct-w8a8.rkllm")
```
**Calibration dataset:** 20 prompts covering general knowledge, code, math, science, and creative writing, formatted with Phi-3's native chat template via `tokenizer.apply_chat_template()`.
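A calibration file along these lines could be assembled as sketched below. Several details are assumptions rather than the exact script used for this build: the `input`/`target` record layout is modeled on the example data in the rknn-llm repository and should be checked against your toolkit version, the empty `target` is a placeholder, the system prompt is illustrative, and `build_calibration_records` and `save_calibration_file` are hypothetical helpers. The `render` argument is any callable that turns a message list into a prompt string.

```python
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def build_calibration_records(
    prompts: List[str],
    render: Callable[[List[Message]], str],
    system_prompt: str = "You are a helpful assistant.",
) -> List[Dict[str, str]]:
    """Wrap each raw prompt in the chat template and emit one record per prompt."""
    records = []
    for prompt in prompts:
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ]
        # Assumed record layout: {"input": ..., "target": ...}.
        records.append({"input": render(messages), "target": ""})
    return records


def save_calibration_file(records: List[Dict[str, str]], path: str) -> None:
    """Write the records as UTF-8 JSON, e.g. to ./data_quant.json."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
```

With Hugging Face `transformers` installed, `render` can be `lambda m: tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True)`, where `tokenizer` comes from `AutoTokenizer.from_pretrained("./model")`.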
## Notes
- **No `<think>` tags:** Phi-3 is not a reasoning model — it does not produce chain-of-thought in `<think>...</think>` tags. For thinking mode, use Qwen3 models.
- **Strong at math/code:** Despite its size, Phi-3 scores 85.7% on GSM8K and 57.3% on HumanEval.
- **English-primary:** Best performance in English. Other languages have reduced quality.
## Hardware tested
- Orange Pi 5 Plus (RK3588, 16 GB RAM)
- RKNPU driver 0.9.8
- RKLLM runtime 1.2.3
## License
MIT — same as the base model.