---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
tags:
- rkllm
- rk3588
- npu
- phi3
- quantized
- w8a8
language:
- en
pipeline_tag: text-generation
---

# Phi-3-mini-4k-instruct — RKLLM (w8a8) for RK3588

Pre-quantized **w8a8** build of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), ready to run on the Rockchip **RK3588 / RK3588S** NPU via the [RKLLM runtime v1.2.3](https://github.com/airockchip/rknn-llm).

| Detail | Value |
|---|---|
| Parameters | 3.82 B |
| Quantization | w8a8 (8-bit weights, 8-bit activations) |
| Context length | 4 096 tokens |
| NPU cores | 3 |
| File size | ~3.7 GB |
| Inference (decode) speed | **6.82 tokens/sec** (measured on RK3588, 3 cores) |
| Prefill speed | 49.22 tokens/sec |
| RAM usage | ~3.7 GB |
| Toolkit version | rkllm-toolkit 1.2.3 |
| Runtime version | rkllm runtime 1.2.3 |
| Calibration | 20 diverse prompts (general knowledge, code, math, science, creative writing) |

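
As a rough guide to what the measured speeds in the table mean in practice, the sketch below estimates the latency of a single request. It assumes throughput stays constant at the measured prefill and inference speeds regardless of prompt length, which is a simplification. `estimate_latency` is an illustrative helper, not part of the RKLLM runtime.

```python
# Measured on RK3588 with 3 NPU cores (values taken from the table above).
PREFILL_TOKENS_PER_SEC = 49.22
DECODE_TOKENS_PER_SEC = 6.82


def estimate_latency(prompt_tokens: int, output_tokens: int) -> dict:
    """Estimate request latency in seconds, assuming constant throughput."""
    prefill_s = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode_s = output_tokens / DECODE_TOKENS_PER_SEC
    return {
        "time_to_first_token_s": prefill_s,
        "generation_s": decode_s,
        "total_s": prefill_s + decode_s,
    }


if __name__ == "__main__":
    est = estimate_latency(prompt_tokens=512, output_tokens=128)
    for name, seconds in est.items():
        print(f"{name}: {seconds:.1f}")
```

Under these assumptions, a 512-token prompt takes roughly 10 seconds to prefill, and generating 128 tokens takes roughly 19 seconds more.
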
## What is Phi-3?

Phi-3-Mini-4K-Instruct is a lightweight 3.8B-parameter model from Microsoft, trained on 4.9 trillion tokens. It is strong at reasoning, math, and code generation while being small enough for edge deployments. It is released under the MIT license.

## How to use

### 1. Download

Place the `.rkllm` file and `config.json` in a model directory on your RK3588 board:

```bash
mkdir -p ~/models/Phi-3-mini-4k-instruct
# Copy both files into this directory
```

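
Before starting the runtime, it can help to confirm that both files actually landed in the model directory. The following is a minimal sketch using only the Python standard library. `missing_model_files` is a hypothetical helper name, and the check looks for any `*.rkllm` file rather than one specific filename.

```python
from pathlib import Path


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected items that are absent from model_dir."""
    root = Path(model_dir).expanduser()
    if not root.is_dir():
        return ["*.rkllm", "config.json"]
    missing = []
    # Any file with the .rkllm extension counts as the model file.
    if not any(root.glob("*.rkllm")):
        missing.append("*.rkllm")
    if not (root / "config.json").is_file():
        missing.append("config.json")
    return missing


if __name__ == "__main__":
    problems = missing_model_files("~/models/Phi-3-mini-4k-instruct")
    print("missing:", problems if problems else "nothing")
```
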
### 2. Run with rkllm binary

```bash
cd ~/models/Phi-3-mini-4k-instruct
rkllm Phi-3-mini-4k-instruct-w8a8.rkllm
```

The RKLLM runtime applies Phi-3's chat template (`<|system|>...<|end|><|user|>...<|end|><|assistant|>`) internally at the token level, so send plain text only, without any template markers.

### 3. Chat template

Phi-3 uses a distinctive chat format:
```
<|system|>
You are a helpful assistant.<|end|>
<|user|>
How does photosynthesis work?<|end|>
<|assistant|>
```

The runtime handles this automatically, so no manual template is needed.

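
For reference only (for example when preparing calibration data or comparing against another runtime), the format above can be reproduced with a few lines of Python. This is a sketch rather than the tokenizer's own template: `format_phi3_prompt` is a hypothetical helper, and the newline placement simply follows the example shown in this section.

```python
from typing import Optional


def format_phi3_prompt(user: str, system: Optional[str] = None) -> str:
    """Build a single-turn prompt in the Phi-3 chat format."""
    parts = []
    if system:
        parts.append(f"<|system|>\n{system}<|end|>\n")
    parts.append(f"<|user|>\n{user}<|end|>\n")
    # The trailing assistant marker cues the model to start its reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


if __name__ == "__main__":
    print(format_phi3_prompt(
        "How does photosynthesis work?",
        system="You are a helpful assistant.",
    ))
```

Do not pass the result of this helper to the RKLLM runtime, since that would apply the template twice.
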
## Conversion details

**Toolkit:** rkllm-toolkit 1.2.3 (Python 3.10, x86_64)

```python
from rkllm.api import RKLLM

llm = RKLLM()

# Load the Hugging Face checkpoint from ./model (no LoRA adapter).
llm.load_huggingface(model="./model", model_lora=None, device="cpu")

# Quantize to w8a8 for three RK3588 NPU cores, calibrating on data_quant.json.
llm.build(
    do_quantization=True,
    optimization_level=1,
    quantized_dtype="w8a8",
    quantized_algorithm="normal",
    target_platform="rk3588",
    num_npu_core=3,
    extra_qparams=None,
    dataset="./data_quant.json",
    max_context=4096,
)

# Write the converted model to disk.
llm.export_rkllm("./Phi-3-mini-4k-instruct-w8a8.rkllm")
```

**Calibration dataset:** 20 prompts covering general knowledge, code, math, science, and creative writing, formatted with Phi-3's native chat template via `tokenizer.apply_chat_template()`.

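
A calibration file of this kind could be assembled along the following lines. This is a hedged sketch, not the script used for this build: the `{"input", "target"}` record layout is an assumption about what rkllm-toolkit expects, `phi3_user_turn` is a hand-written stand-in for `tokenizer.apply_chat_template()`, and the two sample prompts are placeholders rather than the actual 20 prompts.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def phi3_user_turn(prompt: str) -> str:
    """Hand-written stand-in for tokenizer.apply_chat_template()."""
    return f"<|user|>\n{prompt}<|end|>\n<|assistant|>\n"


def write_calibration_file(
    pairs: Iterable[tuple[str, str]],
    out_path: str,
    format_prompt: Callable[[str], str] = phi3_user_turn,
) -> list[dict]:
    """Write (prompt, reference answer) pairs as a JSON calibration file.

    ASSUMPTION: one {"input", "target"} object per sample; check the
    rkllm-toolkit documentation for the schema your version expects.
    """
    records = [
        {"input": format_prompt(prompt), "target": answer}
        for prompt, answer in pairs
    ]
    Path(out_path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return records


if __name__ == "__main__":
    # Placeholder samples, not the prompts used for this model.
    samples = [
        ("What is the capital of France?", "The capital of France is Paris."),
        ("Write a Python expression that squares x.", "x ** 2"),
    ]
    write_calibration_file(samples, "data_quant.json")
```

Passing a different `format_prompt` callable (for instance a thin wrapper around a real tokenizer) keeps the file-writing logic unchanged.
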
## Notes

- **No `<think>` tags:** Phi-3 is not a reasoning model and does not produce chain-of-thought in `<think>...</think>` tags. For thinking mode, use Qwen3 models.
- **Strong at math/code:** Despite its size, Phi-3 scores 85.7% on GSM8K and 57.3% on HumanEval.
- **English-primary:** Best performance in English. Output quality is lower in other languages.

## Hardware tested

- Orange Pi 5 Plus (RK3588, 16 GB RAM)
- RKNPU driver 0.9.8
- RKLLM runtime 1.2.3

## License

MIT, same as the base model.