---

license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
tags:
  - rkllm
  - rk3588
  - npu
  - phi3
  - quantized
  - w8a8
language:
  - en
pipeline_tag: text-generation
---


# Phi-3-mini-4k-instruct — RKLLM (w8a8) for RK3588

Pre-quantized **w8a8** build of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), ready to run on the Rockchip **RK3588 / RK3588S** NPU via the [RKLLM runtime v1.2.3](https://github.com/airockchip/rknn-llm).

| Detail | Value |
|---|---|
| Parameters | 3.82 B |
| Quantization | w8a8 (8-bit weights, 8-bit activations) |
| Context length | 4 096 tokens |
| NPU cores | 3 |
| File size | ~3.7 GB |
| Decode (generation) speed | **6.82 tokens/sec** (measured on RK3588, 3 cores) |
| Prefill (prompt processing) speed | 49.22 tokens/sec |
| RAM usage | ~3.7 GB |
| Toolkit version | rkllm-toolkit 1.2.3 |
| Runtime version | rkllm runtime 1.2.3 |
| Calibration | 20 diverse prompts (general knowledge, code, math, science, creative writing) |
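
The two throughput figures in the table can be combined into a rough response-time estimate. The sketch below assumes prefill and decode run back to back at the measured rates and ignores model load time and sampling overhead; the function name and the example request sizes are illustrative, not part of the runtime.

```python
# Throughput figures taken from the table above (RK3588, 3 NPU cores).
PREFILL_TOKENS_PER_SEC = 49.22
DECODE_TOKENS_PER_SEC = 6.82


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end time: process the prompt, then generate token by token."""
    prefill = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode = output_tokens / DECODE_TOKENS_PER_SEC
    return prefill + decode


if __name__ == "__main__":
    # Hypothetical request: 200-token prompt, 150-token answer.
    print(f"{estimate_latency_seconds(200, 150):.1f} s")
```

For short prompts the decode term dominates, so the length of the answer matters far more than the length of the question.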

## What is Phi-3?

Phi-3-Mini-4K-Instruct is a lightweight 3.8B-parameter model from Microsoft, trained on 4.9 trillion tokens. It performs well at reasoning, math, and code generation while remaining small enough for edge deployments. The base model is MIT licensed.

## How to use

### 1. Download

Place the `.rkllm` file and `config.json` in a model directory on your RK3588 board:

```bash
mkdir -p ~/models/Phi-3-mini-4k-instruct
# Copy both files into this directory
```
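
Before launching the runtime, it can help to confirm that both files landed in the model directory. The following is a minimal sketch: the `.rkllm` file name is assumed to match the one used in the run command in this README, and the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# Assumed file names: the .rkllm name follows the run command in this README,
# and config.json is the second file mentioned in the download step.
REQUIRED_FILES = ("Phi-3-mini-4k-instruct-w8a8.rkllm", "config.json")


def missing_model_files(model_dir: Path) -> list[str]:
    """Return the required file names that are not present in model_dir."""
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    model_dir = Path.home() / "models" / "Phi-3-mini-4k-instruct"
    missing = missing_model_files(model_dir)
    print("ready" if not missing else f"missing: {', '.join(missing)}")
```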

### 2. Run with the rkllm binary

```bash
cd ~/models/Phi-3-mini-4k-instruct
rkllm Phi-3-mini-4k-instruct-w8a8.rkllm
```

The RKLLM runtime applies Phi-3's chat template (`<|system|>...<|end|><|user|>...<|end|><|assistant|>`) internally at the token level. Send plain text only.

### 3. Chat template

Phi-3 uses a distinctive chat format:
```
<|system|>
You are a helpful assistant.<|end|>
<|user|>
How does photosynthesis work?<|end|>
<|assistant|>
```

The runtime handles this automatically — no manual template needed.
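
For reference only, the layout above can be reproduced in a few lines of Python. This is an illustrative sketch of the string format, not the runtime's implementation, and `format_phi3_prompt` is a hypothetical helper name. It may be handy for inspecting what the model sees or for preparing text offline; input sent to the runtime should still be plain text.

```python
from typing import Optional


def format_phi3_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Render one turn in the Phi-3 layout, ending at the assistant tag."""
    parts = []
    if system_message is not None:
        parts.append(f"<|system|>\n{system_message}<|end|>\n")
    parts.append(f"<|user|>\n{user_message}<|end|>\n")
    # The model continues generating from here.
    parts.append("<|assistant|>\n")
    return "".join(parts)


if __name__ == "__main__":
    print(format_phi3_prompt("How does photosynthesis work?", "You are a helpful assistant."))
```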

## Conversion details

**Toolkit:** rkllm-toolkit 1.2.3 (Python 3.10, x86_64)



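```python
from rkllm.api import RKLLM

llm = RKLLM()

# Load the Hugging Face checkpoint from ./model on the host CPU (no LoRA adapter).
llm.load_huggingface(model="./model", model_lora=None, device="cpu")

# Quantize to w8a8 for the RK3588 NPU, calibrating on ./data_quant.json.
llm.build(
    do_quantization=True,
    optimization_level=1,
    quantized_dtype="w8a8",
    quantized_algorithm="normal",
    target_platform="rk3588",
    num_npu_core=3,
    extra_qparams=None,
    dataset="./data_quant.json",
    max_context=4096,
)

# Write the converted model that the runtime loads on the board.
llm.export_rkllm("./Phi-3-mini-4k-instruct-w8a8.rkllm")
```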


**Calibration dataset:** 20 prompts covering general knowledge, code, math, science, and creative writing, formatted with Phi-3's native chat template via `tokenizer.apply_chat_template()`.
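
A calibration file along these lines could be assembled as sketched below. Several things here are assumptions: the `input`/`target` record schema follows the convention used in the rkllm-toolkit examples, the template is rendered by hand rather than through `tokenizer.apply_chat_template()` to keep the sketch dependency-free, and the sample pair is a placeholder, not one of the 20 prompts actually used.

```python
import json
from pathlib import Path


def to_calibration_record(prompt: str, answer: str) -> dict:
    """Wrap one prompt/answer pair; the user turn follows Phi-3's chat layout."""
    return {
        # Assumed schema: "input" holds the templated prompt, "target" the reply.
        "input": f"<|user|>\n{prompt}<|end|>\n<|assistant|>\n",
        "target": answer,
    }


def write_calibration_file(pairs, path: Path) -> int:
    """Write all pairs as a JSON list and return the number of records."""
    records = [to_calibration_record(p, a) for p, a in pairs]
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(records)


if __name__ == "__main__":
    # Placeholder pair for illustration only.
    sample = [("What is 12 * 7?", "12 * 7 = 84.")]
    print(write_calibration_file(sample, Path("data_quant.json")))
```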

## Notes

- **No `<think>` tags:** Phi-3 is not a reasoning model — it does not produce chain-of-thought in `<think>...</think>` tags. For thinking mode, use Qwen3 models.
- **Strong at math/code:** Despite its size, Phi-3 scores 85.7% on GSM8K and 57.3% on HumanEval.
- **English-primary:** Best performance in English. Other languages have reduced quality.

## Hardware tested

- Orange Pi 5 Plus (RK3588, 16 GB RAM)
- RKNPU driver 0.9.8
- RKLLM runtime 1.2.3

## License

MIT — same as the base model.