GatekeeperZA
/

Phi-3-mini-4k-instruct-RKLLM-v1.2.3

Text Generation

Model card Files Files and versions

Phi-3-mini-4k-instruct-RKLLM-v1.2.3 / README.md

GatekeeperZA's picture

Upload folder using huggingface_hub

f3915b0 verified 4 days ago

|

history blame contribute delete

3.27 kB

	---
	license: mit
	base_model: microsoft/Phi-3-mini-4k-instruct
	tags:
	- rkllm
	- rk3588
	- npu
	- phi3
	- quantized
	- w8a8
	language:
	- en
	pipeline_tag: text-generation
	---

	# Phi-3-mini-4k-instruct — RKLLM (w8a8) for RK3588

	Pre-quantized w8a8 build of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) ready to run on Rockchip RK3588 / RK3588S NPU via the [RKLLM runtime v1.2.3](https://github.com/airockchip/rknn-llm).

	\| Detail \| Value \|
	\|---\|---\|
	\| Parameters \| 3.82 B \|
	\| Quantization \| w8a8 (8-bit weights, 8-bit activations) \|
	\| Context length \| 4 096 tokens \|
	\| NPU cores \| 3 \|
	\| File size \| ~3.7 GB \|
	\| Inference speed \| 6.82 tokens/sec (measured on RK3588, 3 cores) \|
	\| Prefill speed \| 49.22 tokens/sec \|
	\| RAM usage \| ~3.7 GB \|
	\| Toolkit version \| rkllm-toolkit 1.2.3 \|
	\| Runtime version \| rkllm runtime 1.2.3 \|
	\| Calibration \| 20 diverse prompts (general knowledge, code, math, science) \|

	## What is Phi-3?

	Phi-3-Mini-4K-Instruct is a 3.8B parameter lightweight model from Microsoft, trained on 4.9 trillion tokens. It excels at reasoning, math, and code generation while being small enough for edge deployments. MIT licensed.

	## How to use

	### 1. Download

	Place the `.rkllm` file and `config.json` in a model directory on your RK3588 board:

	```bash
	mkdir -p ~/models/Phi-3-mini-4k-instruct
	# Copy both files into this directory
	```

	### 2. Run with rkllm binary

	```bash
	cd ~/models/Phi-3-mini-4k-instruct
	rkllm Phi-3-mini-4k-instruct-w8a8.rkllm
	```

	The RKLLM runtime applies Phi-3's chat template (`<\|system\|>...<\|end\|><\|user\|>...<\|end\|><\|assistant\|>`) internally at the token level. Send plain text only.

	### 3. Chat template

	Phi-3 uses a distinctive chat format:
	```
	<\|system\|>
	You are a helpful assistant.<\|end\|>
	<\|user\|>
	How does photosynthesis work?<\|end\|>
	<\|assistant\|>
	```

	The runtime handles this automatically — no manual template needed.

	## Conversion details

	Toolkit: rkllm-toolkit 1.2.3 (Python 3.10, x86_64)

	```python
	from rkllm.api import RKLLM

	llm = RKLLM()
	llm.load_huggingface(model="./model", model_lora=None, device="cpu")
	llm.build(
	do_quantization=True,
	optimization_level=1,
	quantized_dtype="w8a8",
	quantized_algorithm="normal",
	target_platform="rk3588",
	num_npu_core=3,
	extra_qparams=None,
	dataset="./data_quant.json",
	max_context=4096,
	)
	llm.export_rkllm("./Phi-3-mini-4k-instruct-w8a8.rkllm")
	```

	Calibration dataset: 20 prompts covering general knowledge, code, math, science, and creative writing, formatted with Phi-3's native chat template via `tokenizer.apply_chat_template()`.

	## Notes

	- No `<think>` tags: Phi-3 is not a reasoning model — it does not produce chain-of-thought in `<think>...</think>` tags. For thinking mode, use Qwen3 models.
	- Strong at math/code: Despite its size, Phi-3 scores 85.7% on GSM8K and 57.3% on HumanEval.
	- English-primary: Best performance in English. Other languages have reduced quality.

	## Hardware tested

	- Orange Pi 5 Plus (RK3588, 16 GB RAM)
	- RKNPU driver 0.9.8
	- RKLLM runtime 1.2.3

	## License

	MIT — same as the base model.