Upload folder using huggingface_hub

Files changed (3)

.gitattributes +1 -0
Qwen3-1.7B-w8a8-rk3588.rkllm +3 -0
README.md +135 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+Qwen3-1.7B-w8a8-rk3588.rkllm filter=lfs diff=lfs merge=lfs -text

Qwen3-1.7B-w8a8-rk3588.rkllm ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:52219492d52ea3ec4a7143770ab1e394fb8c339caceb4002f3c7659da3e735eb
+size 2375021644
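
The pointer above records the SHA-256 digest and byte size of the real model file, so a downloaded copy can be checked against it. A minimal sketch of that check, assuming the download sits at a local path of your choosing; the helper names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of Git LFS or `huggingface_hub`:

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the size and digest recorded in the pointer."""
    path = Path(path)
    # Cheap check first: a size mismatch means there is no need to hash.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a multi-gigabyte model is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:52219492d52ea3ec4a7143770ab1e394fb8c339caceb4002f3c7659da3e735eb
size 2375021644
"""

pointer = parse_lfs_pointer(POINTER)
```

Calling `matches_pointer("/path/to/Qwen3-1.7B-w8a8-rk3588.rkllm", pointer)` on the downloaded file should then return `True` for an intact copy.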

README.md ADDED

@@ -0,0 +1,135 @@

+---
+license: apache-2.0
+library_name: rkllm
+base_model: Qwen/Qwen3-1.7B
+tags:
+  - rkllm
+  - rk3588
+  - npu
+  - rockchip
+  - qwen3
+  - thinking
+  - reasoning
+  - quantized
+  - edge-ai
+  - orange-pi
+model_name: Qwen3-1.7B-RKLLM-v1.2.3
+pipeline_tag: text-generation
+language:
+  - en
+  - zh
+---
+# Qwen3-1.7B — RKLLM v1.2.3 (w8a8, RK3588)
+RKLLM conversion of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) for Rockchip RK3588 NPU inference.
+Converted with **RKLLM Toolkit v1.2.3**, which supports **thinking mode**: the model produces `<think>…</think>` reasoning blocks when run on RKLLM Runtime v1.2.1 or later.
+## Key Details
+| Property | Value |
+|---|---|
+| **Base Model** | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) |
+| **Toolkit Version** | RKLLM Toolkit v1.2.3 |
+| **Runtime Version** | RKLLM Runtime ≥ v1.2.1 (v1.2.3 recommended) |
+| **Quantization** | w8a8 (8-bit weights, 8-bit activations) |
+| **Quantization Algorithm** | normal |
+| **Target Platform** | RK3588 |
+| **NPU Cores** | 3 |
+| **Max Context Length** | 4096 tokens |
+| **Optimization Level** | 1 |
+| **Thinking Mode** | ✅ Supported |
+| **Languages** | English, Chinese (+ others inherited from Qwen3) |
+## Why This Conversion?
+Previous Qwen3-1.7B RKLLM conversions on Hugging Face were built with **Toolkit v1.2.0**, which predates thinking mode support (added in v1.2.1). The chat template baked into those `.rkllm` files does not include the `<think>` trigger, so the model never produces reasoning output.
+This conversion uses **Toolkit v1.2.3**, which correctly embeds the thinking-enabled chat template into the model file.
+## Thinking Mode
+Qwen3-1.7B is a hybrid thinking model: it can emit a `<think>` reasoning block before its final answer. When served through an OpenAI-compatible API that parses `<think>` tags, the reasoning is returned separately from the final answer, which lets UIs such as Open WebUI show a collapsible "Thinking…" section.
+Example raw output:
+```
+<think>
+The user is asking about the capital of France. This is a straightforward geography question.
+</think>
+The capital of France is Paris.
+```
+## Hardware Tested
+- **Orange Pi 5 Plus** — RK3588, 16GB RAM, Armbian Linux
+- RKNPU driver 0.9.8
+- RKLLM Runtime v1.2.3
+## Usage
+### With the official RKLLM API demo
+```bash
+# Clone the runtime
+git clone https://github.com/airockchip/rknn-llm.git
+cd rknn-llm/examples/rkllm_api_demo
+# Run on aarch64 after building the demo; arguments: model path, max_new_tokens, max_context_len
+./build/rkllm_api_demo /path/to/Qwen3-1.7B-w8a8-rk3588.rkllm 2048 4096
+```
+### With a custom OpenAI-compatible server
+Any server that launches the RKLLM binary and parses `<think>` tags from the output stream will work. The model responds to standard chat completion requests.
+## Conversion Script
+```python
+from rkllm.api import RKLLM
+model_path = "Qwen/Qwen3-1.7B"  # or local path
+output_path = "./Qwen3-1.7B-w8a8-rk3588.rkllm"
+dataset_path = "./data_quant.json"  # calibration data
+# Load
+llm = RKLLM()
+llm.load_huggingface(model=model_path, model_lora=None, device="cpu")
+# Build
+llm.build(
+    do_quantization=True,
+    optimization_level=1,
+    quantized_dtype="w8a8",
+    quantized_algorithm="normal",
+    target_platform="rk3588",
+    num_npu_core=3,
+    extra_qparams=None,
+    dataset=dataset_path,
+    max_context=4096,
+)
+# Export
+llm.export_rkllm(output_path)
+```
+Calibration dataset: 21 diverse prompt/completion pairs (English + Chinese) generated with `generate_data_quant.py` from the [rknn-llm examples](https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_api_demo/export).
+## File Listing
+| File | Description |
+|---|---|
+| `Qwen3-1.7B-w8a8-rk3588.rkllm` | Quantized model for RK3588 NPU |
+## Compatibility Notes
+- **Minimum runtime**: RKLLM Runtime v1.2.1 (for thinking mode). v1.2.3 recommended.
+- **RKNPU driver**: ≥ 0.9.6
+- **SoCs**: RK3588 / RK3588S (3 NPU cores). Not compatible with RK3576 (2 cores) without reconversion.
+- **RAM**: the model file is about 2.4 GB (2,375,021,644 bytes), so budget at least that much when loaded. Runs comfortably on 8GB+ boards.
+## Acknowledgements
+- [Qwen Team](https://huggingface.co/Qwen) for the base model
+- [Rockchip / airockchip](https://github.com/airockchip/rknn-llm) for the RKLLM toolkit and runtime
+- Converted by [GatekeeperZA](https://huggingface.co/GatekeeperZA)
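
The README's "Thinking Mode" section says a server must parse `<think>` tags to separate reasoning from the final answer. A minimal sketch of that step for a complete, non-streamed reply; `split_thinking` and `ParsedReply` are hypothetical names, not part of the RKLLM runtime or of any particular server:

```python
import re
from typing import NamedTuple

# Non-greedy match so only the first <think>...</think> block is captured;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


class ParsedReply(NamedTuple):
    reasoning: str
    answer: str


def split_thinking(raw: str) -> ParsedReply:
    """Split raw model output into (reasoning, answer)."""
    match = THINK_RE.search(raw)
    if match is None:
        # No reasoning block: the whole output is the answer.
        return ParsedReply("", raw.strip())
    reasoning = match.group(1).strip()
    # Everything outside the matched block is treated as the answer.
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return ParsedReply(reasoning, answer)


# The example raw output from the README.
raw_output = (
    "<think>\n"
    "The user is asking about the capital of France. "
    "This is a straightforward geography question.\n"
    "</think>\n"
    "The capital of France is Paris."
)
reply = split_thinking(raw_output)
```

Here `reply.answer` holds the final sentence and `reply.reasoning` holds the text between the tags. A streaming server would need to track the open and close tags incrementally as tokens arrive, which this sketch does not attempt.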