---
license: other
license_name: internlm-license
license_link: https://huggingface.co/internlm/internlm2-chat-1_8b/blob/main/LICENSE
base_model: internlm/internlm2-chat-1_8b
tags:
- internlm2
- rk3588
- npu
- rockchip
- quantized
- w8a8
- rkllm
- edge
language:
- en
- zh
pipeline_tag: text-generation
library_name: rkllm
---
# InternLM2-Chat-1.8B – RKLLM v1.2.3 (w8a8, RK3588)
RKLLM conversion of [internlm/internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b) for Rockchip RK3588 NPU inference.
Converted with **RKLLM Toolkit v1.2.3**. This model provides a different architecture option alongside Qwen3 models on the RK3588, offering strong multilingual support (English + Chinese) and good general-purpose chat capability at ~15.6 tokens/sec.
## Key Details
| | |
|---|---|
| **Base Model** | internlm/internlm2-chat-1_8b |
| **Parameters** | 1.8B |
| **Toolkit Version** | RKLLM Toolkit v1.2.3 |
| **Runtime Version** | RKLLM Runtime ≥ v1.2.0 (v1.2.3 recommended) |
| **Quantization** | w8a8 (8-bit weights, 8-bit activations) |
| **Quantization Algorithm** | normal |
| **Target Platform** | RK3588 |
| **NPU Cores** | 3 |
| **Max Context Length** | 4,096 tokens |
| **Optimization Level** | 1 |
| **Thinking Mode** | ❌ Not supported (standard instruct model) |
| **Languages** | English, Chinese |
## Performance (RK3588 Official Benchmark)
From the [RKLLM v1.2.3 benchmark](https://github.com/airockchip/rknn-llm/blob/main/benchmark.md) (w8a8, SeqLen=128, New_tokens=64):
| Metric | Value |
|--------|-------|
| **Decode Speed** | 15.58 tokens/sec |
| **Prefill (TTFT)** | 374 ms |
| **Memory Usage** | ~1,766 MB |
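As a sanity check, the benchmark figures translate into end-to-end response latency roughly like this (a back-of-envelope model that ignores sampling and scheduling overhead):

```python
# Rough latency estimate from the benchmark figures above
# (TTFT = 374 ms, decode = 15.58 tokens/sec).
TTFT_MS = 374
DECODE_TPS = 15.58

def estimated_latency_s(new_tokens: int) -> float:
    """Approximate wall-clock seconds for a reply of `new_tokens` tokens."""
    return TTFT_MS / 1000 + new_tokens / DECODE_TPS

print(f"{estimated_latency_s(64):.1f} s")  # ≈ 4.5 s for a 64-token reply
```

So a short chat reply lands in a few seconds, which matches the "responsive chat" characterization below.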
## Why InternLM2-1.8B?
InternLM2 brings **architectural diversity** to an RK3588 model lineup. If you already run Qwen3 models, adding InternLM2 gives you a different model family with its own strengths:
- **Strong bilingual capability** – trained extensively on both English and Chinese data
- **Good instruction following** – RLHF-aligned for chat applications
- **Efficient memory usage** – ~1,766 MB is significantly less than 3-4B models (~3.7-4.3 GB)
- **Fast inference** – 15.58 tok/s is solidly in the "responsive chat" bracket
- **200K native context** – the base model supports ultra-long contexts (this RKLLM conversion caps at 4,096 tokens for NPU efficiency, but the architecture handles long dependencies well)
### Benchmarks (Base Model)
| Benchmark | InternLM2-Chat-1.8B | InternLM2-1.8B (base) |
|-----------|---------------------|----------------------|
| MMLU | 47.1 | 46.9 |
| AGIEval | 38.8 | 33.4 |
| BBH | 35.2 | 37.5 |
| GSM8K | 39.7 | 31.2 |
| MATH | 11.8 | 5.6 |
| HumanEval | 32.9 | 25.0 |
| MBPP (Sanitized) | 23.2 | 22.2 |
Source: [OpenCompass](https://github.com/open-compass/opencompass)
## Hardware Tested
- **Orange Pi 5 Plus** – RK3588, 16 GB RAM, Armbian Linux
- RKNPU driver 0.9.8
- RKLLM Runtime v1.2.3
## Usage
### 1. Download
Place the `.rkllm` file in a model directory on your RK3588 board:
```bash
mkdir -p ~/models/InternLM2-1.8B
cd ~/models/InternLM2-1.8B
# Copy the .rkllm file into this directory
```
### 2. Run with the official RKLLM API demo
```bash
# Clone the runtime
git clone https://github.com/airockchip/rknn-llm.git
cd rknn-llm/examples/rkllm_api_demo
# Run (aarch64) – arguments: model path, max new tokens, max context length
./build/rkllm_api_demo /path/to/InternLM2-1.8B-w8a8-rk3588.rkllm 2048 4096
```
### 3. Chat template
InternLM2 uses the following chat format:
```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
How does photosynthesis work?<|im_end|>
<|im_start|>assistant
```
The RKLLM runtime applies this template automatically – no manual formatting is needed.
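If you do need to build the prompt yourself (e.g., when driving the raw runtime from your own code), the template above can be reproduced with plain string formatting. This is an illustrative sketch, not part of the RKLLM API:

```python
# Manual construction of the InternLM2 chat prompt shown above.
# Normally the RKLLM runtime does this for you; shown here only to
# make the token layout explicit.
def build_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt("You are a helpful assistant.",
                      "How does photosynthesis work?")
print(prompt)
```

Generation should then continue from the trailing `<|im_start|>assistant\n` and stop at `<|im_end|>`.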
### 4. With a custom OpenAI-compatible server
Any server that wraps the RKLLM binary/library will work. The model responds to standard chat completion requests. See the [RKLLM API Server](https://github.com/GatekeeperZA/RKLLM-API-Server) project for a full OpenAI-compatible implementation with multi-model support.
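A minimal client request against such a server might look like the following. The host, port, and model name are placeholders – adjust them to match your server's configuration:

```python
import json
import urllib.request

# Standard OpenAI-style chat completion payload.
payload = {
    "model": "InternLM2-1.8B-w8a8-rk3588",  # placeholder model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}

# Hypothetical local endpoint; change host/port for your setup.
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment once the server is running
```

Any OpenAI-compatible client library (e.g., the official `openai` Python package pointed at the local base URL) works the same way.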
## Conversion Script
```python
from rkllm.api import RKLLM
model_path = "internlm/internlm2-chat-1_8b" # or local path
output_path = "./InternLM2-1.8B-w8a8-rk3588.rkllm"
dataset_path = "./data_quant.json" # calibration data
# Load
llm = RKLLM()
llm.load_huggingface(model=model_path, model_lora=None, device="cpu")
# Build
llm.build(
    do_quantization=True,
    optimization_level=1,
    quantized_dtype="w8a8",
    quantized_algorithm="normal",
    target_platform="rk3588",
    num_npu_core=3,
    extra_qparams=None,
    dataset=dataset_path,
    max_context=4096,
)
# Export
llm.export_rkllm(output_path)
```
Calibration dataset: 21 diverse prompt/completion pairs generated with `generate_data_quant.py` from the [rknn-llm examples](https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_api_demo/export).
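For reference, a calibration file of this kind is a JSON list of prompt/completion pairs. The sketch below shows the general shape; the exact field names follow the toolkit examples and should be verified against your RKLLM Toolkit version before use:

```python
import json

# Minimal calibration-data sketch in the shape produced by the toolkit's
# generate_data_quant.py (field names assumed from the rknn-llm examples;
# verify against your toolkit version).
samples = [
    {"input": "What is the capital of France?",
     "target": "The capital of France is Paris."},
    {"input": "用一句话解释光合作用。",  # bilingual data matches the model's strengths
     "target": "光合作用是植物利用光能把二氧化碳和水转化为有机物并释放氧气的过程。"},
]

with open("data_quant.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```

Including both English and Chinese samples in the calibration set helps the quantizer cover the activation ranges the model actually sees at inference time.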
## File Listing
| File | Description |
|------|-------------|
| `InternLM2-1.8B-w8a8-rk3588.rkllm` | Quantized model for RK3588 NPU |
## Compatibility Notes
- **Minimum runtime:** RKLLM Runtime v1.2.0 (v1.2.3 recommended)
- **RKNPU driver:** ≥ 0.9.6
- **SoCs:** RK3588 / RK3588S (3 NPU cores). Not compatible with RK3576 (2 cores) without reconversion.
- **RAM:** ~1.8 GB loaded. Runs comfortably on 8 GB+ boards.
- **No thinking mode:** InternLM2 is a standard instruct/chat model – it does not produce `<think>…</think>` reasoning blocks. For thinking mode, use [Qwen3-1.7B-RKLLM-v1.2.3](https://huggingface.co/GatekeeperZA/Qwen3-1.7B-RKLLM-v1.2.3).
## Known Issues
- The folder name containing the model must **not** include dots (e.g., `InternLM2-1.8B` not `InternLM2.1.8B`) due to Python module import issues during conversion.
- InternLM2 uses a custom tokenizer (`trust_remote_code=True` required during conversion).
## Acknowledgements
- [InternLM Team (Shanghai AI Laboratory)](https://huggingface.co/internlm) for the base model
- [Rockchip / airockchip](https://github.com/airockchip/rknn-llm) for the RKLLM toolkit and runtime
- Converted by [GatekeeperZA](https://huggingface.co/GatekeeperZA)
## Citation
```bibtex
@misc{cai2024internlm2,
title={InternLM2 Technical Report},
author={Zheng Cai and Maosong Cao and Haojiong Chen and Kai Chen and others},
year={2024},
eprint={2403.17297},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```