---
license: mit
language:
- zh
- en
base_model:
- Qwen/Qwen2.5-1.5B-Instruct-GPTQ-INT8
- Qwen/Qwen2.5-1.5B-Instruct-GPTQ-INT4
pipeline_tag: text-generation
library_name: transformers
tags:
- Context
- Qwen2.5-1.5B-Instruct-GPTQ-INT8
- Qwen2.5-1.5B-Instruct-GPTQ-INT4
---
# Qwen2.5-1.5B-Instruct-python
This version of Qwen2.5-1.5B-Instruct-python has been converted to run on the Axera NPU using w8a16 and w4a16 quantization (8-bit/4-bit weights with 16-bit activations).
Compatible with Pulsar2 version: 4.1
## Features
- Supports longer contexts; in this sample it is 2.5k
- Supports multi-turn, context-aware dialogue (a tokenizer-level sketch follows this list)
- Supports a KV cache for the system prompt
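As a rough illustration of what multi-turn context with a fixed system prompt looks like at the tokenizer level (the `transformers` calls below are standard; the checkpoint path and the sample conversation are assumptions, not part of this repo), Qwen2.5's chat template can be applied like this:

```python
from transformers import AutoTokenizer

# Assumption: the original Hugging Face checkpoint, used here only for its tokenizer.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8")

# The system prompt is identical on every turn, which is what allows a runtime
# to compute its KV cache once and reuse it for each subsequent request.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    # The follow-up question only works because the earlier turns stay in context.
    {"role": "user", "content": "How large is that city?"},
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # the full conversation rendered in Qwen2.5's chat format
```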
## Conversion tool links
If you are interested in model conversion, you can try exporting the axmodel from the original repo: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8
- Pulsar2 Link, How to Convert LLM from Huggingface to axmodel
- AXera NPU AXEngine LLM Runtime
## Conversion script
The following shows how to convert Qwen2.5-1.5B-Instruct-GPTQ-Int8.
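If the GPTQ checkpoint is not already available locally, it can be fetched first. A minimal sketch using `huggingface_hub` (`snapshot_download` is the library's standard call; the local directory name is an arbitrary choice):

```python
from huggingface_hub import snapshot_download

# Download the original GPTQ checkpoint into a local directory;
# the directory name here is an arbitrary choice.
snapshot_download(
    repo_id="Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8",
    local_dir="Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8",
)
```

Then run the conversion: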
```bash
pulsar2 llm_build --input_path Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 \
--output_path Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8-ctx-ax650 \
--hidden_state_type bf16 --kv_cache_len 2047 --prefill_len 128 \
--last_kv_cache_len 128 \
--last_kv_cache_len 256 \
--last_kv_cache_len 384 \
--last_kv_cache_len 512 \
--last_kv_cache_len 640 \
--last_kv_cache_len 768 \
--last_kv_cache_len 896 \
--last_kv_cache_len 1024 \
--chip AX650 -c 1 --parallel 8
```
## Supported platforms
- AX650
  - AX650N DEMO Board
  - M4N-Dock (AXera-Pi Pro)
  - M.2 Accelerator card
- AX630C
  - TBD
## How to use
Download all files from this repository to the device.
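A minimal sketch, again with `huggingface_hub` (`<this-repo-id>` is a placeholder for this repository's id on Hugging Face; substitute the real one):

```python
from huggingface_hub import snapshot_download

# "<this-repo-id>" is a placeholder: substitute this repository's actual id.
snapshot_download(repo_id="<this-repo-id>", local_dir="Qwen2.5-1.5B-Instruct-python")
```

After downloading, the layout on the device should look like this: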
```bash
root@ax650:/mnt/qtang/llm-test/Qwen2.5-1.5B-Instruct-python# tree -L 1
.
├── chat.py
├── infer.py
├── infer_torch.py
├── Qwen2.5-1.5B-Instruct-GPTQ-Int8
├── Qwen2.5-1.5B-Instruct-GPTQ-Int8_axmodel
└── README.md

2 directories, 4 files
```