Language: [中文](https://huggingface.co/YCWTG/Qwen3-Coder-Next-int2-mixed-AutoRound/blob/main/README_zh.md) | English

## Model Details

This model is a **mixed-bits INT2 quantized** model with group_size 512 and symmetric quantization of [Qwen/Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next), generated by [intel/auto-round](https://github.com/intel/auto-round). Please follow the license of the original model.

### Quantization Strategy (Intel MoE Recipe)

| Module | Precision | Note |
|--------|-----------|------|
| shared_expert_gate | 16-bit | Skipped (shape not divisible by 32) |
| lm_head | Original | Excluded by AutoRound |
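
The skip rule in the table can be made concrete. Below is a minimal sketch of how such a recipe might assign per-module precision; the function name and the INT2 default are illustrative assumptions, not AutoRound internals:

```python
# Hypothetical helper mirroring the table above: pick a bit-width per module.
# The rules here are illustrative assumptions, not the actual Intel recipe code.
def pick_bits(module_name: str, in_features: int) -> int:
    if module_name == "lm_head":
        return 16                 # excluded from quantization by AutoRound
    if in_features % 32 != 0:
        return 16                 # packing kernels need shapes divisible by 32
    return 2                      # default: symmetric INT2, group_size 512

print(pick_bits("shared_expert_gate", 10))      # 16 (shape not divisible by 32)
print(pick_bits("mlp.experts.0.up_proj", 2048)) # 2
```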
### Accuracy

| Config | Accuracy | Δ vs BF16 |
|--------|----------|-----------|
| BF16 | 52.90% | - |
| **W2A16** | **51.27%** | **-1.63%** |
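
For context, the 1.63-point drop works out to roughly a 3% relative accuracy loss, which can be checked directly from the table's numbers:

```python
# Absolute and relative accuracy drop of W2A16 vs the BF16 baseline (from the table).
bf16, w2a16 = 52.90, 51.27
delta = w2a16 - bf16
print(f"absolute: {delta:+.2f} pts, relative: {delta / bf16:+.2%}")
```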

### Model Size

- **Original BF16**: ~160GB
- **mixed INT2**: ~25GB
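
These figures are consistent with a back-of-envelope estimate: ~160GB of BF16 weights implies roughly 80B parameters, and INT2 with one FP16 scale per 512-weight group costs about 2.03 bits per weight; the remainder up to ~25GB is plausibly the layers kept at 16-bit. This is illustrative arithmetic, not an exact accounting:

```python
# Rough size estimate for the INT2 checkpoint (illustrative assumptions).
bf16_bytes = 160e9
params = bf16_bytes / 2                  # BF16 stores 2 bytes per parameter
bits_per_weight = 2 + 16 / 512           # INT2 value + FP16 scale amortized per group
int2_bytes = params * bits_per_weight / 8
print(f"~{int2_bytes / 1e9:.1f} GB for the quantized weights alone")
```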
## How to Use
### HF Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "YCWTG/Qwen3-Coder-Next-int2-mixed-AutoRound"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
# ... (prompt construction and generation elided in the diff; it ends with `print("content:", content)`) ...
```

The quantization itself was produced with the following [AutoRound](https://github.com/intel/auto-round) script (portions elided in the diff):

```python
from auto_round import AutoRound
model_name = "Qwen/Qwen3-Coder-Next"
# Build layer config for mixed-bits (Intel recipe)
layer_config = {}
# ... (layer_config entries and other setup elided in the diff) ...

autoround = AutoRound(
    # ... (model and quantization arguments elided in the diff) ...
    low_gpu_mem_usage=True,
    enable_alg_ext=True
)
output_dir = "~/.cache/model/Qwen3-Coder-Next-int2-mixed-AutoRound"
autoround.quantize_and_save(output_dir, format="auto_round")
```
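
The diff elides the actual `layer_config` entries. Below is a hypothetical sketch of what a mixed-bits MoE config could look like, assuming AutoRound's per-layer `layer_config` dict format; the module names, layer count, and bit choices are illustrative, not the published recipe:

```python
# Hypothetical mixed-bits config: INT2 for MoE expert projections, higher
# precision for sensitive modules. All names/values here are assumptions.
NUM_LAYERS = 48  # assumed layer count, for illustration only

layer_config = {}
for i in range(NUM_LAYERS):
    for proj in ("gate_proj", "up_proj", "down_proj"):
        # bulk of the parameters: MoE experts -> symmetric INT2, group_size 512
        layer_config[f"model.layers.{i}.mlp.experts.{proj}"] = {
            "bits": 2, "group_size": 512, "sym": True,
        }
    # small, sensitive shared module kept at higher precision
    layer_config[f"model.layers.{i}.mlp.shared_expert_gate"] = {"bits": 16}

print(len(layer_config))  # 48 layers * 4 entries = 192
```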