YCWTG committed
Commit fc3d512 · verified · Parent: 11bca80

Update README.md

Files changed (1): README.md (+8 −11)
README.md CHANGED

@@ -18,7 +18,7 @@ pipeline_tag: text-generation
 语言 [中文](https://huggingface.co/YCWTG/Qwen3-Coder-Next-int2-mixed-AutoRound/blob/main/README_zh.md)|English
 ## Model Details
 
-This model is an int2 model with group_size 512 and symmetric quantization of [Qwen/Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next) generated by [intel/auto-round](https://github.com/intel/auto-round). Please follow the license of the original model.
+This model is a **mixed-bits INT2 quantized** model with group_size 512 and symmetric quantization of [Qwen/Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next), generated by [intel/auto-round](https://github.com/intel/auto-round). Please follow the license of the original model.
 
 ### Quantization Strategy (Intel MoE Recipe)
 
@@ -29,21 +29,18 @@ This model is an int2 model with group_size 512 and symmetric quantization of [Q
 | shared_expert_gate | 16-bit | Skipped (shape not divisible by 32) |
 | lm_head | Original | Excluded by AutoRound |
 
-### MMLU-Pro
-
-| Model | Accuracy | Delta |
-|-------|----------|-------|
-| BF16 | 52.90% | - |
-| **W2A16** | **51.27%** | **-1.63%** |
-
-## How to Use
+### Model Size
+- **Original BF16**: ~160GB
+- **mixed INT2**: ~25GB
+
+## Quickstart
 
 ### HF Usage
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model_name = "YCWTG/Qwen3-Coder-Next-int2-AutoRound-best"
+model_name = "YCWTG/Qwen3-Coder-Next-int2-mixed-AutoRound"
 
 # load the tokenizer and the model
 tokenizer = AutoTokenizer.from_pretrained(model_name)
@@ -75,7 +72,7 @@ print("content:", content)
 ```bash
 from auto_round import AutoRound
 
-model_name = "/home/ycwtg/PycharmProjects/MMLU-Pro/~/.cache/model/Qwen3-Coder-Next"
+model_name = "Qwen/Qwen3-Coder-Next"
 
 # Build layer config for mixed-bits (Intel recipe)
 layer_config = {}
@@ -108,7 +105,7 @@ autoround = AutoRound(
     low_gpu_mem_usage=True,
     enable_alg_ext=True
 )
-output_dir="/home/ycwtg/PycharmProjects/MMLU-Pro/~/.cache/model/Qwen3-Coder-Next-int2-mixed-AutoRound"
+output_dir = "~/.cache/model/Qwen3-Coder-Next-int2-mixed-AutoRound"
 autoround.quantize_and_save(output_dir,format="auto_round" )
 
 ```
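The diff elides how `layer_config` is filled in between `layer_config = {}` and the `AutoRound(...)` call. As a hedged sketch only of the mixed-bits pattern that the strategy table describes — the layer names, layer count, and bit assignments below are hypothetical illustrations, not the commit's actual recipe — the dict maps module names to per-layer quantization settings:

```python
# Hypothetical sketch of a mixed-bits layer_config for auto-round.
# Layer names, layer count, and bit choices are illustrative only;
# the commit's actual recipe is elided from the diff.

num_layers = 48  # hypothetical decoder layer count

layer_config = {}
for i in range(num_layers):
    prefix = f"model.layers.{i}"
    # MoE expert projections: quantize aggressively to 2-bit, group_size 512
    layer_config[f"{prefix}.mlp.experts"] = {"bits": 2, "group_size": 512}
    # Router gate and shared_expert_gate are sensitive (and the gate's
    # shape is not divisible by 32), so keep them at 16-bit
    layer_config[f"{prefix}.mlp.gate"] = {"bits": 16}
    layer_config[f"{prefix}.mlp.shared_expert_gate"] = {"bits": 16}

print(len(layer_config))  # three entries per decoder layer
```

Passing a dict of this shape as `layer_config=` to `AutoRound` is how auto-round expresses per-layer overrides; `lm_head` needs no entry because, as the table notes, AutoRound excludes it by default.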
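The added Model Size figures are consistent with simple arithmetic: at 2 bytes per weight, ~160GB of BF16 implies roughly 80B parameters (an inference from the stated size, not a figure in the diff), and 2 bits per weight then gives ~20GB, with quantization scales and the 16-bit layers accounting for the remainder up to ~25GB:

```python
# Back-of-the-envelope check of the Model Size section.
# The ~80B parameter count is inferred from the stated ~160GB BF16 size,
# not taken from the diff.
bf16_gb = 160
params = bf16_gb * 1e9 / 2        # 2 bytes per BF16 weight -> ~80e9 params
int2_gb = params * 2 / 8 / 1e9    # 2 bits per weight, ignoring overhead

print(f"params = {params / 1e9:.0f}B")  # 80B
print(f"INT2 = {int2_gb:.0f} GB")       # 20 GB; scales + 16-bit layers push toward ~25 GB
```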