Language: [中文](https://huggingface.co/YCWTG/Qwen3-Coder-Next-int2-mixed-AutoRound/blob/main/README_zh.md) | English

## Model Details

This model is a **mixed-bits INT2 quantized** model with group_size 512 and symmetric quantization of [Qwen/Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next), generated by [intel/auto-round](https://github.com/intel/auto-round). Please follow the license of the original model.

### Quantization Strategy (Intel MoE Recipe)

| Module | Precision | Note |
|--------|-----------|------|
| shared_expert_gate | 16-bit | Skipped (shape not divisible by 32) |
| lm_head | Original | Excluded by AutoRound |
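
The skip rule in the table can be made concrete. Below is a minimal sketch of how such a recipe might assign per-module precision; the function name and the INT2 default are illustrative assumptions, not AutoRound internals:

```python
# Hypothetical helper mirroring the table above: pick a bit-width per module.
# The rules here are illustrative assumptions, not the actual Intel recipe code.
def pick_bits(module_name: str, in_features: int) -> int:
    if module_name == "lm_head":
        return 16                 # excluded from quantization by AutoRound
    if in_features % 32 != 0:
        return 16                 # packing kernels need shapes divisible by 32
    return 2                      # default: symmetric INT2, group_size 512

print(pick_bits("shared_expert_gate", 10))      # 16 (shape not divisible by 32)
print(pick_bits("mlp.experts.0.up_proj", 2048)) # 2
```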
### Accuracy

| Config | Accuracy | Δ vs BF16 |
|--------|----------|-----------|
| BF16 | 52.90% | - |
| **W2A16** | **51.27%** | **-1.63%** |
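
For context, the 1.63-point drop works out to roughly a 3% relative accuracy loss, which can be checked directly from the table's numbers:

```python
# Absolute and relative accuracy drop of W2A16 vs the BF16 baseline (from the table).
bf16, w2a16 = 52.90, 51.27
delta = w2a16 - bf16
print(f"absolute: {delta:+.2f} pts, relative: {delta / bf16:+.2%}")
```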

### Model Size

- **Original BF16**: ~160GB
- **mixed INT2**: ~25GB
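
These figures are consistent with a back-of-envelope estimate: ~160GB of BF16 weights implies roughly 80B parameters, and INT2 with one FP16 scale per 512-weight group costs about 2.03 bits per weight; the remainder up to ~25GB is plausibly the layers kept at 16-bit. This is illustrative arithmetic, not an exact accounting:

```python
# Rough size estimate for the INT2 checkpoint (illustrative assumptions).
bf16_bytes = 160e9
params = bf16_bytes / 2                  # BF16 stores 2 bytes per parameter
bits_per_weight = 2 + 16 / 512           # INT2 value + FP16 scale amortized per group
int2_bytes = params * bits_per_weight / 8
print(f"~{int2_bytes / 1e9:.1f} GB for the quantized weights alone")
```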
## How to Use
### HF Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "YCWTG/Qwen3-Coder-Next-int2-mixed-AutoRound"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
# ... (prompt construction and generation elided in the diff; it ends with `print("content:", content)`) ...
```

The quantization itself was produced with the following [AutoRound](https://github.com/intel/auto-round) script (portions elided in the diff):

```python
from auto_round import AutoRound
model_name = "Qwen/Qwen3-Coder-Next"
# Build layer config for mixed-bits (Intel recipe)
layer_config = {}
# ... (layer_config entries and other setup elided in the diff) ...

autoround = AutoRound(
    # ... (model and quantization arguments elided in the diff) ...
    low_gpu_mem_usage=True,
    enable_alg_ext=True
)
output_dir = "~/.cache/model/Qwen3-Coder-Next-int2-mixed-AutoRound"
autoround.quantize_and_save(output_dir, format="auto_round")
```
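
The diff elides the actual `layer_config` entries. Below is a hypothetical sketch of what a mixed-bits MoE config could look like, assuming AutoRound's per-layer `layer_config` dict format; the module names, layer count, and bit choices are illustrative, not the published recipe:

```python
# Hypothetical mixed-bits config: INT2 for MoE expert projections, higher
# precision for sensitive modules. All names/values here are assumptions.
NUM_LAYERS = 48  # assumed layer count, for illustration only

layer_config = {}
for i in range(NUM_LAYERS):
    for proj in ("gate_proj", "up_proj", "down_proj"):
        # bulk of the parameters: MoE experts -> symmetric INT2, group_size 512
        layer_config[f"model.layers.{i}.mlp.experts.{proj}"] = {
            "bits": 2, "group_size": 512, "sym": True,
        }
    # small, sensitive shared module kept at higher precision
    layer_config[f"model.layers.{i}.mlp.shared_expert_gate"] = {"bits": 16}

print(len(layer_config))  # 48 layers * 4 entries = 192
```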