Vishva007 committed on
Commit
8f2262f
·
verified ·
1 Parent(s): 7f38fe8

Create README.md

Files changed (1)
  1. README.md +93 -0
README.md ADDED
@@ -0,0 +1,93 @@
---
base_model: Qwen/Qwen3-VL-2B-Instruct
library_name: auto-round
license: apache-2.0
tags:
- auto-round
- intel
- qwen
- qwen3-vl
- vision-language-model
- quantization
- 4-bit
- W4A16
pipeline_tag: image-text-to-text
model_type: qwen3_vl
---

# Qwen3-VL-2B-Instruct-W4A16-AutoRound

## Model Overview
This is a **4-bit quantized** version of the [Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct) vision-language model.

It was quantized with **Intel's AutoRound** algorithm, which tunes weight rounding with signed gradient descent (800 iterations here) to minimize quantization loss. The original **FP16 vision tower** is left unquantized, so the visual encoder behind OCR, spatial reasoning, and chart analysis runs at its original precision.
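
The rounding search described above can be illustrated with a small NumPy sketch. This is not AutoRound's implementation: it assumes a single linear layer, one scale per output row instead of groups of 128, a fixed step size of `1 / iters`, and it omits the clipping-range tuning that the library also performs. The helper names (`fake_quantize`, `output_error`, `tune_rounding`) are hypothetical.

```python
import numpy as np


def fake_quantize(w, v, scale, qmin=-8, qmax=7):
    """Symmetric 4-bit quantize-dequantize with a rounding offset v added before round()."""
    return scale * np.clip(np.round(w / scale + v), qmin, qmax)


def output_error(w, w_q, x):
    """Squared error between layer outputs computed with original and quantized weights."""
    return float(np.sum(((w_q - w) @ x) ** 2))


def tune_rounding(w, x, iters=200, qmax=7):
    """Search rounding offsets in [-0.5, 0.5] with signed gradient descent.

    w: (out_features, in_features) weights
    x: (in_features, n_samples) calibration inputs
    Returns (best offsets, round-to-nearest error, best error found).
    """
    # Assumed simplification: one scale per output row.
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / qmax
    v = np.zeros_like(w)  # v = 0 is plain round-to-nearest
    rtn_error = output_error(w, fake_quantize(w, v, scale), x)
    best_v, best_error = v.copy(), rtn_error
    lr = 1.0 / iters  # assumed step size
    for _ in range(iters):
        w_q = fake_quantize(w, v, scale)
        # Gradient of the output error w.r.t. w_q. The straight-through estimator
        # treats round() as identity, so d(w_q)/d(v) is approximated by scale.
        grad_v = 2.0 * ((w_q - w) @ x) @ x.T * scale
        # Signed gradient descent: only the sign of the gradient is used.
        v = np.clip(v - lr * np.sign(grad_v), -0.5, 0.5)
        error = output_error(w, fake_quantize(w, v, scale), x)
        if error < best_error:
            best_v, best_error = v.copy(), error
    return best_v, rtn_error, best_error


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 32))
x = rng.normal(size=(32, 64))
best_v, rtn_error, best_error = tune_rounding(w, x)
print(f"round-to-nearest error: {rtn_error:.4f}, tuned error: {best_error:.4f}")
```

Because the search starts from round-to-nearest and keeps the best offsets seen, the tuned error in this sketch can never exceed the round-to-nearest error.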

### Quantization Specifications
- **Method**: [AutoRound](https://github.com/intel/auto-round) (weight-only quantization)
- **Scheme**: `W4A16` (4-bit weights, 16-bit activations)
- **Symmetric**: `True`
- **Group Size**: 128
- **Vision Tower**: Kept in FP16 (unquantized, to preserve visual accuracy)
- **Calibration**: 512 samples, 800 iterations
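
To make the `W4A16`, symmetric, and group-size settings listed above concrete, here is a minimal NumPy sketch of symmetric group-wise 4-bit quantization using plain round-to-nearest. It is an illustration only: AutoRound learns its rounding rather than using round-to-nearest, the function names are hypothetical, and the storage estimate assumes that each group stores only one 16-bit scale (real packing formats may add further metadata).

```python
import numpy as np


def quantize_groupwise(w, group_size=128, bits=4):
    """Round-to-nearest symmetric quantization with one scale per group of input weights."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # -8..7 for 4 bits
    out_features, in_features = w.shape
    assert in_features % group_size == 0, "in_features must be a multiple of group_size"
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # Symmetric scheme: no zero point; the scale maps the largest magnitude to qmax.
    scale = np.abs(groups).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale).astype(np.float32)
    q = np.clip(np.round(groups / scale), qmin, qmax).astype(np.int8)
    return q, scale


def dequantize_groupwise(q, scale, shape):
    """Rebuild approximate weights; activations stay in 16-bit at inference time."""
    return (q.astype(np.float32) * scale).reshape(shape)


def effective_bits_per_weight(bits=4, group_size=128, scale_bits=16):
    """Average storage per weight, assuming one scale of scale_bits per group."""
    return bits + scale_bits / group_size


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 256)).astype(np.float32)
q, scale = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, w.shape)
print("max abs reconstruction error:", float(np.abs(w - w_hat).max()))
print("effective bits per weight:", effective_bits_per_weight())
```

Under these assumptions a group of 128 weights costs 128 × 4 + 16 bits, or 4.125 bits per weight on average, compared with 16 bits per weight in FP16.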

## Quickstart

### 1. Installation
To use this model in its native AutoRound format, you need the `auto-round` library. The inference example below also calls `process_vision_info`, which is provided by the `qwen-vl-utils` package.

```bash
pip install auto-round transformers torch qwen-vl-utils
```

### 2. Inference Code

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils
# Kept from the original example: some auto-round versions need this import
# to register the AutoRound format with transformers.
from auto_round import AutoRoundConfig  # noqa: F401

model_id = "Vishva007/Qwen3-VL-2B-Instruct-W4A16-AutoRound"

# Load the quantized model and its processor.
# AutoModelForImageTextToText replaces the original AutoModelForCausalLM because
# the inputs include images; this assumes a transformers release with Qwen3-VL support.
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Prepare input: one image plus a text instruction
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

# Render the chat template to a prompt string and load the referenced image(s)
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate
generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```
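
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded text printed above includes the prompt. A small helper, sketched here with plain Python lists and a hypothetical name, shows one way to keep only the new tokens before decoding.

```python
def strip_prompt_tokens(input_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


prompt = [[101, 7, 8]]
generated = [[101, 7, 8, 42, 43]]
print(strip_prompt_tokens(prompt, generated))  # [[42, 43]]
```

With the variables from the inference example, the same slicing works on tensors: `processor.batch_decode(strip_prompt_tokens(inputs.input_ids, generated_ids), skip_special_tokens=True)`.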

## Citation
```bibtex
@article{cheng2023optimize,
  title={Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}
```