Qwen2.5-VL-3B-Instruct-per-grp-quant
Introduction
This model was quantized using AMD Quark 0.11 (amd_quark-0.11).
Quantization Strategy
- Quantized Layers: All linear layers
- Weight: uint4 asymmetric per-group with group_size=128.
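As a rough illustration of the strategy above (not the actual Quark implementation), asymmetric per-group uint4 quantization splits each weight tensor into groups of 128 values and maps each group's min–max range onto the integers 0..15 with its own scale and zero point:

```python
import numpy as np

def quantize_per_group_uint4(w, group_size=128):
    """Sketch of asymmetric per-group uint4 quantization (illustrative only)."""
    g = w.reshape(-1, group_size)
    wmin = g.min(axis=1, keepdims=True)
    wmax = g.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0            # uint4 range is 0..15
    scale = np.where(scale == 0, 1.0, scale)  # guard all-constant groups
    zero_point = np.round(-wmin / scale)
    q = np.clip(np.round(g / scale + zero_point), 0, 15).astype(np.uint8)
    return q, scale, zero_point

def dequantize_per_group_uint4(q, scale, zero_point, shape):
    """Map the uint4 codes back to floats for pseudo-quantized evaluation."""
    return ((q.astype(np.float32) - zero_point) * scale).reshape(shape)
```

Because each group carries its own scale and zero point, the round-trip error per element is bounded by half that group's scale.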
Quick Start
- Download the model
- Run the quantization script in the example folder using the following command line:
python run_qwen2_5_vl_quantization.py
Evaluation
Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py. The quantization evaluation is conducted in pseudo-quantization mode, which may differ slightly from actual quantized inference accuracy; the results are provided for reference only.
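The exact computation lives in quantize_quark.py; as a generic illustration (not the Quark script itself), perplexity is the exponential of the average per-token negative log-likelihood over the evaluation corpus:

```python
import math

def perplexity(token_nlls, n_tokens):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / n_tokens)
```

Intuitively, a model that assigned uniform probability over 50 candidate tokens at every step would score a perplexity of exactly 50.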
Evaluation scores
| Benchmark | Qwen2.5-VL-3B-Instruct | Qwen2.5-VL-3B-Instruct-per-grp-quant (this model) |
| --- | --- | --- |
| Perplexity-wikitext2 | 11.1107 | 13.7743 |
License
Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
Model tree for RyzenAI/Qwen2.5-VL-3B-Instruct-per-grp-quant
Base model: Qwen/Qwen2.5-VL-3B-Instruct