Qwen2.5-VL-3B-Instruct-per-grp-quant

  • Introduction

    This model was quantized using Quark 0.11 (amd_quark-0.11).
  • Quantization Strategy

    • Quantized Layers: All linear layers
    • Weight: uint4, asymmetric, per-group quantization with group_size=128.
  • Quick Start

  1. Download the model
  2. Run the quantization script in the example folder using the following command line:
    python run_qwen2_5_vl_quantization.py
    
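The weight scheme described under Quantization Strategy (uint4, asymmetric, per-group with group_size=128) can be sketched as follows. This is a minimal NumPy illustration of the general technique, not Quark's actual implementation; the function names are hypothetical, and it assumes the weight tensor's size is a multiple of the group size.

```python
import numpy as np

def quantize_per_group_uint4(w, group_size=128):
    """Asymmetric per-group uint4 quantization of a flat weight tensor.
    Illustrative sketch only; assumes w.size is a multiple of group_size."""
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0              # uint4 levels: 0..15
    scale = np.where(scale == 0.0, 1.0, scale)  # guard constant groups
    zero_point = np.round(-w_min / scale)       # asymmetric offset per group
    q = np.clip(np.round(groups / scale + zero_point), 0, 15).astype(np.uint8)
    return q, scale, zero_point

def dequantize_uint4(q, scale, zero_point):
    """Recover approximate weights from quantized values and group metadata."""
    return (q.astype(np.float32) - zero_point) * scale
```

Because each 128-element group gets its own scale and zero point, the per-element rounding error is bounded by half of that group's scale, which is what makes per-group schemes more accurate than per-tensor ones on weights with uneven value ranges.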

Evaluation

Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py. The quantization evaluation results are obtained in pseudo-quantization mode, which may differ slightly from the actual quantized inference accuracy. These results are provided for reference only.
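Quark's exact evaluation code lives in quantize_quark.py; for reference, the standard definition it is based on reduces to exponentiating the mean per-token negative log-likelihood. A minimal sketch (not the script's implementation):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood over all evaluated tokens)."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

For example, if the model assigns every token a probability of 1/2, each per-token NLL is log(2) and the perplexity is exactly 2.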

Evaluation scores

| Benchmark | Qwen2.5-VL-3B-Instruct | Qwen2.5-VL-3B-Instruct-per-grp-quant (this model) |
|---|---|---|
| Perplexity-wikitext2 | 11.1107 | 13.7743 |

License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
