---
pipeline_tag: text-generation
base_model:
  - Qwen/Qwen3-VL-4B-Instruct
metrics:
  - perplexity
---

# Qwen3-VL-4B-Instruct-per-grp-quant

## Introduction

This model was quantized using amd_quark-0.11.

## Quantization Strategy

- **Quantized Layers**: All linear layers
- **Weight**: uint4 asymmetric per-group with `group_size=128`

## Quick Start

1. Download the Qwen3-VL-4B-Instruct model.
2. Run the quantization script in the example folder using the following command line:

       python run_qwen3_vl_4b_quant_model.py
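As a rough illustration of the weight scheme described above (a NumPy sketch, not the Quark implementation), asymmetric per-group uint4 quantization splits each weight row into groups of 128 values and maps each group onto the integer range 0–15 with its own scale and zero point:

```python
import numpy as np

def quantize_per_group_uint4(w, group_size=128):
    """Asymmetric per-group uint4 quantization of a flat weight tensor."""
    w = w.reshape(-1, group_size)                       # one row per group
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = np.maximum((w_max - w_min) / 15.0, 1e-8)    # uint4 range: 0..15
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, 15).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Reconstruct float weights from uint4 codes, per-group scale and zero point."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s, z = quantize_per_group_uint4(w)
w_hat = dequantize(q, s, z).reshape(-1)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

Because each group gets its own scale and zero point, the rounding error per value is bounded by roughly half the group's scale, which is why per-group schemes typically lose less accuracy than per-tensor quantization.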

# Evaluation

Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be referenced in `quantize_quark.py`. The evaluation results are obtained in pseudo-quantization mode, which may differ slightly from the actual quantized inference accuracy; they are provided for reference only.
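The exact evaluation algorithm lives in `quantize_quark.py`; as a minimal sketch of the metric itself, perplexity is simply the exponential of the mean per-token negative log-likelihood:

```python
import numpy as np

def perplexity(nll_per_token):
    # PPL = exp(mean negative log-likelihood per token); lower is better.
    return float(np.exp(np.mean(nll_per_token)))

# Toy check (illustrative values, not a real model run): tokens whose
# average NLL is ln(10.5369) yield a PPL of ~10.5369, the scale of the
# wikitext2 scores reported in this card.
nll = np.full(1000, np.log(10.5369))
print(perplexity(nll))
```

The gap between the two PPL columns in the table below is the accuracy cost of quantization: a larger post-quantization PPL means the quantized model assigns lower probability to the reference text.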

## Evaluation scores

| Benchmark | Qwen3-VL-4B-Instruct | Qwen3-VL-4B-Instruct-per-grp-quant (this model) |
| --- | --- | --- |
| Perplexity-wikitext2 | 10.5369 | 11.6644 |

# License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.