Qwen2.5-VL-3B-Instruct-per-grp-quant

  • Introduction

    This model was quantized using Quark 0.11 (amd_quark-0.11).
  • Quantization Strategy

    • Quantized Layers: All linear layers
    • Weight: uint4, asymmetric, per-group quantization with group_size=128.
  • Quick Start

  1. Download the model
  2. Run the quantization script in the example folder using the following command line:
    python run_qwen2_5_vl_quantization.py
    
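The weight scheme described under Quantization Strategy (uint4, asymmetric, per-group with group_size=128) can be sketched as follows. This is a minimal NumPy illustration of the general technique, not Quark's actual implementation; the function names are hypothetical, and it assumes the weight tensor's size is a multiple of the group size.

```python
import numpy as np

def quantize_per_group_uint4(w, group_size=128):
    """Asymmetric per-group uint4 quantization of a flat weight tensor.
    Illustrative sketch only; assumes w.size is a multiple of group_size."""
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0              # uint4 levels: 0..15
    scale = np.where(scale == 0.0, 1.0, scale)  # guard constant groups
    zero_point = np.round(-w_min / scale)       # asymmetric offset per group
    q = np.clip(np.round(groups / scale + zero_point), 0, 15).astype(np.uint8)
    return q, scale, zero_point

def dequantize_uint4(q, scale, zero_point):
    """Recover approximate weights from quantized values and group metadata."""
    return (q.astype(np.float32) - zero_point) * scale
```

Because each 128-element group gets its own scale and zero point, the per-element rounding error is bounded by half of that group's scale, which is what makes per-group schemes more accurate than per-tensor ones on weights with uneven value ranges.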

Evaluation

Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py. The quantization evaluation results are obtained in pseudo-quantization mode, which may differ slightly from the actual quantized inference accuracy. These results are provided for reference only.
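Quark's exact evaluation code lives in quantize_quark.py; for reference, the standard definition it is based on reduces to exponentiating the mean per-token negative log-likelihood. A minimal sketch (not the script's implementation):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood over all evaluated tokens)."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

For example, if the model assigns every token a probability of 1/2, each per-token NLL is log(2) and the perplexity is exactly 2.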

Evaluation scores

| Benchmark | Qwen2.5-VL-3B-Instruct | Qwen2.5-VL-3B-Instruct-per-grp-quant (this model) |
|---|---|---|
| Perplexity-wikitext2 | 11.1107 | 13.7743 |

License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
