---
base_model:
- playable/Playable1
---

### Quark Quantized Playable1

This is a fine-tuned, Quark-quantized version of Qwen/Qwen2.5-Coder-7B-Instruct, built with the 'iat-05-1' adapter.

### Model Details

- Base Model: Qwen/Qwen2.5-Coder-7B-Instruct
- Adapter: iat-05-1
- Quantization: Quark / UINT4 / AWQ / BFLOAT16
- Format: SafeTensors
- Perplexity Score: 10.953088
- Dataset: wikitext-2-raw-v1
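
The card does not describe how the perplexity score was evaluated. As a reminder of what the metric means, here is a minimal sketch of the standard definition: the exponential of the mean negative log-likelihood per token. The function name and the sample values are hypothetical and are not taken from the actual evaluation run.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Illustrative values only: four tokens, each predicted with probability 0.25.
sample_log_probs = [math.log(0.25)] * 4
print(perplexity(sample_log_probs))
```

A lower value means the model assigns higher probability to the reference text, so scores are only comparable when they are computed on the same dataset with the same tokenization.
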

### Quark Info

The model was quantized with the following Quark configuration:

```text
Config(
    global_quant_config=QuantizationConfig(
        input_tensors=None,
        output_tensors=None,
        weight=QuantizationSpec(
            dtype=Dtype.uint4,
            observer_cls=<class 'quark.torch.quantization.observer.observer.PerGroupMinMaxObserver'>,
            is_dynamic=False,
            qscheme=QSchemeType.per_group,
            ch_axis=-1,
            group_size=128,
            symmetric=False,
            round_method=RoundType.half_even,
            scale_type=ScaleType.float,
            scale_format=None,
            scale_calculation_mode=None,
            qat_spec=None,
            mx_element_dtype=None,
            zero_point_type=ZeroPointType.int32,
            is_scale_quant=False,
        ),
        bias=None,
        target_device=None,
    ),
    layer_type_quant_config={},
    layer_quant_config={},
    kv_cache_quant_config={},
    kv_cache_group=['*k_proj', '*v_proj'],
    min_kv_scale=0.0,
    softmax_quant_spec=None,
    exclude=['[]'],
    algo_config=[
        AWQConfig(
            name="awq",
            scaling_layers=[
                {'prev_op': 'input_layernorm', 'layers': ['self_attn.q_proj', 'self_attn.k_proj', 'self_attn.v_proj'], 'inp': 'self_attn.q_proj', 'module2inspect': 'self_attn'},
                {'prev_op': 'self_attn.v_proj', 'layers': ['self_attn.o_proj'], 'inp': 'self_attn.o_proj'},
                {'prev_op': 'post_attention_layernorm', 'layers': ['mlp.gate_proj', 'mlp.up_proj'], 'inp': 'mlp.gate_proj', 'module2inspect': 'mlp'},
                {'prev_op': 'mlp.up_proj', 'layers': ['mlp.down_proj'], 'inp': 'mlp.down_proj'},
            ],
            model_decoder_layers="model.layers",
        ),
    ],
    quant_mode=QuantizationMode.eager_mode,
    log_severity_level=1,
    version="0.10",
)
```
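
To make the weight settings above more concrete (`dtype=Dtype.uint4`, `qscheme=QSchemeType.per_group`, `group_size=128`, `symmetric=False`, `round_method=RoundType.half_even`), the following is a simplified pure-Python sketch of asymmetric per-group min/max quantization. It is not Quark's implementation: the function names are hypothetical, the AWQ scaling step is left out entirely, and extending each group's range to include zero is an assumption borrowed from common min/max observers.

```python
def quantize_group(values, n_bits=4):
    """Asymmetric min/max quantization of one group to unsigned n_bits codes."""
    qmax = (1 << n_bits) - 1                 # 15 for uint4
    # Assumption: the observed range is widened to contain 0.0 so that the
    # zero point always fits in [0, qmax].
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)          # Python's round() rounds half to even
    codes = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


def quantize_row(row, group_size=128):
    """Quantize one weight row in consecutive groups of group_size elements."""
    return [
        quantize_group(row[start:start + group_size])
        for start in range(0, len(row), group_size)
    ]


# Illustrative data only: a synthetic row of 256 "weights".
row = [0.01 * (i - 100) for i in range(256)]
groups = quantize_row(row)
print("groups:", len(groups))
for index, (codes, scale, zero_point) in enumerate(groups):
    original = row[index * 128:(index + 1) * 128]
    restored = dequantize_group(codes, scale, zero_point)
    worst = max(abs(a - b) for a, b in zip(original, restored))
    print(f"group {index}: scale={scale:.5f} zero_point={zero_point} max_error={worst:.5f}")
```

Each group stores its own scale and zero point, so a smaller `group_size` tracks local weight ranges more closely at the cost of more metadata per tensor.
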