amd/Llama-3.1-8B-Instruct-FP8-KV


add deployment description

#5

by luow-amd - opened Sep 9, 2024

base: refs/heads/main ← from: refs/pr/5


README.md +7 -1


@@ -2,6 +2,7 @@
 license: llama3.1
 ---
 # Meta-Llama-3.1-8B-Instruct-FP8-KV
+- ## Introduction
   This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
 - ## Quantization Strategy
   - ***Quantized Layers***：All linear layers excluding "lm_head"
@@ -32,9 +33,12 @@ python3 quantize_quark.py \
         --multi_gpu \
         --model_export quark_safetensors
 ```
+## Deployment
+Quark has its own export format and allows FP8-quantized models to be deployed efficiently with the vLLM backend (vLLM-compatible).
 ## Evaluation
 Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py.
+The evaluation results were obtained in pseudo-quantization mode and may differ slightly from the accuracy of actual quantized inference. These results are provided for reference only.
 #### Evaluation scores
 <table>
@@ -57,6 +61,8 @@ Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss
 </table>
 #### License
 Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
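
Note on the Evaluation section in this diff: the README points to quantize_quark.py for the exact PPL algorithm, which is not shown here. As a rough illustration only, perplexity is commonly defined as the exponential of the mean negative log-likelihood per token. The sketch below assumes that standard definition; the `perplexity` helper and the per-token log-probabilities are hypothetical and are not taken from Quark or from this model's results.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities.

    Illustrative helper only; this is not Quark's implementation.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token log-probabilities for the same text, scored by a
# baseline model and by a quantized model (made-up numbers, not measurements).
baseline_log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
quantized_log_probs = [math.log(0.5), math.log(0.25), math.log(0.25)]

baseline_ppl = perplexity(baseline_log_probs)
quantized_ppl = perplexity(quantized_log_probs)

print(f"baseline PPL:  {baseline_ppl:.4f}")
print(f"quantized PPL: {quantized_ppl:.4f}")
print(f"relative increase: {(quantized_ppl / baseline_ppl - 1) * 100:.2f}%")
```

A lower PPL means the model assigns higher probability to the reference text, so the gap between the two values is one way to read "accuracy loss before and after quantization".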