amd/Llama-3.1-70B-Instruct-FP8-KV


luow-amd committed on Sep 9, 2024

Commit 9df306a · verified · 1 Parent(s): e3ba3bb

Update README.md

Files changed (1)

README.md +1 -1


@@ -5,7 +5,7 @@ license: llama3.1
 - ## Introduction
   This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from Pile dataset.
 - ## Quantization Stragegy
-  - ***Quantized Layers***：All linear layers excluding "lm_head"
+  - ***Quantized Layers***: All linear layers excluding "lm_head"
   - ***Weight***: FP8 symmetric per-tensor
   - ***Activation***: FP8 symmetric per-tensor
   - ***KV Cache***: FP8 symmetric  per-tensor
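
For readers unfamiliar with the scheme named in the README, here is a minimal NumPy sketch of what "FP8 symmetric per-tensor" quantization could look like. It is not Quark's implementation. The README does not say which FP8 variant is used, so the E4M3 format (largest finite value 448) is an assumption, and the helper names `per_tensor_scale`, `round_to_e4m3`, and `fake_quantize` are hypothetical.

```python
import numpy as np

# Assumption: E4M3 (finite-only variant), whose largest finite value is 448.
FP8_E4M3_MAX = 448.0


def per_tensor_scale(x: np.ndarray) -> float:
    """Symmetric per-tensor scale: one scalar for the whole tensor, no zero point."""
    amax = float(np.max(np.abs(x)))
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def round_to_e4m3(v: np.ndarray) -> np.ndarray:
    """Simulate E4M3 rounding in float64: clamp, then keep 4 significant bits."""
    v = np.clip(v, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # frexp returns v = m * 2**e with 0.5 <= |m| < 1.
    _, e = np.frexp(v)
    # The smallest normal value is 2**-6 (frexp exponent -5); values below it
    # are subnormal and share the spacing of that lowest binade (2**-9).
    e = np.maximum(e, -5)
    step = np.ldexp(1.0, e - 4)  # 1 implicit + 3 stored mantissa bits
    return np.round(v / step) * step


def fake_quantize(x: np.ndarray):
    """Quantize to the simulated FP8 grid, then dequantize back to float."""
    scale = per_tensor_scale(x)
    q = round_to_e4m3(x / scale)
    return q * scale, scale


if __name__ == "__main__":
    weights = np.array([0.0, 0.1, -1.0, 2.0])
    dequantized, scale = fake_quantize(weights)
    print("scale:", scale)
    print("dequantized:", dequantized)
```

Because the scale is derived from the tensor's largest absolute value, the range is symmetric around zero and a single scalar is stored per tensor. In the README's setup, such scales would presumably be derived from the calibration samples for activations and the KV cache rather than from the live tensor, but that detail is not stated in the source.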