tags:
  - fp8
  - vllm

Meta-Llama-3-70B-Instruct-FP8

Model Overview

This is Meta-Llama-3-70B-Instruct with weights and activations quantized to FP8 using per-tensor quantization. It is ready for inference with vLLM >= 0.5.0.
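
As a rough illustration of the vLLM usage mentioned above, the following is a minimal sketch, not an official recipe. It assumes vLLM >= 0.5.0 is installed and enough GPU memory is available. `MODEL_ID` is a placeholder: replace it with the full Hub repository id or a local path to this model. The helper name `generate_text` and its default arguments are hypothetical.

```python
# Sketch only: requires `pip install vllm` (>= 0.5.0) and suitable GPU hardware.

# Assumption: placeholder name; use the full Hub repository id or a local path.
MODEL_ID = "Meta-Llama-3-70B-Instruct-FP8"


def generate_text(prompts, tensor_parallel_size=1, max_tokens=128):
    """Load the FP8 checkpoint with vLLM and return one completion per prompt."""
    # Imported lazily so this file can be read or imported without vLLM installed.
    from vllm import LLM, SamplingParams

    # tensor_parallel_size shards the model across GPUs; choose it for your hardware.
    llm = LLM(model=MODEL_ID, tensor_parallel_size=tensor_parallel_size)
    # temperature=0.0 gives greedy decoding, which keeps runs repeatable.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each result holds one or more candidates; keep the text of the first.
    return [out.outputs[0].text for out in outputs]
```

Calling `generate_text(["What is FP8 quantization?"])` would return a list with one completion string. The sketch passes raw prompts and does not apply the Llama 3 chat template, so instruction-following quality may differ from a properly templated request.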

| Benchmark | Meta-Llama-3-70B-Instruct | Meta-Llama-3-70B-Instruct-FP8 (this model) |
| --- | --- | --- |
| arc-c 25-shot | 72.69 | 72.61 |
| hellaswag 10-shot | 85.50 | 85.41 |
| mmlu 5-shot | 80.18 | 80.06 |
| truthfulqa 0-shot | 62.90 | 62.73 |
| winogrande 5-shot | 83.34 | 83.03 |
| gsm8k 5-shot | 92.49 | 91.12 |
| Average Accuracy | 79.51 | 79.16 |
| Recovery | 100% | 99.55% |
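
The summary rows can be reproduced from the per-task scores. The sketch below assumes that "Average Accuracy" is the unweighted mean of the six task scores and that "Recovery" is the FP8 average divided by the baseline average; the README does not define either term, so treat both as assumptions. The function names are illustrative.

```python
# Per-task scores copied from the table above.
BASELINE = {
    "arc-c": 72.69, "hellaswag": 85.50, "mmlu": 80.18,
    "truthfulqa": 62.90, "winogrande": 83.34, "gsm8k": 92.49,
}
FP8 = {
    "arc-c": 72.61, "hellaswag": 85.41, "mmlu": 80.06,
    "truthfulqa": 62.73, "winogrande": 83.03, "gsm8k": 91.12,
}


def average_accuracy(scores):
    """Unweighted mean over tasks (assumed definition)."""
    return sum(scores.values()) / len(scores)


def recovery_percent(baseline, quantized):
    """Quantized average as a percentage of the baseline average (assumed definition)."""
    return 100.0 * average_accuracy(quantized) / average_accuracy(baseline)


if __name__ == "__main__":
    print(f"baseline average: {average_accuracy(BASELINE):.4f}")
    print(f"fp8 average:      {average_accuracy(FP8):.4f}")
    print(f"recovery:         {recovery_percent(BASELINE, FP8):.2f}%")
```

Under these assumptions the recovery comes out at 99.55%, matching the table. The unrounded baseline average is about 79.517, so the 79.51 shown in the table looks truncated rather than rounded.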