KV Cache Quantization
Collection on FP8 Quantization of Weights, Activations and KV Cache (12 items)
How to use nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Head with Transformers:
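# Load the checkpoint directly from the Hugging Face Hub
from transformers import AutoModel

# dtype="auto" takes the dtype from the checkpoint instead of defaulting to float32;
# older Transformers releases spell this argument torch_dtype.
model = AutoModel.from_pretrained("nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Head", dtype="auto")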
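The model name suggests an FP8 query/key/value cache with one scale per attention head. The sketch below illustrates that general idea in plain Python and is not taken from this checkpoint or from any library: `round_to_e4m3`, `quantize_per_head`, and `dequantize_per_head` are hypothetical helper names, the FP8 E4M3 format (maximum finite value 448) is an assumption about which FP8 variant is meant, and the max-abs scale is one common calibration choice rather than the confirmed recipe.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed format)


def round_to_e4m3(x: float) -> float:
    """Simulate rounding x to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)   # saturate instead of overflowing
    _, exp = math.frexp(mag)          # mag = m * 2**exp with 0.5 <= m < 1
    exp = max(exp, -5)                # below 2**-6 the grid is subnormal (fixed step)
    step = 2.0 ** (exp - 4)           # 4 significant bits: 1 implicit + 3 mantissa
    return sign * round(mag / step) * step


def quantize_per_head(heads):
    """Quantize each head's values with its own max-abs scale (hypothetical helper).

    `heads` is a list with one flat list of floats per attention head.
    Returns (quantized_heads, scales).
    """
    quantized, scales = [], []
    for values in heads:
        amax = max(abs(v) for v in values)
        # Map the largest magnitude in this head onto the FP8 maximum.
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        quantized.append([round_to_e4m3(v / scale) for v in values])
    return quantized, scales


def dequantize_per_head(quantized, scales):
    """Recover approximate values by multiplying each head by its scale."""
    return [[q * s for q in head] for head, s in zip(quantized, scales)]


# Two toy "heads" with very different value ranges.
heads = [[0.01, -0.02, 0.005], [3.0, -7.5, 1.25]]
quantized, scales = quantize_per_head(heads)
restored = dequantize_per_head(quantized, scales)
```

Because each head gets its own scale, the head with small values (first row) is not forced onto the coarse grid that the large-valued head (second row) would need under a single shared scale.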