KV Cache Quantization
Collection: FP8 Quantization of Weights, Activations and KV Cache (12 items)
How to use nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Tensor with Transformers:
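# Load model directly
from transformers import AutoModel

# AutoModel returns the base model without a language-modeling head;
# dtype="auto" takes the tensor dtype from the checkpoint's config.
model = AutoModel.from_pretrained("nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Tensor", dtype="auto")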
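
The "FP8-Per-Tensor" part of the model name suggests that cached tensors share a single scale factor per tensor. As a rough illustration only, and not necessarily how this checkpoint was produced, the sketch below simulates per-tensor FP8 (E4M3) quantization in plain Python. The helper names `round_to_e4m3`, `quantize_per_tensor`, and `dequantize` are hypothetical; the one format fact relied on is that E4M3 has 3 mantissa bits and a largest finite value of 448.

```python
import math

E4M3_MAX = 448.0       # largest finite value in the FP8 E4M3 format
E4M3_MIN_EXP = -6      # exponent of the smallest normal number
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on the E4M3 grid (hypothetical helper)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)                  # saturate instead of overflowing
    _, e = math.frexp(a)                       # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)             # below this, spacing is subnormal
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)   # grid spacing at this magnitude
    return sign * round(a / step) * step       # round() breaks ties to even


def quantize_per_tensor(values):
    """Return (quantized values, scale) using one scale for the whole tensor."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map FP8-grid values back to the original range."""
    return [q * scale for q in quantized]


# Toy stand-in for a slice of a key or value tensor.
kv_slice = [0.5, -2.0, 4.0, 0.013]
quantized, scale = quantize_per_tensor(kv_slice)
restored = dequantize(quantized, scale)
```

With a single scale, the largest-magnitude entry maps to 448 and small entries land on the coarser subnormal grid, which is the usual accuracy trade-off of per-tensor scaling compared with finer-grained schemes.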