initial release: H2O KV cache eviction for DeepseekV3 / MLA architectures a8d4591
GENOMA LABS / research committed on
How to use GenomaLabs-com/kv-cache-eviction-mla with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("GenomaLabs-com/kv-cache-eviction-mla", dtype="auto")
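
The commit message names H2O (Heavy-Hitter Oracle) eviction, which is generally described as keeping the most recent cache entries plus the older entries that have received the most accumulated attention. The following is a minimal pure-Python sketch of that selection step only. It is not this repository's implementation: the function name `h2o_keep_indices` and the parameters `acc_scores`, `budget`, and `recent` are hypothetical, and how the scores are gathered for an MLA (compressed latent) cache is not covered here.

```python
def h2o_keep_indices(acc_scores, budget, recent):
    """Return the sorted cache positions to keep under an H2O-style policy.

    acc_scores: accumulated attention score per cached position (hypothetical input)
    budget:     total number of positions to keep
    recent:     how many of the newest positions are always kept
    """
    n = len(acc_scores)
    if budget >= n:
        # Cache is within budget: nothing to evict.
        return list(range(n))

    recent = min(recent, budget)
    # Always keep the most recent `recent` positions.
    recent_idx = list(range(n - recent, n))

    # Fill the remaining budget with the highest-scoring older positions
    # (the "heavy hitters"). Ties resolve to the earlier position.
    heavy_budget = budget - recent
    older = range(n - recent)
    heavy_idx = sorted(older, key=lambda i: acc_scores[i], reverse=True)[:heavy_budget]

    return sorted(heavy_idx + recent_idx)


# Example: six cached positions, keep four, two of them reserved for recency.
scores = [0.9, 0.1, 0.5, 0.05, 0.2, 0.3]
kept = h2o_keep_indices(scores, budget=4, recent=2)
print(kept)
```

In this sketch, positions outside the returned list would be the ones dropped from the cache; a real implementation would apply the same selection per layer (and possibly per head) on tensors rather than Python lists.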