tangram-gate
FastKVzip gate weights collected for the Tangram project.
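Each gate is a small per-layer module trained to score the importance of KV cache
entries for gated KV eviction. Load a gate with the load_gate utility from the
FastKVzip codebase. Files are laid out as {model_name_lowercased}/{tag}.pt, where the
model name omits the organization prefix. The directory name is not always a plain
lowercasing of the model name: the Llama entry is stored under llama3.1-8b-instruct.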
Trained gates
| Model | File |
|---|---|
| openai/gpt-oss-20b | gpt-oss-20b/q8_dim16_sink16.pt |
| meta-llama/Llama-3.1-8B-Instruct | llama3.1-8b-instruct/q4_dim16_sink16.pt |
| meta-llama/Llama-3.1-8B-Instruct (earlier run) | llama3.1-8b-instruct/q4_dim16_sink16_v0.pt |
| Qwen/Qwen3-14B | qwen3-14b/q5_dim16_sink16.pt |
| Qwen/Qwen3-8B | qwen3-8b/q4_dim16_sink16.pt |
| Qwen/Qwen3-4B-Instruct-2507 | qwen3-4b-instruct-2507/q4_dim16_sink16.pt |
| Qwen/Qwen2.5-7B-Instruct-1M | qwen2.5-7b-instruct-1m/q7_dim16_sink16.pt |
| google/gemma-3-12b-it | gemma-3-12b-it/q2_dim16_sink16.pt |
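
Because the directory names do not follow mechanically from the model IDs, a lookup copied from the table above is safer than deriving paths by string manipulation. The sketch below is illustrative only: `GATE_FILES`, `gate_file`, and `split_gate_file` are hypothetical names, not part of the FastKVzip codebase, and the earlier Llama run (q4_dim16_sink16_v0.pt) is left out of the mapping to keep one file per model ID.

```python
from pathlib import PurePosixPath

# Copied from the "Trained gates" table; keys are Hugging Face model IDs.
# The earlier Llama run (..._v0.pt) is intentionally omitted here.
GATE_FILES = {
    "openai/gpt-oss-20b": "gpt-oss-20b/q8_dim16_sink16.pt",
    "meta-llama/Llama-3.1-8B-Instruct": "llama3.1-8b-instruct/q4_dim16_sink16.pt",
    "Qwen/Qwen3-14B": "qwen3-14b/q5_dim16_sink16.pt",
    "Qwen/Qwen3-8B": "qwen3-8b/q4_dim16_sink16.pt",
    "Qwen/Qwen3-4B-Instruct-2507": "qwen3-4b-instruct-2507/q4_dim16_sink16.pt",
    "Qwen/Qwen2.5-7B-Instruct-1M": "qwen2.5-7b-instruct-1m/q7_dim16_sink16.pt",
    "google/gemma-3-12b-it": "gemma-3-12b-it/q2_dim16_sink16.pt",
}


def gate_file(model_id: str) -> PurePosixPath:
    """Return the repository-relative gate path for a model ID (hypothetical helper)."""
    try:
        return PurePosixPath(GATE_FILES[model_id])
    except KeyError:
        raise KeyError(f"no trained gate listed for {model_id!r}") from None


def split_gate_file(path) -> tuple[str, str]:
    """Split '{model_dir}/{tag}.pt' into (model_dir, tag)."""
    p = PurePosixPath(path)
    return p.parent.name, p.stem


if __name__ == "__main__":
    path = gate_file("meta-llama/Llama-3.1-8B-Instruct")
    print(path)                   # repository-relative path of the gate file
    print(split_gate_file(path))  # (model directory, tag)
```

The resulting path can be joined with a local checkout of this repository and handed to load_gate. The signature of load_gate is not documented here, so the sketch stops short of calling it.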
Attribution
FastKVzip: "Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction", by Jang-Hyun Kim, Dongyoon Han, and Sangdoo Yun (NAVER AI Lab).
Original gate weights: https://huggingface.co/Jang-Hyun/Fast-KVzip