tangram-gate

FastKVzip gate weights collected for the Tangram project.

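Each gate is a small per-layer module trained to score the importance of KV cache entries for gated KV eviction. Load a gate with the load_gate utility from the FastKVzip codebase. Files are laid out as {model_name_lowercased}/{tag}.pt, where the model name is the part of the model ID after the organization prefix. Treat the paths listed under "Trained gates" as authoritative: the Llama directory, for instance, is llama3.1-8b-instruct, without a hyphen after "llama".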

Trained gates

| Model | File |
| --- | --- |
| openai/gpt-oss-20b | gpt-oss-20b/q8_dim16_sink16.pt |
| meta-llama/Llama-3.1-8B-Instruct | llama3.1-8b-instruct/q4_dim16_sink16.pt |
| meta-llama/Llama-3.1-8B-Instruct (earlier run) | llama3.1-8b-instruct/q4_dim16_sink16_v0.pt |
| Qwen/Qwen3-14B | qwen3-14b/q5_dim16_sink16.pt |
| Qwen/Qwen3-8B | qwen3-8b/q4_dim16_sink16.pt |
| Qwen/Qwen3-4B-Instruct-2507 | qwen3-4b-instruct-2507/q4_dim16_sink16.pt |
| Qwen/Qwen2.5-7B-Instruct-1M | qwen2.5-7b-instruct-1m/q7_dim16_sink16.pt |
| google/gemma-3-12b-it | gemma-3-12b-it/q2_dim16_sink16.pt |
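
Because the directory names do not always match the lowercased model name exactly, it can be safer to look paths up than to derive them. The sketch below is one way to do that; the gate_path helper, the GATE_FILES mapping, and the root argument are hypothetical names introduced here for illustration and are not part of FastKVzip. It only resolves a file path and leaves the actual loading to load_gate, whose signature is not documented in this README.

```python
from pathlib import PurePosixPath

# Relative gate paths copied from the list of trained gates above.
# The earlier Llama run (q4_dim16_sink16_v0.pt) is left out on purpose so
# that each model ID maps to a single file.
GATE_FILES = {
    "openai/gpt-oss-20b": "gpt-oss-20b/q8_dim16_sink16.pt",
    "meta-llama/Llama-3.1-8B-Instruct": "llama3.1-8b-instruct/q4_dim16_sink16.pt",
    "Qwen/Qwen3-14B": "qwen3-14b/q5_dim16_sink16.pt",
    "Qwen/Qwen3-8B": "qwen3-8b/q4_dim16_sink16.pt",
    "Qwen/Qwen3-4B-Instruct-2507": "qwen3-4b-instruct-2507/q4_dim16_sink16.pt",
    "Qwen/Qwen2.5-7B-Instruct-1M": "qwen2.5-7b-instruct-1m/q7_dim16_sink16.pt",
    "google/gemma-3-12b-it": "gemma-3-12b-it/q2_dim16_sink16.pt",
}


def gate_path(model_id: str, root: str = ".") -> str:
    """Return the gate file path for `model_id`, relative to `root`.

    `root` is assumed to be a local copy of this repository (hypothetical).
    """
    if model_id not in GATE_FILES:
        raise KeyError(f"no trained gate listed for {model_id!r}")
    return str(PurePosixPath(root) / GATE_FILES[model_id])
```

For example, gate_path("Qwen/Qwen3-8B", root="gates") resolves to a file under the gates directory, which can then be passed to whatever loader you use.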

Attribution

FastKVzip — "Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction", Jang-Hyun Kim, Dongyoon Han, Sangdoo Yun (NAVER AI Lab). Original gate weights: https://huggingface.co/Jang-Hyun/Fast-KVzip
