tangram-gate

FastKVzip gate weights collected for the Tangram project.

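Each gate is a small per-layer module trained to score the importance of key-value (KV) cache entries for gated KV eviction. Load a gate with the `load_gate` utility from the FastKVzip codebase. Files follow the layout `{model_name_lowercased}/{tag}.pt`, where the directory is the lowercased model name without the organization prefix; the exact paths are listed under "Trained gates".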

Trained gates

Model	File
openai/gpt-oss-20b	`gpt-oss-20b/q8_dim16_sink16.pt`
meta-llama/Llama-3.1-8B-Instruct	`llama3.1-8b-instruct/q4_dim16_sink16.pt`
meta-llama/Llama-3.1-8B-Instruct (earlier run)	`llama3.1-8b-instruct/q4_dim16_sink16_v0.pt`
Qwen/Qwen3-14B	`qwen3-14b/q5_dim16_sink16.pt`
Qwen/Qwen3-8B	`qwen3-8b/q4_dim16_sink16.pt`
Qwen/Qwen3-4B-Instruct-2507	`qwen3-4b-instruct-2507/q4_dim16_sink16.pt`
Qwen/Qwen2.5-7B-Instruct-1M	`qwen2.5-7b-instruct-1m/q7_dim16_sink16.pt`
google/gemma-3-12b-it	`gemma-3-12b-it/q2_dim16_sink16.pt`
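
As a minimal sketch of how the layout above could be resolved in code, the snippet below maps a model ID to its repo-relative gate file. The names `GATE_FILES`, `gate_relpath`, and `split_relpath` are hypothetical helpers for illustration and are not part of the FastKVzip codebase; the paths are copied from the table rather than derived, because the Llama directory (`llama3.1-8b-instruct`) does not match a plain lowercasing of the model name.

```python
from pathlib import PurePosixPath

# Paths copied from the "Trained gates" table, keyed by model ID.
# The earlier Llama run (q4_dim16_sink16_v0.pt) is left out because
# a dict can hold only one entry per model ID.
GATE_FILES = {
    "openai/gpt-oss-20b": "gpt-oss-20b/q8_dim16_sink16.pt",
    "meta-llama/Llama-3.1-8B-Instruct": "llama3.1-8b-instruct/q4_dim16_sink16.pt",
    "Qwen/Qwen3-14B": "qwen3-14b/q5_dim16_sink16.pt",
    "Qwen/Qwen3-8B": "qwen3-8b/q4_dim16_sink16.pt",
    "Qwen/Qwen3-4B-Instruct-2507": "qwen3-4b-instruct-2507/q4_dim16_sink16.pt",
    "Qwen/Qwen2.5-7B-Instruct-1M": "qwen2.5-7b-instruct-1m/q7_dim16_sink16.pt",
    "google/gemma-3-12b-it": "gemma-3-12b-it/q2_dim16_sink16.pt",
}


def gate_relpath(model_id: str) -> str:
    """Return the repo-relative gate file for a model ID (hypothetical helper)."""
    try:
        return GATE_FILES[model_id]
    except KeyError:
        raise KeyError(f"no trained gate listed for {model_id!r}") from None


def split_relpath(relpath: str) -> tuple[str, str]:
    """Split '{model_dir}/{tag}.pt' into (model_dir, tag)."""
    path = PurePosixPath(relpath)
    return path.parent.name, path.stem


if __name__ == "__main__":
    rel = gate_relpath("Qwen/Qwen3-8B")
    print(rel)
    print(split_relpath(rel))
```

The resulting path is what you would hand to a loader such as `load_gate`; its exact signature is not shown here, so check the FastKVzip codebase before calling it.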

Attribution

FastKVzip — "Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction", Jang-Hyun Kim, Dongyoon Han, Sangdoo Yun (NAVER AI Lab). Original gate weights: https://huggingface.co/Jang-Hyun/Fast-KVzip
