nielsr (HF Staff) committed · Commit 232b9f1 · verified · 1 Parent(s): bd1fc2a

Add pipeline tag and paper link to model card


Hi! I'm Niels from the community science team at Hugging Face.

I've opened this PR to improve the model card for this repository. Specifically, I've:
- Added the `text-generation` pipeline tag to the metadata to improve discoverability on the Hub.
- Added a link to the original research paper: [Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction](https://huggingface.co/papers/2605.09649).
- Formatted the authors' information and linked the official GitHub repository.

This helps users understand the context and technical foundations of the model.

Files changed (1)
  1. README.md +22 -11
README.md CHANGED
@@ -1,9 +1,10 @@
  ---
- license: apache-2.0
- datasets:
- - open-r1/OpenR1-Math-220k
  base_model:
  - Qwen/Qwen3-14B
+ datasets:
+ - open-r1/OpenR1-Math-220k
+ license: apache-2.0
+ pipeline_tag: text-generation
  tags:
  - math
  - trimkv
@@ -12,15 +13,18 @@ tags:
  - Compression
  ---

- > TRIM-KV is an efficient and learnable key–value eviction strategy designed to improve the efficiency of large language models (LLMs) in long-horizon inference.
+ # TrimKV: Token Retention for Memory-Bounded Key-Value Eviction
+
+ TRIM-KV is an efficient and learnable key–value eviction strategy designed to improve the efficiency of large language models (LLMs) in long-horizon inference. It was introduced in the paper [Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction](https://huggingface.co/papers/2605.09649) by Ngoc Bui, Hieu Trung Nguyen, Arman Cohan, and Rex Ying.

  The core idea behind TRIM-KV is to learn the intrinsic importance of each key–value pair at creation time, which we call *token retention*, and then decay this importance exponentially over time to mimic the standard inference running with eviction.

  The retention score is query-agnostic and captures the long-term utility of tokens. This is different from attention scores, which are query-dependent: they capture the short-term utility for predicting the next token and are recomputed at every step, making them local, myopic, and highly dependent on the transient decoding state.

-
  <a href="https://arxiv.org/pdf/2512.03324"><img src="https://img.shields.io/badge/arxiv-2512.03324-red?style=for-the-badge"></a>

+ - **Official Code:** [GitHub - ngocbh/trimkv](https://github.com/ngocbh/trimkv)
+ - **Paper:** [https://huggingface.co/papers/2605.09649](https://huggingface.co/papers/2605.09649)

  ### Why TRIM-KV?

@@ -62,8 +66,6 @@ And it's interpretable
  pip install -r requirements.txt
  ```

- This is a minimal set of requirements for training purposes. Additional dependencies may be needed for running specific experiments. We provided a full example of the environment used in our experiments in [`examples/env.yaml`](examples/env.yaml).
-
  ### Installation

  From the root of the repo:
@@ -72,7 +74,7 @@ From the root of the repo:
  git clone https://github.com/ngocbh/trimkv.git
  cd trimkv
  pip install -e .
- ````
+ ```

  ---

@@ -84,7 +86,7 @@ from trimkv.models.qwen3 import TrimKVQwen3ForCausalLM
  from trimkv.cache_utils import TrimKVCache
  from transformers import AutoTokenizer

- model_path = "<TrimKV model_path here>"
+ model_path = "ngocbh/TrimKV-Qwen3-14B-Math"
  download_from = "huggingface" # options: "wandb", "local", "huggingface"

  model = TrimKVQwen3ForCausalLM.from_pretrained(
@@ -112,7 +114,7 @@ tokenizer = AutoTokenizer.from_pretrained(
  # Note: TRIM-KV uses TrimKVCache under the hood. So please pass TrimKVCache to model.generate
  ```

- For a runnable end-to-end example, see [`examples/test_qwen3.py`](examples/test_qwen3.py).
+ For a runnable end-to-end example, see [`examples/test_qwen3.py`](https://github.com/ngocbh/trimkv/blob/main/examples/test_qwen3.py).

  ## Released Models

@@ -126,4 +128,13 @@ For a runnable end-to-end example, see [`examples/test_qwen3.py`](examples/test_
  | Phi-3-mini-128k-instruct | [TrimKV-Phi-3-mini-128k-instruct](https://huggingface.co/ngocbh/TrimKV-Phi-3-mini-128k-instruct) | LongAlpaca | 128K | 512 |
  | DeepSeek-R1-Distill-Llama-8B | [TrimKV-DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/ngocbh/TrimKV-DeepSeek-R1-Distill-Llama-8B) | OpenR1-Math-220k | 32K | 256 |

- ---
+ ## Citation
+
+ ```bibtex
+ @article{bui2025make,
+ title={Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction},
+ author={Bui, Ngoc and Nguyen, Hieu Trung and Cohan, Arman and Ying, Rex},
+ journal={arXiv preprint arXiv:2512.03324},
+ year={2025}
+ }
+ ```
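
For readers of this PR who want a concrete picture of the "token retention" idea described in the README text in the diff above, here is a minimal, illustrative sketch. It is not the `trimkv` implementation and does not use `TrimKVCache`. The names `CacheEntry`, `decayed_score`, and `evict_to_budget` are hypothetical, and the decay form `retention ** age` is an assumption chosen to match the phrase "decay this importance exponentially over time". In the real model the retention value would be learned; here it is supplied by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheEntry:
    position: int     # decoding step at which the key-value pair was created
    retention: float  # hypothetical per-token retention in (0, 1], fixed at creation


def decayed_score(entry: CacheEntry, now: int) -> float:
    """Query-agnostic importance after exponential decay (assumed form)."""
    age = now - entry.position
    return entry.retention ** age


def evict_to_budget(cache: List[CacheEntry], now: int, budget: int) -> List[CacheEntry]:
    """Keep the `budget` entries with the highest decayed score, in position order."""
    if len(cache) <= budget:
        return list(cache)
    ranked = sorted(cache, key=lambda e: decayed_score(e, now), reverse=True)
    return sorted(ranked[:budget], key=lambda e: e.position)


# Five cached tokens with hand-picked retention values, evaluated at step 5.
cache = [
    CacheEntry(position=0, retention=0.99),
    CacheEntry(position=1, retention=0.50),
    CacheEntry(position=2, retention=0.90),
    CacheEntry(position=3, retention=0.60),
    CacheEntry(position=4, retention=0.95),
]
kept = evict_to_budget(cache, now=5, budget=3)
print([e.position for e in kept])
```

Because the score depends only on a value fixed at creation time and on the token's age, it never needs the current query, which is the contrast with attention-based eviction that the README draws.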