Commit abd260e by ngocbh · verified · 1 Parent(s): 61a2685

Update README.md

Files changed (1)
  1. README.md +13 -31
README.md CHANGED
@@ -17,37 +17,15 @@ tags:
17
 
18
  > **TRIM-KV** is an efficient and learnable key–value eviction strategy designed to improve the efficiency of large language models (LLMs) in long-horizon inference.
19
 
20
- This model is a Qwen3-4B variant fine-tuned with TRIM-KV on the `OpenR1-Math-220k` dataset. It is based on the research paper [Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction](https://huggingface.co/papers/2605.09649).
21
 
22
- The core idea behind TRIM-KV is to learn the intrinsic importance of each key–value pair at creation time, which we call *token retention*, and then decay this importance exponentially over time to mimic the standard inference running with eviction.
23
 
24
- The retention score is query-agnostic and captures the long-term utility of tokens. This is different from attention scores, which are query-dependent: they capture the short-term utility for predicting the next token and are recomputed at every step, making them local, myopic, and highly dependent on the transient decoding state.
25
 
26
- - **Paper:** [Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction](https://huggingface.co/papers/2605.09649)
27
- - **Code:** [Official GitHub Repository](https://github.com/ngocbh/trimkv)
28
-
29
- ### Why TRIM-KV?
30
-
31
- It's fast
32
-
33
- <div align="center">
34
- <img width="1000" alt="teaser" src="https://github.com/ngocbh/trimkv/blob/main/assets/speed.png?raw=true"/>
35
- </div>
36
-
37
- It's smart
38
-
39
- <div align="center">
40
- <img width="1000" alt="teaser" src="https://github.com/ngocbh/trimkv/blob/main/assets/performance.png?raw=true"/>
41
- </div>
42
-
43
-
44
- And it's interpretable
45
-
46
- <div align="center">
47
- <img width="1000" alt="teaser" src="https://github.com/ngocbh/trimkv/blob/main/assets/eviction.png?raw=true"/>
48
- </div>
49
-
50
- ---
51
 
52
  ## Getting Started
53
 
@@ -56,9 +34,7 @@ And it's interpretable
56
  To use this model, you need to install the `trimkv` library from the [official repository](https://github.com/ngocbh/trimkv):
57
 
58
  ```sh
59
- git clone https://github.com/ngocbh/trimkv.git
60
- cd trimkv
61
- pip install -e .
62
  ```
63
 
64
  ### Quick Start
@@ -102,6 +78,12 @@ For a runnable end-to-end example, see [`examples/test_qwen3.py`](https://github
102
  ## Citation
103
 
104
  ```bibtex
105
  @article{bui2025make,
106
  title={Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction},
107
  author={Bui, Ngoc and Nguyen, Hieu Trung and Cohan, Arman and Ying, Rex},
 
17
 
18
  > **TRIM-KV** is an efficient and learnable key–value eviction strategy designed to improve the efficiency of large language models (LLMs) in long-horizon inference.
19
 
20
+ This model is a Qwen3-4B variant fine-tuned with TRIM-KV on the `OpenR1-Math-220k` dataset.
21
 
22
+ The core idea behind TRIM-KV is to learn the intrinsic importance of each key–value pair at creation time, which we call *token retention*, and then decay this importance exponentially over time to mimic standard inference running with eviction.
23
 
24
+ The retention score is query-agnostic and captures the long-term utility of tokens. This is different from attention scores, which are query-dependent: they capture the short-term utility for predicting the next token and are recomputed at every step.
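Reviewer note: the retention-and-decay idea described in the two paragraphs above can be illustrated with a minimal sketch. It is not the `trimkv` API. The function names, the `(created_at, retention)` tuple layout, and the decay form `retention ** age` are assumptions made for illustration; the README only states that importance is learned at creation time and decays exponentially.

```python
def decayed_score(retention, created_at, now):
    """Hypothetical importance of a cached token at step `now`.

    Assumes a per-token retention value in [0, 1] that is fixed at
    creation time and decays exponentially with the token's age.
    """
    return retention ** (now - created_at)


def evict_to_budget(cache, now, budget):
    """Keep the `budget` entries with the highest decayed score.

    `cache` is a list of (created_at, retention) tuples. The score does
    not depend on the current query, only on retention and age.
    """
    if len(cache) <= budget:
        return list(cache)
    ranked = sorted(
        cache,
        key=lambda entry: decayed_score(entry[1], entry[0], now),
        reverse=True,
    )
    # Restore positional order for the surviving entries.
    return sorted(ranked[:budget])


cache = [(0, 0.99), (1, 0.5), (2, 0.9), (3, 0.6)]
print(evict_to_budget(cache, now=4, budget=2))
```

In this toy cache the oldest token survives because its retention is high, while the more recent token with retention 0.5 is dropped. This is the contrast with attention scores that the paragraph draws: nothing here is recomputed per query.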
25
 
26
+ - **Paper:** [Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs](https://huggingface.co/papers/2605.09649)
27
+ - **Code:** [GitHub - ngocbh/trimkv](https://github.com/ngocbh/trimkv)
28
+ - **Arxiv:** [2512.03324](https://arxiv.org/abs/2512.03324)
29
 
30
  ## Getting Started
31
 
 
34
  To use this model, you need to install the `trimkv` library from the [official repository](https://github.com/ngocbh/trimkv):
35
 
36
  ```sh
37
+ pip install trimkv
 
 
38
  ```
39
 
40
  ### Quick Start
 
78
  ## Citation
79
 
80
  ```bibtex
81
+ @article{bui2025cache,
82
+ title={Cache what lasts: Token retention for memory-bounded kv cache in llms},
83
+ author={Bui, Ngoc and Sharma, Shubham and Lamba, Simran and Mishra, Saumitra and Ying, Rex},
84
+ journal={arXiv preprint arXiv:2512.03324},
85
+ year={2025}
86
+ }
87
  @article{bui2025make,
88
  title={Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction},
89
  author={Bui, Ngoc and Nguyen, Hieu Trung and Cohan, Arman and Ying, Rex},