Add pipeline tag and paper link to model card

Hi! I'm Niels from the community science team at Hugging Face.

I've opened this PR to improve the model card for this repository. Specifically, I've:
- Added the `text-generation` pipeline tag to the metadata to improve discoverability on the Hub.
- Added a link to the original research paper: [Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction](https://huggingface.co/papers/2605.09649).
- Formatted the authors' information and linked the official GitHub repository.

These changes make the model easier to find through the Hub's task filters and point readers to the paper and code behind it.
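
For anyone who wants to check the metadata change locally, here is a minimal sketch that lists the top-level keys in a model card's YAML front matter using only the standard library. The helper name `front_matter_keys` is hypothetical, and `CARD` is an abbreviated copy of the new front matter (tags not shown in the diff are omitted), so treat it as an illustration rather than the Hub's own validation logic.

```python
def front_matter_keys(card_text: str) -> list[str]:
    """Return the top-level keys of a model card's YAML front matter.

    Hypothetical helper: a deliberately small parser that only looks at
    lines starting in column 0, not a full YAML implementation.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter at all
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys  # closing delimiter reached
        # Top-level keys start in column 0 and are not list items.
        is_top_level = line and not line[0].isspace() and not line.startswith("-")
        if is_top_level and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return []  # closing delimiter missing: treat as no front matter


# Abbreviated copy of the front matter after this PR (assumption: the tags
# hidden between diff hunks are left out).
CARD = """---
base_model:
- Qwen/Qwen3-14B
datasets:
- open-r1/OpenR1-Math-220k
license: apache-2.0
pipeline_tag: text-generation
tags:
- math
- trimkv
- Compression
---
# TrimKV
"""

keys = front_matter_keys(CARD)
print(keys)
print("pipeline_tag present:", "pipeline_tag" in keys)
```

A full YAML parser would be the safer choice for real cards with nested metadata; the column-0 check is only enough for flat front matter like the one in this PR.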

Files changed (1)

README.md +22 -11

@@ -1,9 +1,10 @@
 ---
-license: apache-2.0
-datasets:
-- open-r1/OpenR1-Math-220k
 base_model:
 - Qwen/Qwen3-14B
 tags:
 - math
 - trimkv
@@ -12,15 +13,18 @@ tags:
 - Compression
 ---
-> TRIM-KV is an efficient and learnable key–value eviction strategy designed to improve the efficiency of large language models (LLMs) in long-horizon inference.
 The core idea behind TRIM-KV is to learn the intrinsic importance of each key–value pair at creation time, which we call *token retention*, and then decay this importance exponentially over time to mimic the standard inference running with eviction.
 The retention score is query-agnostic and captures the long-term utility of tokens. This is different from attention scores, which are query-dependent: they capture the short-term utility for predicting the next token and are recomputed at every step, making them local, myopic, and highly dependent on the transient decoding state.
 <a href="https://arxiv.org/pdf/2512.03324"><img src="https://img.shields.io/badge/arxiv-2512.03324-red?style=for-the-badge"></a>
 ### Why TRIM-KV?
@@ -62,8 +66,6 @@ And it's interpretable
 pip install -r requirements.txt
 ```
-This is a minimal set of requirements for training purposes. Additional dependencies may be needed for running specific experiments. We provided a full example of the environment used in our experiments in [`examples/env.yaml`](examples/env.yaml).
 ### Installation
 From the root of the repo:
@@ -72,7 +74,7 @@ From the root of the repo:
 git clone https://github.com/ngocbh/trimkv.git
 cd trimkv
 pip install -e .
-````
 ---
@@ -84,7 +86,7 @@ from trimkv.models.qwen3 import TrimKVQwen3ForCausalLM
 from trimkv.cache_utils import TrimKVCache
 from transformers import AutoTokenizer
-model_path = "<TrimKV model_path here>"
 download_from = "huggingface"  # options: "wandb", "local", "huggingface"
 model = TrimKVQwen3ForCausalLM.from_pretrained(
@@ -112,7 +114,7 @@ tokenizer = AutoTokenizer.from_pretrained(
 # Note: TRIM-KV uses TrimKVCache under the hood. So please pass TrimKVCache to model.generate
 ```
-For a runnable end-to-end example, see [`examples/test_qwen3.py`](examples/test_qwen3.py).
 ## Released Models
@@ -126,4 +128,13 @@ For a runnable end-to-end example, see [`examples/test_qwen3.py`](examples/test_
 | Phi-3-mini-128k-instruct | [TrimKV-Phi-3-mini-128k-instruct](https://huggingface.co/ngocbh/TrimKV-Phi-3-mini-128k-instruct) | LongAlpaca          |  128K  | 512 |
 | DeepSeek-R1-Distill-Llama-8B                    | [TrimKV-DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/ngocbh/TrimKV-DeepSeek-R1-Distill-Llama-8B)           |  OpenR1-Math-220k         | 32K   | 256     |
----

 ---
 base_model:
 - Qwen/Qwen3-14B
+datasets:
+- open-r1/OpenR1-Math-220k
+license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - math
 - trimkv
 - Compression
 ---
+# TrimKV: Token Retention for Memory-Bounded Key-Value Eviction
+TRIM-KV is an efficient and learnable key–value eviction strategy designed to improve the efficiency of large language models (LLMs) in long-horizon inference. It was introduced in the paper [Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction](https://huggingface.co/papers/2605.09649) by Ngoc Bui, Hieu Trung Nguyen, Arman Cohan, and Rex Ying.
 The core idea behind TRIM-KV is to learn the intrinsic importance of each key–value pair at creation time, which we call *token retention*, and then decay this importance exponentially over time to mimic the standard inference running with eviction.
 The retention score is query-agnostic and captures the long-term utility of tokens. This is different from attention scores, which are query-dependent: they capture the short-term utility for predicting the next token and are recomputed at every step, making them local, myopic, and highly dependent on the transient decoding state.
 <a href="https://arxiv.org/pdf/2512.03324"><img src="https://img.shields.io/badge/arxiv-2512.03324-red?style=for-the-badge"></a>
+- **Official Code:** [GitHub - ngocbh/trimkv](https://github.com/ngocbh/trimkv)
+- **Paper:** [https://huggingface.co/papers/2605.09649](https://huggingface.co/papers/2605.09649)
 ### Why TRIM-KV?
 pip install -r requirements.txt
 ```
 ### Installation
 From the root of the repo:
 git clone https://github.com/ngocbh/trimkv.git
 cd trimkv
 pip install -e .
+```
 ---
 from trimkv.cache_utils import TrimKVCache
 from transformers import AutoTokenizer
+model_path = "ngocbh/TrimKV-Qwen3-14B-Math"
 download_from = "huggingface"  # options: "wandb", "local", "huggingface"
 model = TrimKVQwen3ForCausalLM.from_pretrained(
 # Note: TRIM-KV uses TrimKVCache under the hood. So please pass TrimKVCache to model.generate
 ```
+For a runnable end-to-end example, see [`examples/test_qwen3.py`](https://github.com/ngocbh/trimkv/blob/main/examples/test_qwen3.py).
 ## Released Models
 | Phi-3-mini-128k-instruct | [TrimKV-Phi-3-mini-128k-instruct](https://huggingface.co/ngocbh/TrimKV-Phi-3-mini-128k-instruct) | LongAlpaca          |  128K  | 512 |
 | DeepSeek-R1-Distill-Llama-8B                    | [TrimKV-DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/ngocbh/TrimKV-DeepSeek-R1-Distill-Llama-8B)           |  OpenR1-Math-220k         | 32K   | 256     |
+## Citation
+```bibtex
+@article{bui2025make,
+  title={Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction},
+  author={Bui, Ngoc and Nguyen, Hieu Trung and Cohan, Arman and Ying, Rex},
+  journal={arXiv preprint arXiv:2512.03324},
+  year={2025}
+}
+```
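
The README text in the diff above describes the core idea as a per-token retention score assigned at creation time and decayed exponentially over time. A toy sketch of that idea follows. It assumes the decayed score is `retention ** (step - created_at)` and that eviction keeps the highest-scoring entries within a fixed budget; the README only states that importance decays exponentially, so both the exact formula and the names `Entry`, `decayed_score`, and `evict_to_budget` are illustrative assumptions and not the `trimkv` API. The retention values are hand-picked, whereas TRIM-KV learns them.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    token: str
    created_at: int   # decoding step at which the key-value pair was created
    retention: float  # in (0, 1); hand-picked here, learned in TRIM-KV


def decayed_score(entry: Entry, step: int) -> float:
    """Assumed form of exponential decay: retention ** age."""
    return entry.retention ** (step - entry.created_at)


def evict_to_budget(cache: list[Entry], step: int, budget: int) -> list[Entry]:
    """Keep the `budget` entries with the highest decayed score, in creation order."""
    ranked = sorted(cache, key=lambda e: decayed_score(e, step), reverse=True)
    return sorted(ranked[:budget], key=lambda e: e.created_at)


cache = [
    Entry("the", 0, 0.50),
    Entry("answer", 1, 0.99),
    Entry("is", 2, 0.60),
    Entry("42", 3, 0.95),
]

kept = evict_to_budget(cache, step=4, budget=2)
print([e.token for e in kept])  # ['answer', '42']
```

Because the score depends only on the stored retention value and the entry's age, it does not change with the current query, which matches the README's description of the retention score as query-agnostic.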