Add pipeline tag and link to paper

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +23 -11
README.md CHANGED
@@ -1,9 +1,10 @@
  ---
- license: apache-2.0
- datasets:
- - open-r1/OpenR1-Math-220k
  base_model:
  - Qwen/Qwen3-4B
  tags:
  - math
  - dbtrimkv
@@ -16,12 +17,16 @@ tags:
  This repository hosts the **DBTrimKV** retention-gate weights for `Qwen/Qwen3-4B` (32768-token training context, M = 128). The base-model weights are not included — they are loaded from `Qwen/Qwen3-4B` at runtime and the retention-gate weights from `trimkv_weights.pth` are overlaid on top.

  <a href="https://arxiv.org/pdf/2512.03324"><img src="https://img.shields.io/badge/arxiv-2512.03324-red?style=for-the-badge"></a>

  For the full list of released checkpoints, training recipes, and benchmark scripts, see the GitHub repository: **https://github.com/ngocbh/trimkv**.

  ## Quick start

  ```python
  import torch
  from trimkv.models.qwen3 import TrimKVQwen3ForCausalLM
@@ -60,14 +65,21 @@ See [`examples/test_qwen3.py`](https://github.com/ngocbh/trimkv/blob/main/exampl

  ## Training details

- - Base model: `Qwen/Qwen3-4B`
- - Variant: **DBTrimKV** (`retention_gate=rg10`)
- - Training dataset: open-r1/OpenR1-Math-220k
- - Training memory size M: `128`
- - Training context length: `32768`
- - Loss: `fwkl_ntp`
- - Attention impl: `rg_attn_flex`

  ## Citation

- For the up-to-date BibTeX entry, see the [GitHub repository](https://github.com/ngocbh/trimkv).
 
 
 
 
 
 
 
 
  ---
  base_model:
  - Qwen/Qwen3-4B
+ datasets:
+ - open-r1/OpenR1-Math-220k
+ license: apache-2.0
+ pipeline_tag: text-generation
  tags:
  - math
  - dbtrimkv

  This repository hosts the **DBTrimKV** retention-gate weights for `Qwen/Qwen3-4B` (32768-token training context, M = 128). The base-model weights are not included — they are loaded from `Qwen/Qwen3-4B` at runtime and the retention-gate weights from `trimkv_weights.pth` are overlaid on top.

+ This model was introduced in the paper [Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction](https://huggingface.co/papers/2605.09649).
+
  <a href="https://arxiv.org/pdf/2512.03324"><img src="https://img.shields.io/badge/arxiv-2512.03324-red?style=for-the-badge"></a>

  For the full list of released checkpoints, training recipes, and benchmark scripts, see the GitHub repository: **https://github.com/ngocbh/trimkv**.

  ## Quick start

+ To use this model, please install the `trimkv` library from the [GitHub repo](https://github.com/ngocbh/trimkv).
+
  ```python
  import torch
  from trimkv.models.qwen3 import TrimKVQwen3ForCausalLM
 

  ## Training details

+ - **Base model**: `Qwen/Qwen3-4B`
+ - **Variant**: **DBTrimKV** (`retention_gate=rg10`)
+ - **Training dataset**: `open-r1/OpenR1-Math-220k`
+ - **Training memory size M**: `128`
+ - **Training context length**: `32768`
+ - **Loss**: `fwkl_ntp`
+ - **Attention impl**: `rg_attn_flex`

  ## Citation

+ ```bibtex
+ @article{bui2025make,
+ title={Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction},
+ author={Bui, Ngoc and Nguyen, Hieu Trung and Cohan, Arman and Ying, Rex},
+ journal={arXiv preprint arXiv:2512.03324},
+ year={2025}
+ }
+ ```
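
Note for reviewers on the overlay step the README describes (base weights loaded from `Qwen/Qwen3-4B`, retention-gate weights from `trimkv_weights.pth` laid on top): the following is a minimal pure-Python sketch of that idea, not the `trimkv` loading code. The function name `overlay_weights` and all parameter keys are hypothetical, and plain lists stand in for tensors. In PyTorch, a partial state dict of this kind is commonly loaded with `load_state_dict(..., strict=False)`.

```python
# Illustrative sketch only: hypothetical key names, lists instead of tensors.

def overlay_weights(base_state, gate_state):
    """Merge gate weights over base weights.

    Returns the merged mapping, the keys the overlay added,
    and the keys it replaced in the base mapping.
    """
    merged = dict(base_state)  # shallow copy; the base mapping is not mutated
    added = [k for k in gate_state if k not in base_state]
    replaced = [k for k in gate_state if k in base_state]
    merged.update(gate_state)  # gate entries win on any key collision
    return merged, added, replaced


# Stand-in for the base checkpoint (assumed key names).
base_state = {
    "layers.0.attn.q_proj.weight": [0.1, 0.2],
    "layers.0.attn.k_proj.weight": [0.3, 0.4],
}

# Stand-in for the retention-gate checkpoint (assumed key name).
gate_state = {
    "layers.0.attn.retention_gate.weight": [0.5, 0.6],
}

merged, added, replaced = overlay_weights(base_state, gate_state)
print(sorted(merged))
print("added:", added)
print("replaced:", replaced)
```

Reporting the added and replaced keys separately makes it easy to confirm that a gate-only checkpoint introduces new parameters without silently overwriting base-model weights.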