ngocbh/DBTrimKV-Qwen3-4B-Math

Text Generation

Add pipeline tag and link to paper

#1

by nielsr HF Staff - opened 13 days ago

base: refs/heads/main ← from: refs/pr/1

README.md +23 -11

@@ -1,9 +1,10 @@
 ---
-license: apache-2.0
-datasets:
-- open-r1/OpenR1-Math-220k
 base_model:
 - Qwen/Qwen3-4B
 tags:
 - math
 - dbtrimkv
@@ -16,12 +17,16 @@ tags:
 This repository hosts the **DBTrimKV** retention-gate weights for `Qwen/Qwen3-4B` (32768-token training context, M = 128). The base-model weights are not included — they are loaded from `Qwen/Qwen3-4B` at runtime and the retention-gate weights from `trimkv_weights.pth` are overlaid on top.
 <a href="https://arxiv.org/pdf/2512.03324"><img src="https://img.shields.io/badge/arxiv-2512.03324-red?style=for-the-badge"></a>
 For the full list of released checkpoints, training recipes, and benchmark scripts, see the GitHub repository: **https://github.com/ngocbh/trimkv**.
 ## Quick start
 ```python
 import torch
 from trimkv.models.qwen3 import TrimKVQwen3ForCausalLM
@@ -60,14 +65,21 @@ See [`examples/test_qwen3.py`](https://github.com/ngocbh/trimkv/blob/main/exampl
 ## Training details
-- Base model: `Qwen/Qwen3-4B`
-- Variant: **DBTrimKV** (`retention_gate=rg10`)
-- Training dataset: open-r1/OpenR1-Math-220k
-- Training memory size M: `128`
-- Training context length: `32768`
-- Loss: `fwkl_ntp`
-- Attention impl: `rg_attn_flex`
 ## Citation
-For the up-to-date BibTeX entry, see the [GitHub repository](https://github.com/ngocbh/trimkv).

 ---
 base_model:
 - Qwen/Qwen3-4B
+datasets:
+- open-r1/OpenR1-Math-220k
+license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - math
 - dbtrimkv
 This repository hosts the **DBTrimKV** retention-gate weights for `Qwen/Qwen3-4B` (32768-token training context, M = 128). The base-model weights are not included — they are loaded from `Qwen/Qwen3-4B` at runtime and the retention-gate weights from `trimkv_weights.pth` are overlaid on top.
+This model was introduced in the paper [Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction](https://huggingface.co/papers/2605.09649).
 <a href="https://arxiv.org/pdf/2512.03324"><img src="https://img.shields.io/badge/arxiv-2512.03324-red?style=for-the-badge"></a>
 For the full list of released checkpoints, training recipes, and benchmark scripts, see the GitHub repository: **https://github.com/ngocbh/trimkv**.
 ## Quick start
+To use this model, please install the `trimkv` library from the [GitHub repo](https://github.com/ngocbh/trimkv).
 ```python
 import torch
 from trimkv.models.qwen3 import TrimKVQwen3ForCausalLM
 ## Training details
+- **Base model**: `Qwen/Qwen3-4B`
+- **Variant**: **DBTrimKV** (`retention_gate=rg10`)
+- **Training dataset**: `open-r1/OpenR1-Math-220k`
+- **Training memory size M**: `128`
+- **Training context length**: `32768`
+- **Loss**: `fwkl_ntp`
+- **Attention impl**: `rg_attn_flex`
 ## Citation
+```bibtex
+@article{bui2025make,
+  title={Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction},
+  author={Bui, Ngoc and Nguyen, Hieu Trung and Cohan, Arman and Ying, Rex},
+  journal={arXiv preprint arXiv:2512.03324},
+  year={2025}
+}
+```
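
The model card says the base-model weights are loaded from `Qwen/Qwen3-4B` at runtime and the retention-gate weights from `trimkv_weights.pth` are overlaid on top. As a rough illustration of what such an overlay means, here is a minimal sketch that uses plain dictionaries in place of real checkpoints. The function `overlay_weights` and all parameter key names are hypothetical and are not taken from the `trimkv` library, whose actual loading code may differ. In PyTorch, a comparable effect is usually obtained by calling `load_state_dict(..., strict=False)` with a partial state dict.

```python
# Plain dicts stand in for checkpoint state dicts.
# All key names below are hypothetical placeholders, not real trimkv parameter names.

def overlay_weights(base_state, gate_state):
    """Return a new mapping with gate_state entries layered on top of base_state."""
    merged = dict(base_state)   # shallow copy so the base mapping is not mutated
    merged.update(gate_state)   # gate entries add new keys or override existing ones
    return merged


# Stand-in for the weights that would come from the base model.
base_state = {
    "layers.0.attn.q_proj": [0.1, 0.2],
    "layers.0.attn.k_proj": [0.3, 0.4],
}

# Stand-in for the extra weights that would come from the gate checkpoint.
gate_state = {
    "layers.0.attn.retention_gate": [0.9],
}

merged = overlay_weights(base_state, gate_state)
print(sorted(merged))
```

Because the gate checkpoint only carries the additional parameters, the repository stays small and the base model can be fetched separately.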