Update README.md
Browse files
README.md
CHANGED
|
@@ -18,6 +18,7 @@ The core idea behind TRIM-KV is to learn the intrinsic importance of each key–
|
|
| 18 |
|
| 19 |
The retention score is query-agnostic and captures the long-term utility of tokens. This is different from attention scores, which are query-dependent: they capture the short-term utility for predicting the next token and are recomputed at every step, making them local, myopic, and highly dependent on the transient decoding state.
|
| 20 |
|
|
|
|
| 21 |
|
| 22 |
### Why TRIM-KV?
|
| 23 |
|
|
@@ -119,7 +120,8 @@ For a runnable end-to-end example, see [`examples/test_qwen3.py`](examples/test_
|
|
| 119 |
| Qwen3-4B | [TRIM-KV-Qwen3-4B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-4B-Math) | OpenR1-Math-220k | 16K | 512 |
|
| 120 |
| Qwen3-8B | [TRIM-KV-Qwen3-8B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-8B-Math) | OpenR1-Math-220k | 16K | 512 |
|
| 121 |
| Qwen3-14B | [TRIM-KV-Qwen3-14B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-14B-Math) | OpenR1-Math-220k | 16K | 512 |
|
| 122 |
-
| Qwen3-4B-Instruct-2507 | [TrimKV-Qwen3-4B-Instruct-2507](https://huggingface.co/ngocbh/TrimKV-Qwen3-4B-Instruct-2507) | Synth-Long, BookSum, Buddhi
|
| 123 |
| Phi-3-mini-128k-instruct | [TrimKV-Phi-3-mini-128k-instruct](https://huggingface.co/ngocbh/TrimKV-Phi-3-mini-128k-instruct) | LongAlpaca | 128K | 2048 |
|
|
|
|
| 124 |
|
| 125 |
---
|
|
|
|
| 18 |
|
| 19 |
The retention score is query-agnostic and captures the long-term utility of tokens. This is different from attention scores, which are query-dependent: they capture the short-term utility for predicting the next token and are recomputed at every step, making them local, myopic, and highly dependent on the transient decoding state.
|
| 20 |
|
| 21 |
+
<a href="https://arxiv.org/pdf/2512.03324"><img src="https://img.shields.io/badge/arxiv-2512.03324-red?style=for-the-badge"></a>
|
| 22 |
|
| 23 |
### Why TRIM-KV?
|
| 24 |
|
|
|
|
| 120 |
| Qwen3-4B | [TRIM-KV-Qwen3-4B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-4B-Math) | OpenR1-Math-220k | 16K | 512 |
|
| 121 |
| Qwen3-8B | [TRIM-KV-Qwen3-8B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-8B-Math) | OpenR1-Math-220k | 16K | 512 |
|
| 122 |
| Qwen3-14B | [TRIM-KV-Qwen3-14B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-14B-Math) | OpenR1-Math-220k | 16K | 512 |
|
| 123 |
+
| Qwen3-4B-Instruct-2507 | [TrimKV-Qwen3-4B-Instruct-2507](https://huggingface.co/ngocbh/TrimKV-Qwen3-4B-Instruct-2507) | Synth-Long, BookSum, Buddhi | 128K | 4096 |
|
| 124 |
| Phi-3-mini-128k-instruct | [TrimKV-Phi-3-mini-128k-instruct](https://huggingface.co/ngocbh/TrimKV-Phi-3-mini-128k-instruct) | LongAlpaca | 128K | 2048 |
|
| 125 |
+
| DeepSeek-R1-Distill-Llama-8B | [TrimKV-DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/ngocbh/TrimKV-DeepSeek-R1-Distill-Llama-8B) | OpenR1-Math-220k | 32K | 512 |
|
| 126 |
|
| 127 |
---
|