Update README.md
README.md
CHANGED
@@ -118,11 +118,12 @@ For a runnable end-to-end example, see [`examples/test_qwen3.py`](examples/test_
 
 | Base Model | TRIM-KV Checkpoints | Training Datasets | Training Context Len | Training $M$ |
 |------------------------------|-----------------------------------------------|--------------------------|-------------------------|--------------|
-| Qwen3-1.7B | [TRIM-KV-Qwen3-1.7B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-1.7B-Math) | OpenR1-Math-220k | 16K |
-| Qwen3-4B | [TRIM-KV-Qwen3-4B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-4B-Math) | OpenR1-Math-220k | 16K |
-| Qwen3-8B | [TRIM-KV-Qwen3-8B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-8B-Math) | OpenR1-Math-220k | 16K |
-| Qwen3-14B | [TRIM-KV-Qwen3-14B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-14B-Math) | OpenR1-Math-220k | 16K |
-| Qwen3-4B-Instruct-2507 | [TrimKV-Qwen3-4B-Instruct-2507](https://huggingface.co/ngocbh/TrimKV-Qwen3-4B-Instruct-2507) | Synth-Long, BookSum, Buddhi
-| Phi-3-mini-128k-instruct | [TrimKV-Phi-3-mini-128k-instruct](https://huggingface.co/ngocbh/TrimKV-Phi-3-mini-128k-instruct) | LongAlpaca | 128K |
+| Qwen3-1.7B | [TRIM-KV-Qwen3-1.7B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-1.7B-Math) | OpenR1-Math-220k | 16K | 256 |
+| Qwen3-4B | [TRIM-KV-Qwen3-4B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-4B-Math) | OpenR1-Math-220k | 16K | 256 |
+| Qwen3-8B | [TRIM-KV-Qwen3-8B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-8B-Math) | OpenR1-Math-220k | 16K | 256 |
+| Qwen3-14B | [TRIM-KV-Qwen3-14B-Math](https://huggingface.co/ngocbh/TrimKV-Qwen3-14B-Math) | OpenR1-Math-220k | 16K | 256 |
+| Qwen3-4B-Instruct-2507 | [TrimKV-Qwen3-4B-Instruct-2507](https://huggingface.co/ngocbh/TrimKV-Qwen3-4B-Instruct-2507) | Synth-Long, BookSum, Buddhi | 128K | 1024 |
+| Phi-3-mini-128k-instruct | [TrimKV-Phi-3-mini-128k-instruct](https://huggingface.co/ngocbh/TrimKV-Phi-3-mini-128k-instruct) | LongAlpaca | 128K | 512 |
+| DeepSeek-R1-Distill-Llama-8B | [TrimKV-DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/ngocbh/TrimKV-DeepSeek-R1-Distill-Llama-8B) | OpenR1-Math-220k | 32K | 256 |
 
 ---