Update: v9 results (7.9/10 quality), add kompress-small comparison
README.md
@@ -29,16 +29,16 @@ model-index:
  metrics:
  - name: Quality Score (Claude-judged)
    type: custom
-   value:
  - name: LLMLingua-2 Quality Score
    type: custom
-   value:
- - name: Latency (median, Apple Silicon)
    type: latency
-   value:
  - name: LLMLingua-2 Latency
    type: latency
-   value:
  ---

  # Kompress: ModernBERT Token Compressor for LLM Context Windows
@@ -47,22 +47,18 @@ model-index:

  ## Results

- |
- |-------
- | **
- |
- |
- | Model size | 600MB (150M params) | 710MB (179M params) |
- | Max sequence length | **8,192 tokens** | 512 tokens |
- | Architecture | ModernBERT-base (2024) | mBERT (2018) |

- ### Quality on Real Agent Data

- | Eval Set |
- |----------|----------|-------------|
- | Unstructured NL text | **
- | Claude Code sessions
- | Claude Code sessions (raw) | **6.6/10** | 5.2/10 |

  Quality scores are judged by Claude Sonnet 4.6: "Can an LLM fully understand and act on the compressed version?" (1-10 scale).

@@ -157,17 +153,19 @@ The fix: prompt Claude to "select words like highlighting with a marker" rather
  - HuggingFace Trainer with warmup + cosine schedule
  - ~3 hours training time

- ##
-
- | |
- |---|---|---|
- | Architecture | ModernBERT (
- |
- |
- |
- |
- |
- |

  ## Limitations

@@ -29,16 +29,16 @@ model-index:
  metrics:
  - name: Quality Score (Claude-judged)
    type: custom
+   value: 7.9
  - name: LLMLingua-2 Quality Score
    type: custom
+   value: 5.9
+ - name: Latency (median, Apple Silicon MPS)
    type: latency
+   value: 84ms
  - name: LLMLingua-2 Latency
    type: latency
+   value: 117ms
  ---

  # Kompress: ModernBERT Token Compressor for LLM Context Windows
@@ -47,22 +47,18 @@ model-index:

  ## Results

+ | Model | Quality | Latency | Size | Params |
+ |-------|---------|---------|------|--------|
+ | **kompress-base** | **7.9/10** | **84ms** (MPS) | 600MB | 150M |
+ | [kompress-small](https://huggingface.co/chopratejas/kompress-small) | 7.4/10 | **13-29ms** (ONNX) | 279MB | 70M |
+ | LLMLingua-2 | 5.9/10 | 117ms | 710MB | 179M |

+ ### Quality on Real Agent Data (Claude-judged)

+ | Eval Set | kompress-base | kompress-small | LLMLingua-2 |
+ |----------|--------------|----------------|-------------|
+ | Unstructured NL text | **7.9/10** | 7.4/10 | 5.9/10 |
+ | Claude Code sessions | **7.3/10** | **7.4/10** | 6.2/10 |

  Quality scores are judged by Claude Sonnet 4.6: "Can an LLM fully understand and act on the compressed version?" (1-10 scale).

@@ -157,17 +153,19 @@ The fix: prompt Claude to "select words like highlighting with a marker" rather
  - HuggingFace Trainer with warmup + cosine schedule
  - ~3 hours training time

+ ## Model Family
+
+ | | kompress-base | [kompress-small](https://huggingface.co/chopratejas/kompress-small) | LLMLingua-2 |
+ |---|---|---|---|
+ | Architecture | ModernBERT 22-layer | ModernBERT 6-layer (distilled) | mBERT (2018) |
+ | Params | 150M | 70M | 179M |
+ | Size | 600MB | 279MB (ONNX: 275MB) | 710MB |
+ | Max context | 8,192 tokens | 8,192 tokens | 512 tokens |
+ | Quality | **7.9/10** | 7.4/10 | 5.9/10 |
+ | Latency | 84ms (MPS) | **13-29ms (ONNX)** | 117ms |
+ | Training data | 215K from 8 datasets | Distilled from base | 41K from MeetingBank |
+ | Labeling model | Claude Sonnet 4.6 | — | GPT-4 |
+ | Compression | Content-adaptive | Content-adaptive | Fixed ratio |

  ## Limitations
