chopratejas committed on
Commit 6485b50 · verified · 1 parent: 490f973

Update: v9 results (7.9/10 quality), add kompress-small comparison
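The updated tables no longer include the old "Speed | **2.3x faster**" row. They also move the kompress-base median from 49ms to 84ms, while LLMLingua-2 goes from 113ms to 117ms. To see what that does to the speed comparison, divide LLMLingua-2's median by each Kompress median. The Python sketch below does only that arithmetic on the numbers in this diff. `speedup` and the dictionary labels are hypothetical names. kompress-base is timed on Apple Silicon MPS and kompress-small via ONNX, so the ratios mix runtimes and are rough indicators, not a benchmark.

```python
# Back-of-the-envelope latency ratios from the medians reported in this commit.
# Caveat: kompress-base was timed on Apple Silicon MPS and kompress-small via ONNX,
# so these ratios compare different runtimes and are only indicative.

LLMLINGUA2_MS = 117.0  # LLMLingua-2 median latency in the updated README

REPORTED_MS = {
    "kompress-base (MPS)": 84.0,
    "kompress-small (ONNX, upper end)": 29.0,
    "kompress-small (ONNX, lower end)": 13.0,
}


def speedup(baseline_ms: float, model_ms: float) -> float:
    """Hypothetical helper: how many times faster model_ms is than baseline_ms."""
    if baseline_ms <= 0 or model_ms <= 0:
        raise ValueError("latencies must be positive")
    return baseline_ms / model_ms


if __name__ == "__main__":
    for name, ms in REPORTED_MS.items():
        print(f"{name}: {speedup(LLMLINGUA2_MS, ms):.1f}x faster than LLMLingua-2")
    # The previous README reported 49ms vs 113ms (its "2.3x faster" row).
    print(f"previous README: {speedup(113.0, 49.0):.1f}x")
```

Run as a script, it prints about 1.4x for kompress-base and 4.0x and 9.0x for the two ends of kompress-small's 13-29ms range. It prints 2.3x for the previous README's figures.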

Files changed (1)
  1. README.md +28 -30
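The Model Family table lists Kompress's compression as "Content-adaptive" and LLMLingua-2's as "Fixed ratio", but this diff does not describe either mechanism. The Python sketch below illustrates one plausible reading of that distinction, under stated assumptions. First, a token classifier assigns each token a keep-probability. Second, a fixed-ratio policy keeps a preset fraction of the highest-scoring tokens. Third, a content-adaptive policy keeps every token above a threshold. `keep_fixed_ratio`, `keep_content_adaptive`, the 0.5 threshold, and the example probabilities are all hypothetical. This is not the kompress or LLMLingua-2 API.

```python
# Hypothetical illustration only: neither function is the kompress or LLMLingua-2 API.
# Assumes a classifier has already produced one keep-probability per token.
import math
from typing import List, Sequence


def keep_fixed_ratio(tokens: Sequence[str], keep_probs: Sequence[float], rate: float) -> List[str]:
    """Fixed-ratio policy: keep the ceil(rate * n) highest-scoring tokens, in original order."""
    if len(tokens) != len(keep_probs):
        raise ValueError("tokens and keep_probs must have the same length")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    k = math.ceil(rate * len(tokens))
    # Rank positions by descending probability (ties broken by position), take the top k.
    ranked = sorted(range(len(tokens)), key=lambda i: (-keep_probs[i], i))
    kept = sorted(ranked[:k])  # restore original token order
    return [tokens[i] for i in kept]


def keep_content_adaptive(
    tokens: Sequence[str], keep_probs: Sequence[float], threshold: float = 0.5
) -> List[str]:
    """Content-adaptive policy: keep every token whose score clears the threshold,
    so the output length depends on how much of the input scores as important."""
    if len(tokens) != len(keep_probs):
        raise ValueError("tokens and keep_probs must have the same length")
    return [t for t, p in zip(tokens, keep_probs) if p >= threshold]


if __name__ == "__main__":
    # Dense input: most tokens score high.
    dense = ["ERROR", "db", "timeout", "after", "30s"]
    dense_probs = [0.95, 0.90, 0.92, 0.60, 0.85]
    print(keep_fixed_ratio(dense, dense_probs, rate=0.5))  # ['ERROR', 'db', 'timeout']
    print(keep_content_adaptive(dense, dense_probs))  # ['ERROR', 'db', 'timeout', 'after', '30s']

    # Filler-heavy input: few tokens score high.
    filler = ["so", "um", "basically", "retry", "later"]
    filler_probs = [0.10, 0.05, 0.20, 0.90, 0.70]
    print(keep_fixed_ratio(filler, filler_probs, rate=0.5))  # ['basically', 'retry', 'later']
    print(keep_content_adaptive(filler, filler_probs))  # ['retry', 'later']
```

In the dense log line, the fixed 50% budget drops "after" and "30s" even though both clear the threshold. In the filler-heavy phrase, it keeps "basically" only to fill the quota. The threshold policy's output length follows the scores instead.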
README.md CHANGED
@@ -29,16 +29,16 @@ model-index:
  metrics:
  - name: Quality Score (Claude-judged)
  type: custom
- value: 6.9
  - name: LLMLingua-2 Quality Score
  type: custom
- value: 6.2
- - name: Latency (median, Apple Silicon)
  type: latency
- value: 49ms
  - name: LLMLingua-2 Latency
  type: latency
- value: 113ms
  ---

  # Kompress: ModernBERT Token Compressor for LLM Context Windows
@@ -47,22 +47,18 @@ model-index:

  ## Results

- | Metric | Kompress | LLMLingua-2 |
- |--------|----------|-------------|
- | **Quality (Claude-judged, 1-10)** | **6.9** | 6.2 |
- | **Latency (median)** | **49ms** | 113ms |
- | Speed | **2.3x faster** | baseline |
- | Model size | 600MB (150M params) | 710MB (179M params) |
- | Max sequence length | **8,192 tokens** | 512 tokens |
- | Architecture | ModernBERT-base (2024) | mBERT (2018) |

- ### Quality on Real Agent Data

- | Eval Set | Kompress | LLMLingua-2 |
- |----------|----------|-------------|
- | Unstructured NL text | **6.9/10** | 6.2/10 |
- | Claude Code sessions (text) | **6.7/10** | 5.2/10 |
- | Claude Code sessions (raw) | **6.6/10** | 5.2/10 |

  Quality scores are judged by Claude Sonnet 4.6: "Can an LLM fully understand and act on the compressed version?" (1-10 scale).
 
@@ -157,17 +153,19 @@ The fix: prompt Claude to "select words like highlighting with a marker" rather
  - HuggingFace Trainer with warmup + cosine schedule
  - ~3 hours training time

- ## Comparison with LLMLingua-2
-
- | | Kompress | LLMLingua-2 |
- |---|---|---|
- | Architecture | ModernBERT (2024) | mBERT (2018) |
- | Max context | 8,192 tokens | 512 tokens |
- | Training data | 215K from 8 datasets | 41K from MeetingBank |
- | Labeling model | Claude Sonnet 4.6 | GPT-4 |
- | Compression style | Content-adaptive | Fixed ratio |
- | Quality | **6.9/10** | 6.2/10 |
- | Latency | **49ms** | 113ms |

  ## Limitations
 
 
README.md after this commit:
@@ -29,16 +29,16 @@ model-index:
  metrics:
  - name: Quality Score (Claude-judged)
  type: custom
+ value: 7.9
  - name: LLMLingua-2 Quality Score
  type: custom
+ value: 5.9
+ - name: Latency (median, Apple Silicon MPS)
  type: latency
+ value: 84ms
  - name: LLMLingua-2 Latency
  type: latency
+ value: 117ms
  ---

  # Kompress: ModernBERT Token Compressor for LLM Context Windows
@@ -47,22 +47,18 @@ model-index:

  ## Results

+ | Model | Quality | Latency | Size | Params |
+ |-------|---------|---------|------|--------|
+ | **kompress-base** | **7.9/10** | 84ms (MPS) | 600MB | 150M |
+ | [kompress-small](https://huggingface.co/chopratejas/kompress-small) | 7.4/10 | **13-29ms** (ONNX) | 279MB | 70M |
+ | LLMLingua-2 | 5.9/10 | 117ms | 710MB | 179M |

+ ### Quality on Real Agent Data (Claude-judged)

+ | Eval Set | kompress-base | kompress-small | LLMLingua-2 |
+ |----------|--------------|----------------|-------------|
+ | Unstructured NL text | **7.9/10** | 7.4/10 | 5.9/10 |
+ | Claude Code sessions | 7.3/10 | **7.4/10** | 6.2/10 |
 
  Quality scores are judged by Claude Sonnet 4.6: "Can an LLM fully understand and act on the compressed version?" (1-10 scale).
 
@@ -157,17 +153,19 @@ The fix: prompt Claude to "select words like highlighting with a marker" rather
  - HuggingFace Trainer with warmup + cosine schedule
  - ~3 hours training time

+ ## Model Family
+
+ | | kompress-base | [kompress-small](https://huggingface.co/chopratejas/kompress-small) | LLMLingua-2 |
+ |---|---|---|---|
+ | Architecture | ModernBERT 22-layer | ModernBERT 6-layer (distilled) | mBERT (2018) |
+ | Params | 150M | 70M | 179M |
+ | Size | 600MB | 279MB (ONNX: 275MB) | 710MB |
+ | Max context | 8,192 tokens | 8,192 tokens | 512 tokens |
+ | Quality | **7.9/10** | 7.4/10 | 5.9/10 |
+ | Latency | 84ms (MPS) | **13-29ms (ONNX)** | 117ms |
+ | Training data | 215K from 8 datasets | Distilled from base | 41K from MeetingBank |
+ | Labeling model | Claude Sonnet 4.6 | — | GPT-4 |
+ | Compression | Content-adaptive | Content-adaptive | Fixed ratio |

  ## Limitations