chopratejas/kompress-base

@@ -29,16 +29,16 @@ model-index:
     metrics:
     - name: Quality Score (Claude-judged)
       type: custom
-      value: 6.9
     - name: LLMLingua-2 Quality Score
       type: custom
-      value: 6.2
-    - name: Latency (median, Apple Silicon)
       type: latency
-      value: 49ms
     - name: LLMLingua-2 Latency
       type: latency
-      value: 113ms
 ---
 # Kompress: ModernBERT Token Compressor for LLM Context Windows
@@ -47,22 +47,18 @@ model-index:
 ## Results
-| Metric | Kompress | LLMLingua-2 |
-|--------|----------|-------------|
-| **Quality (Claude-judged, 1-10)** | **6.9** | 6.2 |
-| **Latency (median)** | **49ms** | 113ms |
-| Speed | **2.3x faster** | baseline |
-| Model size | 600MB (150M params) | 710MB (179M params) |
-| Max sequence length | **8,192 tokens** | 512 tokens |
-| Architecture | ModernBERT-base (2024) | mBERT (2018) |
-### Quality on Real Agent Data
-| Eval Set | Kompress | LLMLingua-2 |
-|----------|----------|-------------|
-| Unstructured NL text | **6.9/10** | 6.2/10 |
-| Claude Code sessions (text) | **6.7/10** | 5.2/10 |
-| Claude Code sessions (raw) | **6.6/10** | 5.2/10 |
 Quality scores are judged by Claude Sonnet 4.6: "Can an LLM fully understand and act on the compressed version?" (1-10 scale).
@@ -157,17 +153,19 @@ The fix: prompt Claude to "select words like highlighting with a marker" rather
 - HuggingFace Trainer with warmup + cosine schedule
 - ~3 hours training time
-## Comparison with LLMLingua-2
-| | Kompress | LLMLingua-2 |
-|---|---|---|
-| Architecture | ModernBERT (2024) | mBERT (2018) |
-| Max context | 8,192 tokens | 512 tokens |
-| Training data | 215K from 8 datasets | 41K from MeetingBank |
-| Labeling model | Claude Sonnet 4.6 | GPT-4 |
-| Compression style | Content-adaptive | Fixed ratio |
-| Quality | **6.9/10** | 6.2/10 |
-| Latency | **49ms** | 113ms |
 ## Limitations

     metrics:
     - name: Quality Score (Claude-judged)
       type: custom
+      value: 7.9
     - name: LLMLingua-2 Quality Score
       type: custom
+      value: 5.9
+    - name: Latency (median, Apple Silicon MPS)
       type: latency
+      value: 84ms
     - name: LLMLingua-2 Latency
       type: latency
+      value: 117ms
 ---
 # Kompress: ModernBERT Token Compressor for LLM Context Windows
 ## Results
+| Model | Quality | Latency | Size | Params |
+|-------|---------|---------|------|--------|
+| **kompress-base** | **7.9/10** | **84ms** (MPS) | 600MB | 150M |
+| [kompress-small](https://huggingface.co/chopratejas/kompress-small) | 7.4/10 | **13-29ms** (ONNX) | 279MB | 70M |
+| LLMLingua-2 | 5.9/10 | 117ms | 710MB | 179M |
+### Quality on Real Agent Data (Claude-judged)
+| Eval Set | kompress-base | kompress-small | LLMLingua-2 |
+|----------|--------------|----------------|-------------|
+| Unstructured NL text | **7.9/10** | 7.4/10 | 5.9/10 |
+| Claude Code sessions | 7.3/10 | **7.4/10** | 6.2/10 |
 Quality scores are judged by Claude Sonnet 4.6: "Can an LLM fully understand and act on the compressed version?" (1-10 scale).
 - HuggingFace Trainer with warmup + cosine schedule
 - ~3 hours training time
+## Model Family
+| | kompress-base | [kompress-small](https://huggingface.co/chopratejas/kompress-small) | LLMLingua-2 |
+|---|---|---|---|
+| Architecture | ModernBERT 22-layer | ModernBERT 6-layer (distilled) | mBERT (2018) |
+| Params | 150M | 70M | 179M |
+| Size | 600MB | 279MB (ONNX: 275MB) | 710MB |
+| Max context | 8,192 tokens | 8,192 tokens | 512 tokens |
+| Quality | **7.9/10** | 7.4/10 | 5.9/10 |
+| Latency | 84ms (MPS) | **13-29ms (ONNX)** | 117ms |
+| Training data | 215K from 8 datasets | Distilled from base | 41K from MeetingBank |
+| Labeling model | Claude Sonnet 4.6 | — | GPT-4 |
+| Compression | Content-adaptive | Content-adaptive | Fixed ratio |
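As a rough cross-check of the Model Family figures, the sketch below derives relative speedups and the number of forward passes needed for a long input from the latency and max-context rows. The dictionary layout and the helper names `speedup_vs` and `windows_needed` are illustrative assumptions, not part of the model card or of any Kompress API. The latencies come from different backends (MPS and ONNX), and the table does not state the LLMLingua-2 backend, so the ratios are indicative only.

```python
import math

# Figures copied from the Model Family table (latency in ms, context in tokens).
# kompress-small latency is reported as a range, so every entry is (low, high).
MODELS = {
    "kompress-base": {"latency_ms": (84, 84), "max_tokens": 8192},
    "kompress-small": {"latency_ms": (13, 29), "max_tokens": 8192},
    "LLMLingua-2": {"latency_ms": (117, 117), "max_tokens": 512},
}


def speedup_vs(baseline: str, model: str) -> tuple[float, float]:
    """Return (worst-case, best-case) speedup of `model` over `baseline`."""
    base_lo, base_hi = MODELS[baseline]["latency_ms"]
    lo, hi = MODELS[model]["latency_ms"]
    # Worst case: fastest baseline vs slowest model; best case: the reverse.
    return (base_lo / hi, base_hi / lo)


def windows_needed(n_tokens: int, model: str) -> int:
    """Forward passes needed to cover n_tokens.

    Simplification: ignores special tokens and any overlap between windows.
    """
    return math.ceil(n_tokens / MODELS[model]["max_tokens"])


if __name__ == "__main__":
    for name in ("kompress-base", "kompress-small"):
        worst, best = speedup_vs("LLMLingua-2", name)
        print(f"{name}: {worst:.2f}x to {best:.2f}x faster than LLMLingua-2")
    for name in MODELS:
        print(f"{name}: {windows_needed(8000, name)} window(s) for 8,000 tokens")
```

On these numbers kompress-base comes out about 1.4x faster than LLMLingua-2 and kompress-small roughly 4x to 9x faster. An 8,000-token input fits in one window for either Kompress model, but needs 16 windows at a 512-token limit.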
 ## Limitations