---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- token-compression
- prompt-compression
- context-compression
- agentic
- modernbert
- llmlingua
- headroom
- tool-outputs
- structured-data
pipeline_tag: token-classification
base_model: answerdotai/ModernBERT-base
datasets:
- SWE-bench/SWE-smith-trajectories
- glaiveai/glaive-function-calling-v2
- nebius/SWE-agent-trajectories
- Agent-Ark/Toucan-1.5M
- tuandunghcmut/toolbench-v1
- JetBrains-Research/diff-xyz
- code_search_net
- b-mc2/sql-create-context
model-index:
- name: kompress-base
  results:
  - task:
      type: token-classification
      name: Token Compression
    metrics:
    - type: f1
      value: 0.9956
      name: F1
    - type: accuracy
      value: 0.9926
      name: Accuracy
---
# Kompress: Token Compression for Structured Tool Outputs & Agentic Contexts

Kompress is a ModernBERT-based token compressor trained on 330K examples of structured tool outputs: JSON API responses, git diffs, error logs, source code, CLI output, database results, and agentic conversation traces. It is a drop-in replacement for LLMLingua-2.

## Key Results

### On Agentic / Structured Data (our target domain)
| Metric | Kompress | LLMLingua-2 |
|---|---|---|
| Entity Preservation | 82.1% | 36.0% |
| Compression Ratio | 48.1% | 206.0% (expands!) |
| Model Size | 600 MB | 1,400 MB |
| Context Window | 8,192 tokens | 512 tokens |
| Parameters | 149M | 355M |
### On LLMLingua-2's Benchmarks
| Dataset | Kompress | LLMLingua-2 | Note |
|---|---|---|---|
| MeetingBank | 46.3% | 57.4% | LLMLingua's training domain |
| GSM8K | 97.8% | 98.9% | Both excellent, but LLMLingua-2 retains 88% of tokens vs. Kompress's 50% |
### Cross-Agent Generalization (Cursor IDE, never seen in training)
| Metric | Kompress | LLMLingua-2 |
|---|---|---|
| Entity Preservation | 91.1% | 13.5% |
| Compression Ratio | 49.9% | 85.8% |
## Why Kompress?
LLMLingua-2 was trained on meeting transcripts. When applied to structured tool outputs, it:
- Destroys file paths: `/Users/foo/.claude/tasks/abc-123` → `abc - 123 abc 123`
- Expands instead of compressing: 206% average ratio on agentic data
- Fragments UUIDs: `4e149fea-6eb8-4feb` → `4e149fea - 6eb8 - 4feb`
- Has no cross-chunk awareness: 512-token limit
Kompress fixes these with:
- Training on structured data: 330K examples of real tool outputs (JSON, diffs, logs, code, CLI output, SQL)
- A dual-head architecture: token classification plus a span-importance CNN that prevents entity splitting
- A ModernBERT backbone: 8K context window, code-pretrained, RoPE attention
## Training Data (330K examples)
| Source | Examples | Type |
|---|---|---|
| Toucan-1.5M (MCP tool outputs) | ~80K | Real MCP server tool responses |
| SWE-agent trajectories | ~60K | Bash output, file reads, git diffs |
| ToolBench | ~50K | REST API JSON responses |
| Glaive Function Calling | ~40K | Function call/response pairs |
| CodeSearchNet | ~40K | Source code (Python, JS, Java, Go, Ruby, PHP) |
| JetBrains diff-xyz | ~10K | Git unified diffs |
| SQL create-context | ~10K | Database schemas + queries |
| Claude Code sessions | ~15K | Real agentic coding traces (API-labeled) |
| SWE-bench trajectories | ~15K | Open-source coding agent traces |
| Glaive + SWE (API-labeled) | ~10K | Function calling + coding (API-labeled) |
**Labeling:** heuristic rules for structured data (JSON → keep keys, diffs → keep +/- lines, logs → keep errors), plus Claude Sonnet distillation for natural-language segments.
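A minimal sketch of what such heuristic labeling rules could look like. The function name and exact conditions are illustrative assumptions, not the project's actual labeler; they just mirror the rules named above (JSON → keep key-bearing lines, diffs → keep +/- lines, logs → keep error lines).

```python
def label_line(line: str, content_type: str) -> bool:
    """Heuristic keep/discard label for one line of structured output."""
    stripped = line.strip()
    if content_type == "json":
        # Keep lines carrying keys or structural brackets.
        return '":' in stripped or stripped in ("{", "}", "[", "]")
    if content_type == "diff":
        # Keep added/removed lines and hunk headers; drop context lines.
        return stripped.startswith(("+", "-", "@@", "diff "))
    if content_type == "log":
        # Keep lines signaling failures.
        return any(w in stripped.lower()
                   for w in ("error", "fail", "exception", "traceback"))
    # Natural language: defer to model distillation, keep by default here.
    return True
```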
## Architecture

```text
Input tokens → ModernBERT-base encoder (149M params, 8K context)
  ├─ Head 1: token-level keep/discard (Linear → Softmax)
  └─ Head 2: span importance (Conv1d → GELU → Conv1d → Sigmoid)

Final score = token_prob × (0.5 + 0.5 × span_score)
```
The span head (~200K extra params) learns contiguous importance regions, preventing entity splitting and maintaining coherence.
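A minimal PyTorch sketch of the two heads and the combination formula, assuming the 768-dim hidden size of ModernBERT-base. Layer widths and kernel sizes are illustrative guesses; only the head types and the scoring formula come from the description above.

```python
import torch
import torch.nn as nn

class DualHead(nn.Module):
    """Two scoring heads on top of encoder hidden states (sketch)."""
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.token_head = nn.Linear(hidden, 2)   # keep / discard logits
        self.span_head = nn.Sequential(          # contiguous-importance CNN
            nn.Conv1d(hidden, 128, kernel_size=5, padding=2),
            nn.GELU(),
            nn.Conv1d(128, 1, kernel_size=5, padding=2),
            nn.Sigmoid(),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden) encoder output
        token_prob = self.token_head(h).softmax(-1)[..., 1]        # P(keep)
        span_score = self.span_head(h.transpose(1, 2)).squeeze(1)  # (batch, seq_len)
        # Final score = token_prob × (0.5 + 0.5 × span_score)
        return token_prob * (0.5 + 0.5 * span_score)

scores = DualHead()(torch.randn(1, 16, 768))  # per-token keep scores in [0, 1]
```

Because the span head sees a 5-token neighborhood at each position, adjacent tokens inside one entity receive correlated scores, which is what keeps a UUID or path from being split.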
## Quick Start

```bash
pip install kompress
```

```python
from kompress.inference.pytorch_runner import KompressRunner

runner = KompressRunner(checkpoint_path="chopratejas/kompress-base")
result = runner.compress(
    '{"users": [{"id": 1, "name": "Alice", "email": "alice@example.com"}, '
    '{"id": 2, "name": "Bob", "email": "bob@example.com"}, '
    '{"id": 3, "name": "Charlie", "email": "charlie@example.com"}]}',
    target_ratio=0.5,
)
print(result.compressed)
# Keeps keys, structure, unique values; discards repetitive patterns
```
## Use with Headroom

```python
from kompress.integration.headroom_bridge import patch_content_router
from headroom.transforms import ContentRouter

router = ContentRouter()
patch_content_router(router)  # Swaps LLMLingua → Kompress
```
## Training Details

- Base model: answerdotai/ModernBERT-base (149M params)
- Training: 3 epochs, batch size 64, lr 2e-5, AdamW + torch.compile on an NVIDIA H100
- Loss: CrossEntropy (token head) + 0.3 × BCE (span head)
- Final metrics: F1 = 0.9956, Precision = 0.9959, Recall = 0.9953, train loss = 0.068
- Training time: 2h 39m on one H100 (330K examples, 3 epochs)
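The combined objective can be sketched as follows. Tensor shapes and variable names are illustrative; only the CrossEntropy + 0.3 × BCE combination comes from the details above.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()   # token head: keep/discard classification
bce = nn.BCELoss()           # span head: soft importance targets

token_logits = torch.randn(4, 16, 2)          # (batch, seq, 2)
token_labels = torch.randint(0, 2, (4, 16))   # 0 = discard, 1 = keep
span_scores = torch.rand(4, 16)               # sigmoid outputs in [0, 1)
span_labels = torch.rand(4, 16)               # soft span-importance targets

# Total loss = CE(token head) + 0.3 × BCE(span head)
loss = ce(token_logits.reshape(-1, 2), token_labels.reshape(-1)) \
     + 0.3 * bce(span_scores, span_labels)
```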
## License

Apache 2.0
## Citation

```bibtex
@software{kompress2025,
  title={Kompress: Token Compression for Structured Tool Outputs and Agentic Contexts},
  author={Tejas Chopra},
  year={2025},
  url={https://huggingface.co/chopratejas/kompress-base},
}
```
## Links

- GitHub: source code, training pipeline, eval scripts
- Headroom: context compression framework
- LLMLingua-2: the model Kompress replaces
- ModernBERT: base encoder