# Kompress: Token Compression for Structured Tool Outputs & Agentic Contexts

Kompress is a ModernBERT-based token compressor trained on 330K examples of structured tool outputs — JSON API responses, git diffs, error logs, source code, CLI output, database results, and agentic conversation traces. It is a drop-in replacement for LLMLingua-2.

## Key Results

### On Agentic / Structured Data (our target domain)

| Metric | Kompress | LLMLingua-2 |
|---|---|---|
| Entity Preservation | 82.1% | 36.0% |
| Compression Ratio | 48.1% | 206.0% (expands!) |
| Context Window | 8,192 tokens | 512 tokens |
| Model Size | 600 MB | 1,400 MB |
| Parameters | 149M | 355M |

### On LLMLingua-2's Benchmarks

| Dataset | Kompress | LLMLingua-2 | Note |
|---|---|---|---|
| MeetingBank | 46.3% | 57.4% | LLMLingua-2's training domain |
| GSM8K | 97.8% | 98.9% | Both excellent; LLMLingua-2 retains 88% of tokens vs. 50% for Kompress |

### Cross-Agent Generalization (Cursor IDE — never seen in training)

| Metric | Kompress | LLMLingua-2 |
|---|---|---|
| Entity Preservation | 91.1% | 13.5% |
| Compression Ratio | 49.9% | 85.8% |
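The entity-preservation numbers above can be approximated with a simple check: extract UUIDs and file paths from the input and count how many survive compression verbatim. This is an illustrative sketch, not the repo's actual eval script; the regexes and the `entity_preservation` helper are our own.

```python
import re

# Rough patterns for the entity types the tables track: UUIDs and file paths.
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
PATH_RE = re.compile(r"(?:/[\w.-]+){2,}")

def entity_preservation(original: str, compressed: str) -> float:
    """Fraction of entities in the original that survive compression verbatim."""
    entities = UUID_RE.findall(original) + PATH_RE.findall(original)
    if not entities:
        return 1.0
    kept = sum(1 for e in entities if e in compressed)
    return kept / len(entities)

orig = "task /Users/foo/.claude/tasks/abc-123 id 4e149fea-6eb8-4feb-9012-aabbccddeeff"
good = "task /Users/foo/.claude/tasks/abc-123 4e149fea-6eb8-4feb-9012-aabbccddeeff"
bad = "task abc - 123 4e149fea - 6eb8 - 4feb"
print(entity_preservation(orig, good))  # 1.0
print(entity_preservation(orig, bad))   # 0.0
```

The fragmented output in `bad` mirrors the UUID-splitting failure mode described below: once a token-level compressor inserts spaces into an identifier, exact-match retrieval of that entity fails.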

## Why Kompress?

LLMLingua-2 was trained on meeting transcripts. When applied to structured tool outputs, it:

- **Destroys file paths:** `/Users/foo/.claude/tasks/abc-123` → `abc - 123 abc 123`
- **Expands instead of compressing:** 206% average ratio on agentic data
- **Fragments UUIDs:** `4e149fea-6eb8-4feb` → `4e149fea - 6eb8 - 4feb`
- **Has no cross-chunk awareness:** 512-token context limit

Kompress fixes these with:

1. **Trained on structured data** — 330K examples of real tool outputs: JSON, diffs, logs, code, CLI output, SQL
2. **Dual-head architecture** — token classification plus a span-importance CNN prevents entity splitting
3. **ModernBERT backbone** — 8K context window, code-pretrained, RoPE attention

## Training Data (330K examples)

| Source | Examples | Type |
|---|---|---|
| Toucan-1.5M (MCP tool outputs) | ~80K | Real MCP server tool responses |
| SWE-agent trajectories | ~60K | Bash output, file reads, git diffs |
| ToolBench | ~50K | REST API JSON responses |
| Glaive Function Calling | ~40K | Function call/response pairs |
| CodeSearchNet | ~40K | Source code (Python, JS, Java, Go, Ruby, PHP) |
| JetBrains diff-xyz | ~10K | Git unified diffs |
| SQL create-context | ~10K | Database schemas + queries |
| Claude Code sessions | ~15K | Real agentic coding traces (API-labeled) |
| SWE-bench trajectories | ~15K | Open-source coding agent traces |
| Glaive + SWE (API-labeled) | ~10K | Function calling + coding (API-labeled) |

**Labeling:** heuristic rules for structured data (JSON → keep keys, diffs → keep +/- lines, logs → keep errors), plus Claude Sonnet distillation for natural-language segments.
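The heuristic rules can be sketched as per-line or per-key labelers. The function names and exact triggers below are illustrative, not the training pipeline's actual code:

```python
import json

def heuristic_keep_lines(text: str) -> list[bool]:
    """Per-line keep/discard labels mirroring the rules above:
    diffs keep +/- lines, logs keep errors; everything else is a discard candidate."""
    labels = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("+", "-")) and not stripped.startswith(("+++", "---")):
            labels.append(True)   # diff hunk content
        elif "ERROR" in stripped or "Traceback" in stripped:
            labels.append(True)   # log errors
        elif stripped.startswith("@@"):
            labels.append(True)   # diff hunk headers carry line positions
        else:
            labels.append(False)
    return labels

def json_keep_spans(payload: str) -> list[str]:
    """JSON rule: keys (and hence structure) are always kept."""
    def keys(o):
        if isinstance(o, dict):
            for k, v in o.items():
                yield k
                yield from keys(v)
        elif isinstance(o, list):
            for item in o:
                yield from keys(item)
    return list(keys(json.loads(payload)))

diff = "--- a/app.py\n+++ b/app.py\n@@ -1,2 +1,2 @@\n-print('hi')\n+print('hello')\ncontext line"
print(heuristic_keep_lines(diff))  # [False, False, True, True, True, False]
print(json_keep_spans('{"users": [{"id": 1, "name": "Alice"}]}'))  # ['users', 'id', 'name']
```

Cheap rules like these label the structured portions deterministically; only the remaining natural-language spans need the more expensive model-based distillation.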

## Architecture

```
Input tokens → ModernBERT-base encoder (149M params, 8K context) →
    Head 1: Token-level keep/discard (Linear → Softmax)
    Head 2: Span importance (Conv1d → GELU → Conv1d → Sigmoid)

Final score = token_prob × (0.5 + 0.5 × span_score)
```

The span head (~200K extra params) learns contiguous importance regions, preventing entity splitting and maintaining coherence.
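The score fusion and ratio-based selection can be sketched numerically in plain Python. `fuse_scores` and `keep_mask` are illustrative names, and the (discard, keep) logit layout is an assumption; only the formula itself comes from the diagram above:

```python
import math

def fuse_scores(token_logits, span_scores):
    """final_score = P(keep) * (0.5 + 0.5 * span_score), per token.
    token_logits: list of (discard_logit, keep_logit) pairs from Head 1.
    span_scores: list of sigmoid outputs in [0, 1] from Head 2."""
    out = []
    for (d, k), s in zip(token_logits, span_scores):
        keep_prob = math.exp(k) / (math.exp(d) + math.exp(k))  # 2-class softmax
        out.append(keep_prob * (0.5 + 0.5 * s))
    return out

def keep_mask(scores, target_ratio):
    """Keep the highest-scoring target_ratio fraction of tokens."""
    k = max(1, int(len(scores) * target_ratio))
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [i in top for i in range(len(scores))]

logits = [(0.0, 2.0), (2.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
spans = [0.9, 0.1, 0.8, 0.2]
mask = keep_mask(fuse_scores(logits, spans), 0.5)
print(mask)  # [True, False, True, False]
```

Note the `0.5 + 0.5 × span_score` term floors the span multiplier at 0.5, so the span head can boost or dampen a token's score but never veto the token head outright.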

## Quick Start

```bash
pip install kompress
```

```python
from kompress.inference.pytorch_runner import KompressRunner

runner = KompressRunner(checkpoint_path="chopratejas/kompress-base")
result = runner.compress(
    '{"users": [{"id": 1, "name": "Alice", "email": "alice@example.com"}, '
    '{"id": 2, "name": "Bob", "email": "bob@example.com"}, '
    '{"id": 3, "name": "Charlie", "email": "charlie@example.com"}]}',
    target_ratio=0.5,
)
print(result.compressed)
# Keeps: keys, structure, unique values — discards repetitive patterns
```

## Use with Headroom

```python
from kompress.integration.headroom_bridge import patch_content_router
from headroom.transforms import ContentRouter

router = ContentRouter()
patch_content_router(router)  # Swaps LLMLingua-2 → Kompress
```

## Training Details

- **Base model:** answerdotai/ModernBERT-base (149M params)
- **Training:** 3 epochs, batch size 64, lr 2e-5, AdamW + `torch.compile` on an NVIDIA H100
- **Loss:** cross-entropy (token head) + 0.3 × BCE (span head)
- **Final metrics:** F1 = 0.9956, Precision = 0.9959, Recall = 0.9953, train loss = 0.068
- **Training time:** 2h39m on one H100 (330K examples, 3 epochs)
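The combined objective can be reproduced in a few lines. This is a pure-Python sketch of the stated loss, not the actual training code; `token_ce`, `span_bce`, and the example values are our own:

```python
import math

def token_ce(keep_logits, labels):
    """Mean cross-entropy over per-token (discard, keep) logits (Head 1)."""
    losses = []
    for (d, k), y in zip(keep_logits, labels):
        log_z = math.log(math.exp(d) + math.exp(k))  # log-sum-exp normalizer
        losses.append(log_z - (k if y == 1 else d))
    return sum(losses) / len(losses)

def span_bce(span_probs, span_labels):
    """Mean binary cross-entropy over span-importance sigmoids (Head 2)."""
    eps = 1e-7  # numerical safety near p = 0 or 1
    losses = [-(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
              for p, y in zip(span_probs, span_labels)]
    return sum(losses) / len(losses)

def total_loss(keep_logits, labels, span_probs, span_labels, span_weight=0.3):
    """loss = CE(token head) + 0.3 * BCE(span head), as stated above."""
    return token_ce(keep_logits, labels) + span_weight * span_bce(span_probs, span_labels)

loss = total_loss([(0.0, 2.0), (2.0, 0.0)], [1, 0], [0.9, 0.1], [1, 0])
print(round(loss, 4))  # ≈ 0.1585
```

The 0.3 weight keeps the token head as the primary objective while the span head regularizes toward contiguous keep regions.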

## License

Apache 2.0

## Citation

```bibtex
@software{kompress2025,
  title={Kompress: Token Compression for Structured Tool Outputs and Agentic Contexts},
  author={Tejas Chopra},
  year={2025},
  url={https://huggingface.co/chopratejas/kompress-base},
}
```

## Links

- **GitHub** — Source code, training pipeline, eval scripts
- **Headroom** — Context compression framework
- **LLMLingua-2** — The model Kompress replaces
- **ModernBERT** — Base encoder