---
license: apache-2.0
language:
  - en
library_name: transformers
tags:
  - token-compression
  - prompt-compression
  - context-compression
  - agentic
  - modernbert
  - llmlingua
  - headroom
  - tool-outputs
  - structured-data
pipeline_tag: token-classification
base_model: answerdotai/ModernBERT-base
datasets:
  - SWE-bench/SWE-smith-trajectories
  - glaiveai/glaive-function-calling-v2
  - nebius/SWE-agent-trajectories
  - Agent-Ark/Toucan-1.5M
  - tuandunghcmut/toolbench-v1
  - JetBrains-Research/diff-xyz
  - code_search_net
  - b-mc2/sql-create-context
model-index:
  - name: kompress-base
    results:
      - task:
          type: token-classification
          name: Token Compression
        metrics:
          - type: f1
            value: 0.9956
            name: F1
          - type: accuracy
            value: 0.9926
            name: Accuracy
---

# Kompress: Token Compression for Structured Tool Outputs & Agentic Contexts

Kompress is a ModernBERT-based token compressor trained on 330K examples of structured tool outputs — JSON API responses, git diffs, error logs, source code, CLI output, database results, and agentic conversation traces. It is a drop-in replacement for LLMLingua-2.

## Key Results

### On Agentic / Structured Data (our target domain)

| Metric | Kompress | LLMLingua-2 |
|---|---|---|
| Entity Preservation | 82.1% | 36.0% |
| Compression Ratio | 48.1% | 206.0% (expands!) |
| Model Size | 600 MB | 1,400 MB |
| Context Window | 8,192 tokens | 512 tokens |
| Parameters | 149M | 355M |

On LLMLingua-2's Benchmarks

Dataset Kompress LLMLingua-2 Note
MeetingBank 46.3% 57.4% LLMLingua's training domain
GSM8K 97.8% 98.9% Both excellent; LLMLingua keeps 88% vs Kompress 50%

### Cross-Agent Generalization (Cursor IDE — never seen in training)

| Metric | Kompress | LLMLingua-2 |
|---|---|---|
| Entity Preservation | 91.1% | 13.5% |
| Compression Ratio | 49.9% | 85.8% |

## Why Kompress?

LLMLingua-2 was trained on meeting transcripts. Applied to structured tool outputs, it:

- Destroys file paths: `/Users/foo/.claude/tasks/abc-123` → `abc - 123 abc 123`
- Expands instead of compressing: 206% average ratio on agentic data
- Fragments UUIDs: `4e149fea-6eb8-4feb` → `4e149fea - 6eb8 - 4feb`
- Has no cross-chunk awareness: 512-token context limit

Kompress fixes these with:

  1. Trained on structured data — 330K examples of real tool outputs: JSON, diffs, logs, code, CLI output, SQL
  2. Dual-head architecture — token classification + span importance CNN prevents entity splitting
  3. ModernBERT backbone — 8K context window, code-pretrained, RoPE attention

## Training Data (330K examples)

| Source | Examples | Type |
|---|---|---|
| Toucan-1.5M (MCP tool outputs) | ~80K | Real MCP server tool responses |
| SWE-agent trajectories | ~60K | Bash output, file reads, git diffs |
| ToolBench | ~50K | REST API JSON responses |
| Glaive Function Calling | ~40K | Function call/response pairs |
| CodeSearchNet | ~40K | Source code (Python, JS, Java, Go, Ruby, PHP) |
| JetBrains diff-xyz | ~10K | Git unified diffs |
| SQL create-context | ~10K | Database schemas + queries |
| Claude Code sessions | ~15K | Real agentic coding traces (API-labeled) |
| SWE-bench trajectories | ~15K | Open-source coding agent traces |
| Glaive + SWE (API-labeled) | ~10K | Function calling + coding (API-labeled) |

**Labeling:** heuristic rules for structured data (JSON → keep keys, diffs → keep +/− lines, logs → keep errors), plus Claude Sonnet distillation for natural-language segments.
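The heuristic rules above can be sketched as follows. This is a minimal illustration, not the actual labeling pipeline; the function names and keyword lists are hypothetical:

```python
def label_diff_lines(diff_text):
    """Heuristic diff labeling sketch: keep added/removed lines and hunk
    headers, discard unchanged context lines."""
    labels = []
    for line in diff_text.splitlines():
        keep = line.startswith(("+", "-", "@@"))
        labels.append((line, 1 if keep else 0))
    return labels

def label_log_lines(log_text):
    """Heuristic log labeling sketch: keep lines that mention errors,
    failures, or warnings; discard the rest."""
    keywords = ("error", "fail", "exception", "warn", "traceback")
    return [(line, 1 if any(k in line.lower() for k in keywords) else 0)
            for line in log_text.splitlines()]
```

Natural-language spans, which these rules cannot score, are the segments handed off to model distillation.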

## Architecture

```text
Input tokens
  → ModernBERT-base encoder (149M params, 8K context)
      → Head 1: token-level keep/discard (Linear → Softmax)
      → Head 2: span importance (Conv1d → GELU → Conv1d → Sigmoid)

final_score = token_prob × (0.5 + 0.5 × span_score)
```

The span head (~200K extra params) learns contiguous importance regions, preventing entity splitting and maintaining coherence.
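The fusion formula can be illustrated with a small framework-free sketch (the helper names and the top-k selection step are illustrative assumptions; the real heads run inside the transformers model):

```python
def fuse_scores(token_probs, span_scores):
    """final = token_prob * (0.5 + 0.5 * span_score): a token only scores
    high when both its own keep-probability and its span's importance do."""
    return [p * (0.5 + 0.5 * s) for p, s in zip(token_probs, span_scores)]

def select_tokens(tokens, scores, target_ratio=0.5):
    """Keep the top target_ratio fraction of tokens by fused score,
    preserving their original order."""
    k = max(1, int(len(tokens) * target_ratio))
    keep = set(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    return [t for i, t in enumerate(tokens) if i in keep]
```

The multiplicative blend means a low span score can only halve a token's probability, never zero it out, so isolated high-confidence tokens still survive.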

## Quick Start

```bash
pip install kompress
```

```python
from kompress.inference.pytorch_runner import KompressRunner

runner = KompressRunner(checkpoint_path="chopratejas/kompress-base")
result = runner.compress(
    '{"users": [{"id": 1, "name": "Alice", "email": "alice@example.com"}, '
    '{"id": 2, "name": "Bob", "email": "bob@example.com"}, '
    '{"id": 3, "name": "Charlie", "email": "charlie@example.com"}]}',
    target_ratio=0.5,
)
print(result.compressed)
# Keeps keys, structure, and unique values; discards repetitive patterns
```

## Use with Headroom

```python
from kompress.integration.headroom_bridge import patch_content_router
from headroom.transforms import ContentRouter

router = ContentRouter()
patch_content_router(router)  # Swaps LLMLingua-2 out for Kompress
```

## Training Details

- Base model: answerdotai/ModernBERT-base (149M params)
- Training: 3 epochs, batch size 64, lr 2e-5, AdamW with torch.compile, on a single NVIDIA H100
- Loss: cross-entropy (token head) + 0.3 × BCE (span head)
- Final metrics: F1 = 0.9956, Precision = 0.9959, Recall = 0.9953, train loss = 0.068
- Training time: 2h39m (330K examples, 3 epochs)
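The combined objective above can be written out numerically. This is a plain-Python sketch of the weighted sum for illustration; the actual training uses the framework's loss classes:

```python
import math

def token_ce(probs, targets):
    """Mean cross-entropy over per-token keep/discard distributions."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def span_bce(preds, targets):
    """Mean binary cross-entropy for the span-importance head."""
    eps = 1e-12
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(preds, targets)) / len(targets)

def total_loss(probs, tok_targets, span_preds, span_targets, span_weight=0.3):
    """total = CE(token head) + 0.3 * BCE(span head)."""
    return token_ce(probs, tok_targets) + span_weight * span_bce(span_preds, span_targets)
```

The 0.3 weight keeps the token head as the primary signal while the span head regularizes toward contiguous keep regions.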

## License

Apache 2.0

## Citation

```bibtex
@software{kompress2025,
  title={Kompress: Token Compression for Structured Tool Outputs and Agentic Contexts},
  author={Tejas Chopra},
  year={2025},
  url={https://huggingface.co/chopratejas/kompress-base},
}
```
