---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- token-compression
- prompt-compression
- context-compression
- agentic
- modernbert
- llmlingua
- headroom
- tool-outputs
- structured-data
pipeline_tag: token-classification
base_model: answerdotai/ModernBERT-base
datasets:
- SWE-bench/SWE-smith-trajectories
- glaiveai/glaive-function-calling-v2
- nebius/SWE-agent-trajectories
- Agent-Ark/Toucan-1.5M
- tuandunghcmut/toolbench-v1
- JetBrains-Research/diff-xyz
- code_search_net
- b-mc2/sql-create-context
model-index:
- name: kompress-base
  results:
  - task:
      type: token-classification
      name: Token Compression
    metrics:
    - type: f1
      value: 0.9956
      name: F1
    - type: accuracy
      value: 0.9926
      name: Accuracy
---
# Kompress: Token Compression for Structured Tool Outputs & Agentic Contexts

Kompress is a ModernBERT-based token compressor trained on 330K examples of structured tool outputs: JSON API responses, git diffs, error logs, source code, CLI output, database results, and agentic conversation traces. It is a drop-in replacement for LLMLingua-2.

## Key Results

### On Agentic / Structured Data (our target domain)
| Metric | Kompress | LLMLingua-2 |
|---|---|---|
| Entity Preservation | 82.1% | 36.0% |
| Compression Ratio | 48.1% | 206.0% (expands!) |
| Model Size | 600 MB | 1,400 MB |
| Context Window | 8,192 tokens | 512 tokens |
| Parameters | 149M | 355M |
### On LLMLingua-2's Benchmarks
| Dataset | Kompress | LLMLingua-2 | Note |
|---|---|---|---|
| MeetingBank | 46.3% | 57.4% | LLMLingua's training domain |
| GSM8K | 97.8% | 98.9% | Both excellent, but LLMLingua-2 retains 88% of tokens vs. Kompress's 50% |
### Cross-Agent Generalization (Cursor IDE, never seen in training)
| Metric | Kompress | LLMLingua-2 |
|---|---|---|
| Entity Preservation | 91.1% | 13.5% |
| Compression Ratio | 49.9% | 85.8% |
## Why Kompress?
LLMLingua-2 was trained on meeting transcripts. When applied to structured tool outputs, it:
- Destroys file paths: `/Users/foo/.claude/tasks/abc-123` → `abc - 123 abc 123`
- Expands instead of compressing: 206% average ratio on agentic data
- Fragments UUIDs: `4e149fea-6eb8-4feb` → `4e149fea - 6eb8 - 4feb`
- Has no cross-chunk awareness: 512-token limit
Kompress fixes these with:
- Training on structured data: 330K examples of real tool outputs (JSON, diffs, logs, code, CLI output, SQL)
- A dual-head architecture: token classification plus a span-importance CNN that prevents entity splitting
- A ModernBERT backbone: 8K context window, code-pretrained, RoPE attention
## Training Data (330K examples)
| Source | Examples | Type |
|---|---|---|
| Toucan-1.5M (MCP tool outputs) | ~80K | Real MCP server tool responses |
| SWE-agent trajectories | ~60K | Bash output, file reads, git diffs |
| ToolBench | ~50K | REST API JSON responses |
| Glaive Function Calling | ~40K | Function call/response pairs |
| CodeSearchNet | ~40K | Source code (Python, JS, Java, Go, Ruby, PHP) |
| JetBrains diff-xyz | ~10K | Git unified diffs |
| SQL create-context | ~10K | Database schemas + queries |
| Claude Code sessions | ~15K | Real agentic coding traces (API-labeled) |
| SWE-bench trajectories | ~15K | Open-source coding agent traces |
| Glaive + SWE (API-labeled) | ~10K | Function calling + coding (API-labeled) |
**Labeling:** heuristic rules for structured data (JSON → keep keys, diffs → keep +/- lines, logs → keep errors), plus Claude Sonnet distillation for natural-language segments.
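A minimal sketch of what such heuristic labeling rules could look like. The function name and exact conditions are illustrative assumptions, not the project's actual labeler; they just mirror the rules named above (JSON → keep key-bearing lines, diffs → keep +/- lines, logs → keep error lines).

```python
def label_line(line: str, content_type: str) -> bool:
    """Heuristic keep/discard label for one line of structured output."""
    stripped = line.strip()
    if content_type == "json":
        # Keep lines carrying keys or structural brackets.
        return '":' in stripped or stripped in ("{", "}", "[", "]")
    if content_type == "diff":
        # Keep added/removed lines and hunk headers; drop context lines.
        return stripped.startswith(("+", "-", "@@", "diff "))
    if content_type == "log":
        # Keep lines signaling failures.
        return any(w in stripped.lower()
                   for w in ("error", "fail", "exception", "traceback"))
    # Natural language: defer to model distillation, keep by default here.
    return True
```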
## Architecture

```text
Input tokens → ModernBERT-base encoder (149M params, 8K context)
  ├─ Head 1: token-level keep/discard (Linear → Softmax)
  └─ Head 2: span importance (Conv1d → GELU → Conv1d → Sigmoid)

Final score = token_prob × (0.5 + 0.5 × span_score)
```
The span head (~200K extra params) learns contiguous importance regions, preventing entity splitting and maintaining coherence.
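A minimal PyTorch sketch of the two heads and the combination formula, assuming the 768-dim hidden size of ModernBERT-base. Layer widths and kernel sizes are illustrative guesses; only the head types and the scoring formula come from the description above.

```python
import torch
import torch.nn as nn

class DualHead(nn.Module):
    """Two scoring heads on top of encoder hidden states (sketch)."""
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.token_head = nn.Linear(hidden, 2)   # keep / discard logits
        self.span_head = nn.Sequential(          # contiguous-importance CNN
            nn.Conv1d(hidden, 128, kernel_size=5, padding=2),
            nn.GELU(),
            nn.Conv1d(128, 1, kernel_size=5, padding=2),
            nn.Sigmoid(),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden) encoder output
        token_prob = self.token_head(h).softmax(-1)[..., 1]        # P(keep)
        span_score = self.span_head(h.transpose(1, 2)).squeeze(1)  # (batch, seq_len)
        # Final score = token_prob × (0.5 + 0.5 × span_score)
        return token_prob * (0.5 + 0.5 * span_score)

scores = DualHead()(torch.randn(1, 16, 768))  # per-token keep scores in [0, 1]
```

Because the span head sees a 5-token neighborhood at each position, adjacent tokens inside one entity receive correlated scores, which is what keeps a UUID or path from being split.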
## Quick Start

```bash
pip install kompress
```

```python
from kompress.inference.pytorch_runner import KompressRunner

runner = KompressRunner(checkpoint_path="chopratejas/kompress-base")
result = runner.compress(
    '{"users": [{"id": 1, "name": "Alice", "email": "alice@example.com"}, '
    '{"id": 2, "name": "Bob", "email": "bob@example.com"}, '
    '{"id": 3, "name": "Charlie", "email": "charlie@example.com"}]}',
    target_ratio=0.5,
)
print(result.compressed)
# Keeps keys, structure, unique values; discards repetitive patterns
```
## Use with Headroom

```python
from kompress.integration.headroom_bridge import patch_content_router
from headroom.transforms import ContentRouter

router = ContentRouter()
patch_content_router(router)  # Swaps LLMLingua → Kompress
```
## Training Details

- Base model: answerdotai/ModernBERT-base (149M params)
- Training: 3 epochs, batch size 64, lr 2e-5, AdamW + torch.compile on an NVIDIA H100
- Loss: CrossEntropy (token head) + 0.3 × BCE (span head)
- Final metrics: F1 = 0.9956, Precision = 0.9959, Recall = 0.9953, train loss = 0.068
- Training time: 2h 39m on one H100 (330K examples, 3 epochs)
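The combined objective can be sketched as follows. Tensor shapes and variable names are illustrative; only the CrossEntropy + 0.3 × BCE combination comes from the details above.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()   # token head: keep/discard classification
bce = nn.BCELoss()           # span head: soft importance targets

token_logits = torch.randn(4, 16, 2)          # (batch, seq, 2)
token_labels = torch.randint(0, 2, (4, 16))   # 0 = discard, 1 = keep
span_scores = torch.rand(4, 16)               # sigmoid outputs in [0, 1)
span_labels = torch.rand(4, 16)               # soft span-importance targets

# Total loss = CE(token head) + 0.3 × BCE(span head)
loss = ce(token_logits.reshape(-1, 2), token_labels.reshape(-1)) \
     + 0.3 * bce(span_scores, span_labels)
```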
## License

Apache 2.0
## Citation

```bibtex
@software{kompress2025,
  title={Kompress: Token Compression for Structured Tool Outputs and Agentic Contexts},
  author={Tejas Chopra},
  year={2025},
  url={https://huggingface.co/chopratejas/kompress-base},
}
```
## Links

- GitHub: source code, training pipeline, eval scripts
- Headroom: context compression framework
- LLMLingua-2: the model Kompress replaces
- ModernBERT: base encoder