chopratejas committed · Commit fa3f034 · verified · 1 Parent(s): 7f39c2e

Initial release: Kompress v1 — ModernBERT token compressor for agentic contexts

Files changed (6)
  1. README.md +154 -0
  2. config.json +109 -0
  3. model.safetensors +3 -0
  4. tokenizer.json +0 -0
  5. tokenizer_config.json +16 -0
  6. training_args.bin +3 -0
README.md ADDED
---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- token-compression
- prompt-compression
- context-compression
- agentic
- modernbert
- llmlingua
- headroom
pipeline_tag: token-classification
base_model: answerdotai/ModernBERT-base
datasets:
- SWE-bench/SWE-smith-trajectories
- glaiveai/glaive-function-calling-v2
model-index:
- name: kompress-base
  results:
  - task:
      type: token-classification
      name: Token Compression
    metrics:
    - type: f1
      value: 0.997
      name: F1
    - type: accuracy
      value: 0.994
      name: Accuracy
---

# Kompress: Token Compression for Agentic Contexts

**Kompress** is a ModernBERT-based token compressor trained specifically for agentic LLM contexts. It is a drop-in replacement for [LLMLingua-2](https://arxiv.org/abs/2403.12968) that achieves **2.3x better entity preservation** while being **2.3x smaller** and supporting a **16x longer context window**.

## Key Results

| Metric | Kompress | LLMLingua-2 |
|--------|----------|-------------|
| Entity Preservation | **82.1%** | 36.0% |
| Compression Ratio | **48.1%** | 206.0% (expands) |
| Model Size | **600 MB** | 1,400 MB |
| Context Window | **8,192 tokens** | 512 tokens |
| Parameters | **149M** | 355M |
| Trained on Agentic Data | Yes | No (meeting transcripts) |

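One way to read the Entity Preservation row: the fraction of reference entities (paths, identifiers, numbers) that survive compression verbatim. The card does not specify the exact evaluation protocol behind the 82.1% figure, so the function below is a hypothetical sketch of such a metric, not the released evaluation code:

```python
def entity_preservation(entities: list[str], compressed: str) -> float:
    """Fraction of reference entities kept verbatim in the compressed text.

    Hypothetical metric sketch; the model card does not document the exact
    protocol behind its reported numbers.
    """
    if not entities:
        return 1.0
    kept = sum(1 for entity in entities if entity in compressed)
    return kept / len(entities)

score = entity_preservation(
    ["parse_config", "/Users/dev/app/config.py", "line 42"],
    "parse_config in /Users/dev/app/config.py failed at line 42",
)
print(score)  # 1.0 — all three entities survive
```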
## Why Kompress?

LLMLingua-2 was trained on meeting transcripts (MeetingBank). When applied to agentic contexts (tool outputs, code, file paths, error traces), it:

- **Destroys file paths**: `/Users/foo/.claude/tasks/abc-123` becomes `abc - 123 abc 123 123`
- **Splits entity names**: keeps "John" but drops "Smith"
- **Expands instead of compressing**: 206% average ratio on agentic data
- **Has no cross-chunk awareness**: 512-token chunks, no global context

Kompress addresses all of these with:

1. **Agentic training data** — trained on real Claude Code sessions, SWE-bench trajectories, and function-calling traces
2. **Dual-head architecture** — token classification plus a span-importance CNN prevents entity splitting
3. **ModernBERT backbone** — 8K context window, code-pretrained, RoPE attention

## Architecture

```
Input tokens → ModernBERT-base encoder (149M params, 8K context) →
  Head 1: Token-level keep/discard (Linear → Softmax)
  Head 2: Span importance (Conv1d → GELU → Conv1d → Sigmoid)

Final score = token_prob × (0.5 + 0.5 × span_score)
```

The span head (~200K extra params) learns contiguous importance regions, preventing the "split entity" and "incoherent fragments" failure modes of pure token-level classifiers.

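The scoring formula above (`token_prob × (0.5 + 0.5 × span_score)`) can be sketched as a small PyTorch module. The class and layer names here are hypothetical, not the released implementation; the CNN widths follow `config.json` (`span_cnn_hidden: 256`, `span_kernel_sizes: [5, 3]`):

```python
import torch
import torch.nn as nn

class DualHeadScorer(nn.Module):
    """Hypothetical sketch of the dual-head scorer described in the card."""

    def __init__(self, hidden_size: int = 768, span_hidden: int = 256):
        super().__init__()
        # Head 1: per-token keep/discard logits
        self.token_head = nn.Linear(hidden_size, 2)
        # Head 2: span importance over contiguous regions
        # (padding chosen to preserve sequence length)
        self.span_head = nn.Sequential(
            nn.Conv1d(hidden_size, span_hidden, kernel_size=5, padding=2),
            nn.GELU(),
            nn.Conv1d(span_hidden, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size) from the encoder
        token_prob = self.token_head(hidden).softmax(dim=-1)[..., 1]  # P(keep)
        span_score = self.span_head(hidden.transpose(1, 2)).squeeze(1)
        # Final score = token_prob × (0.5 + 0.5 × span_score)
        return token_prob * (0.5 + 0.5 * span_score)

scorer = DualHeadScorer()
scores = scorer(torch.randn(1, 16, 768))
print(scores.shape)  # torch.Size([1, 16])
```

Because `span_score` is bounded in [0, 1], the span head can at most halve a token's keep probability, never zero it out; the token head stays the primary signal.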
## Quick Start

```python
# pip install kompress
from kompress.inference.pytorch_runner import KompressRunner

runner = KompressRunner(checkpoint_path="chopratejas/kompress-base")
result = runner.compress(
    "The function parse_config in /Users/dev/app/config.py returned None "
    "because the YAML file was malformed at line 42. Error: yaml.scanner."
    "ScannerError: mapping values are not allowed here.",
    target_ratio=0.5,
)
print(result.compressed)
# Keeps: parse_config, /Users/dev/app/config.py, None, YAML, line 42, ScannerError
```

## Use with Headroom

Kompress is designed as a drop-in replacement for LLMLingua-2 in the [Headroom](https://github.com/chopratejas/headroom) compression pipeline:

```python
from kompress.integration.transform import KompressCompressor, KompressConfig
from kompress.integration.headroom_bridge import patch_content_router

# Option 1: Use directly
compressor = KompressCompressor(KompressConfig(
    checkpoint_path="chopratejas/kompress-base"
))
result = compressor.compress(long_tool_output)

# Option 2: Patch an existing Headroom pipeline
from headroom.transforms import ContentRouter
router = ContentRouter()
patch_content_router(router)  # Swaps LLMLingua → Kompress
```

## Training Data

Trained on 5,747 labeled examples from three diverse sources:

| Source | Segments | Type |
|--------|----------|------|
| Claude Code sessions | 3,096 | Real agentic coding traces |
| Glaive Function Calling | 1,815 | General tool use across domains |
| SWE-bench Trajectories | 836 | Open-source coding agent traces |

Labels were generated via Claude Sonnet distillation with task-conditioned, entity-aware prompts.

## Training Details

- **Base model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) (149M params)
- **Training**: 5 epochs, batch size 32, lr 2e-5, AdamW, on an NVIDIA A100
- **Loss**: CrossEntropy (token head) + 0.3 × BCE (span head)
- **Metrics**: F1 = 0.997, Precision = 0.994, Recall = 1.0

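The combined loss above (CrossEntropy plus 0.3 × BCE, matching `span_loss_weight: 0.3` in `config.json`) can be sketched as follows. Tensor shapes and variable names are illustrative, not the released training code; in particular, deriving span targets from the keep labels is an assumption made for the example:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 2 sequences of 8 tokens each
batch, seq_len = 2, 8
token_logits = torch.randn(batch, seq_len, 2)        # Head 1: keep/discard logits
span_scores = torch.rand(batch, seq_len)             # Head 2: post-sigmoid scores
keep_labels = torch.randint(0, 2, (batch, seq_len))  # 1 = keep this token
span_labels = keep_labels.float()                    # span target (assumption)

# CrossEntropy on the token head, BCE on the span head, weighted 0.3
ce = F.cross_entropy(token_logits.view(-1, 2), keep_labels.view(-1))
bce = F.binary_cross_entropy(span_scores, span_labels)
loss = ce + 0.3 * bce
print(loss.item())
```

The low span weight keeps the token head as the dominant objective while the span head regularizes toward contiguous keep regions.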
## License

Apache 2.0 — use it however you want.

## Citation

```bibtex
@software{kompress2025,
  title={Kompress: Token Compression for Agentic Contexts},
  author={Chopra, Tejas},
  year={2025},
  url={https://huggingface.co/chopratejas/kompress-base},
}
```

## Links

- [Headroom](https://github.com/chopratejas/headroom) — Context compression framework
- [LLMLingua-2 paper](https://arxiv.org/abs/2403.12968) — The model Kompress replaces
- [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) — Base encoder
config.json ADDED
{
  "return_dict": true,
  "output_hidden_states": false,
  "torchscript": false,
  "dtype": "float32",
  "pruned_heads": {},
  "tie_word_embeddings": true,
  "chunk_size_feed_forward": 0,
  "is_encoder_decoder": false,
  "is_decoder": false,
  "cross_attention_hidden_size": null,
  "add_cross_attention": false,
  "tie_encoder_decoder": false,
  "architectures": [
    "HeadroomCompressor"
  ],
  "finetuning_task": null,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "task_specific_params": {
    "token_compression": {
      "span_cnn_hidden": 256,
      "span_kernel_sizes": [
        5,
        3
      ],
      "span_loss_weight": 0.3
    }
  },
  "problem_type": null,
  "tokenizer_class": null,
  "prefix": null,
  "bos_token_id": 50281,
  "pad_token_id": 50283,
  "eos_token_id": 50282,
  "sep_token_id": 50282,
  "decoder_start_token_id": null,
  "max_length": 20,
  "min_length": 0,
  "do_sample": false,
  "early_stopping": false,
  "num_beams": 1,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "typical_p": 1.0,
  "repetition_penalty": 1.0,
  "length_penalty": 1.0,
  "no_repeat_ngram_size": 0,
  "encoder_no_repeat_ngram_size": 0,
  "bad_words_ids": null,
  "num_return_sequences": 1,
  "output_scores": false,
  "return_dict_in_generate": false,
  "forced_bos_token_id": null,
  "forced_eos_token_id": null,
  "remove_invalid_values": false,
  "exponential_decay_length_penalty": null,
  "suppress_tokens": null,
  "begin_suppress_tokens": null,
  "num_beam_groups": 1,
  "diversity_penalty": 0.0,
  "_name_or_path": "answerdotai/ModernBERT-base",
  "transformers_version": "4.57.6",
  "cls_token_id": 50281,
  "gradient_checkpointing": false,
  "layer_norm_eps": 1e-05,
  "model_type": "modernbert",
  "position_embedding_type": "absolute",
  "tf_legacy_loss": false,
  "use_bfloat16": false,
  "vocab_size": 50368,
  "max_position_embeddings": 8192,
  "hidden_size": 768,
  "intermediate_size": 1152,
  "num_hidden_layers": 22,
  "num_attention_heads": 12,
  "initializer_range": 0.02,
  "initializer_cutoff_factor": 2.0,
  "norm_eps": 1e-05,
  "norm_bias": false,
  "global_rope_theta": 160000.0,
  "attention_bias": false,
  "attention_dropout": 0.0,
  "hidden_activation": "gelu",
  "global_attn_every_n_layers": 3,
  "local_attention": 128,
  "local_rope_theta": 10000.0,
  "embedding_dropout": 0.0,
  "mlp_bias": false,
  "mlp_dropout": 0.0,
  "decoder_bias": true,
  "classifier_pooling": "mean",
  "classifier_dropout": 0.0,
  "classifier_bias": false,
  "classifier_activation": "gelu",
  "deterministic_flash_attn": false,
  "sparse_prediction": false,
  "sparse_pred_ignore_index": -100,
  "repad_logits_with_grad": false,
  "output_attentions": false,
  "num_labels": 2
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:4dad1b717e3e5b47a169d7edca4c359b65e004a537473d956fd65731317f1017
size 600015548
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "backend": "tokenizers",
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "is_local": false,
  "mask_token": "[MASK]",
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 8192,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "tokenizer_class": "TokenizersBackend",
  "unk_token": "[UNK]"
}
training_args.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:a38655a76ccc51a01ef7d311276d42cfa6e09bbcd0b1bdbe6318161bbdb9b26f
size 5201