chopratejas committed · Commit fa3f034 · verified · 1 Parent(s): 7f39c2e

Initial release: Kompress v1 — ModernBERT token compressor for agentic contexts

Files changed (6)
  1. README.md +154 -0
  2. config.json +109 -0
  3. model.safetensors +3 -0
  4. tokenizer.json +0 -0
  5. tokenizer_config.json +16 -0
  6. training_args.bin +3 -0
README.md ADDED
---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- token-compression
- prompt-compression
- context-compression
- agentic
- modernbert
- llmlingua
- headroom
pipeline_tag: token-classification
base_model: answerdotai/ModernBERT-base
datasets:
- SWE-bench/SWE-smith-trajectories
- glaiveai/glaive-function-calling-v2
model-index:
- name: kompress-base
  results:
  - task:
      type: token-classification
      name: Token Compression
    metrics:
    - type: f1
      value: 0.997
      name: F1
    - type: accuracy
      value: 0.994
      name: Accuracy
---

# Kompress: Token Compression for Agentic Contexts

**Kompress** is a ModernBERT-based token compressor trained specifically for agentic LLM contexts. It is a drop-in replacement for [LLMLingua-2](https://arxiv.org/abs/2403.12968) that achieves **2.3x better entity preservation** while being **2.3x smaller** and supporting a **16x longer context window**.

## Key Results

| Metric | Kompress | LLMLingua-2 |
|--------|----------|-------------|
| Entity Preservation | **82.1%** | 36.0% |
| Compression Ratio | **48.1%** | 206.0% (expands) |
| Model Size | **600 MB** | 1,400 MB |
| Context Window | **8,192 tokens** | 512 tokens |
| Parameters | **149M** | 355M |
| Trained on Agentic Data | Yes | No (meeting transcripts) |

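One way to read the Entity Preservation row: the fraction of reference entities (paths, identifiers, numbers) that survive compression verbatim. The card does not specify the exact evaluation protocol behind the 82.1% figure, so the function below is a hypothetical sketch of such a metric, not the released evaluation code:

```python
def entity_preservation(entities: list[str], compressed: str) -> float:
    """Fraction of reference entities kept verbatim in the compressed text.

    Hypothetical metric sketch; the model card does not document the exact
    protocol behind its reported numbers.
    """
    if not entities:
        return 1.0
    kept = sum(1 for entity in entities if entity in compressed)
    return kept / len(entities)

score = entity_preservation(
    ["parse_config", "/Users/dev/app/config.py", "line 42"],
    "parse_config in /Users/dev/app/config.py failed at line 42",
)
print(score)  # 1.0 — all three entities survive
```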
## Why Kompress?

LLMLingua-2 was trained on meeting transcripts (MeetingBank). When applied to agentic contexts (tool outputs, code, file paths, error traces), it:

- **Destroys file paths**: `/Users/foo/.claude/tasks/abc-123` becomes `abc - 123 abc 123 123`
- **Splits entity names**: keeps "John" but drops "Smith"
- **Expands instead of compressing**: 206% average ratio on agentic data
- **Has no cross-chunk awareness**: 512-token chunks, no global context

Kompress addresses all of these with:

1. **Agentic training data** — trained on real Claude Code sessions, SWE-bench trajectories, and function-calling traces
2. **Dual-head architecture** — token classification plus a span-importance CNN prevents entity splitting
3. **ModernBERT backbone** — 8K context window, code-pretrained, RoPE attention

## Architecture

```
Input tokens → ModernBERT-base encoder (149M params, 8K context) →
  Head 1: Token-level keep/discard (Linear → Softmax)
  Head 2: Span importance (Conv1d → GELU → Conv1d → Sigmoid)

Final score = token_prob × (0.5 + 0.5 × span_score)
```

The span head (~200K extra params) learns contiguous importance regions, preventing the "split entity" and "incoherent fragments" failure modes of pure token-level classifiers.

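The scoring formula above (`token_prob × (0.5 + 0.5 × span_score)`) can be sketched as a small PyTorch module. The class and layer names here are hypothetical, not the released implementation; the CNN widths follow `config.json` (`span_cnn_hidden: 256`, `span_kernel_sizes: [5, 3]`):

```python
import torch
import torch.nn as nn

class DualHeadScorer(nn.Module):
    """Hypothetical sketch of the dual-head scorer described in the card."""

    def __init__(self, hidden_size: int = 768, span_hidden: int = 256):
        super().__init__()
        # Head 1: per-token keep/discard logits
        self.token_head = nn.Linear(hidden_size, 2)
        # Head 2: span importance over contiguous regions
        # (padding chosen to preserve sequence length)
        self.span_head = nn.Sequential(
            nn.Conv1d(hidden_size, span_hidden, kernel_size=5, padding=2),
            nn.GELU(),
            nn.Conv1d(span_hidden, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size) from the encoder
        token_prob = self.token_head(hidden).softmax(dim=-1)[..., 1]  # P(keep)
        span_score = self.span_head(hidden.transpose(1, 2)).squeeze(1)
        # Final score = token_prob × (0.5 + 0.5 × span_score)
        return token_prob * (0.5 + 0.5 * span_score)

scorer = DualHeadScorer()
scores = scorer(torch.randn(1, 16, 768))
print(scores.shape)  # torch.Size([1, 16])
```

Because `span_score` is bounded in [0, 1], the span head can at most halve a token's keep probability, never zero it out; the token head stays the primary signal.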
## Quick Start

```python
# pip install kompress
from kompress.inference.pytorch_runner import KompressRunner

runner = KompressRunner(checkpoint_path="chopratejas/kompress-base")
result = runner.compress(
    "The function parse_config in /Users/dev/app/config.py returned None "
    "because the YAML file was malformed at line 42. Error: yaml.scanner."
    "ScannerError: mapping values are not allowed here.",
    target_ratio=0.5,
)
print(result.compressed)
# Keeps: parse_config, /Users/dev/app/config.py, None, YAML, line 42, ScannerError
```

## Use with Headroom

Kompress is designed as a drop-in replacement for LLMLingua-2 in the [Headroom](https://github.com/chopratejas/headroom) compression pipeline:

```python
from kompress.integration.transform import KompressCompressor, KompressConfig
from kompress.integration.headroom_bridge import patch_content_router

# Option 1: Use directly
compressor = KompressCompressor(KompressConfig(
    checkpoint_path="chopratejas/kompress-base"
))
result = compressor.compress(long_tool_output)

# Option 2: Patch an existing Headroom pipeline
from headroom.transforms import ContentRouter
router = ContentRouter()
patch_content_router(router)  # Swaps LLMLingua → Kompress
```

## Training Data

Trained on 5,747 labeled examples from three diverse sources:

| Source | Segments | Type |
|--------|----------|------|
| Claude Code sessions | 3,096 | Real agentic coding traces |
| Glaive Function Calling | 1,815 | General tool use across domains |
| SWE-bench Trajectories | 836 | Open-source coding agent traces |

Labels were generated via Claude Sonnet distillation with task-conditioned, entity-aware prompts.

## Training Details

- **Base model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) (149M params)
- **Training**: 5 epochs, batch size 32, lr 2e-5, AdamW, on an NVIDIA A100
- **Loss**: CrossEntropy (token head) + 0.3 × BCE (span head)
- **Metrics**: F1 = 0.997, Precision = 0.994, Recall = 1.0

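The combined loss above (CrossEntropy plus 0.3 × BCE, matching `span_loss_weight: 0.3` in `config.json`) can be sketched as follows. Tensor shapes and variable names are illustrative, not the released training code; in particular, deriving span targets from the keep labels is an assumption made for the example:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 2 sequences of 8 tokens each
batch, seq_len = 2, 8
token_logits = torch.randn(batch, seq_len, 2)        # Head 1: keep/discard logits
span_scores = torch.rand(batch, seq_len)             # Head 2: post-sigmoid scores
keep_labels = torch.randint(0, 2, (batch, seq_len))  # 1 = keep this token
span_labels = keep_labels.float()                    # span target (assumption)

# CrossEntropy on the token head, BCE on the span head, weighted 0.3
ce = F.cross_entropy(token_logits.view(-1, 2), keep_labels.view(-1))
bce = F.binary_cross_entropy(span_scores, span_labels)
loss = ce + 0.3 * bce
print(loss.item())
```

The low span weight keeps the token head as the dominant objective while the span head regularizes toward contiguous keep regions.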
## License

Apache 2.0 — use it however you want.

## Citation

```bibtex
@software{kompress2025,
  title={Kompress: Token Compression for Agentic Contexts},
  author={Chopra, Tejas},
  year={2025},
  url={https://huggingface.co/chopratejas/kompress-base},
}
```

## Links

- [Headroom](https://github.com/chopratejas/headroom) — Context compression framework
- [LLMLingua-2 paper](https://arxiv.org/abs/2403.12968) — The model Kompress replaces
- [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base) — Base encoder
config.json ADDED
{
  "return_dict": true,
  "output_hidden_states": false,
  "torchscript": false,
  "dtype": "float32",
  "pruned_heads": {},
  "tie_word_embeddings": true,
  "chunk_size_feed_forward": 0,
  "is_encoder_decoder": false,
  "is_decoder": false,
  "cross_attention_hidden_size": null,
  "add_cross_attention": false,
  "tie_encoder_decoder": false,
  "architectures": [
    "HeadroomCompressor"
  ],
  "finetuning_task": null,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "task_specific_params": {
    "token_compression": {
      "span_cnn_hidden": 256,
      "span_kernel_sizes": [
        5,
        3
      ],
      "span_loss_weight": 0.3
    }
  },
  "problem_type": null,
  "tokenizer_class": null,
  "prefix": null,
  "bos_token_id": 50281,
  "pad_token_id": 50283,
  "eos_token_id": 50282,
  "sep_token_id": 50282,
  "decoder_start_token_id": null,
  "max_length": 20,
  "min_length": 0,
  "do_sample": false,
  "early_stopping": false,
  "num_beams": 1,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "typical_p": 1.0,
  "repetition_penalty": 1.0,
  "length_penalty": 1.0,
  "no_repeat_ngram_size": 0,
  "encoder_no_repeat_ngram_size": 0,
  "bad_words_ids": null,
  "num_return_sequences": 1,
  "output_scores": false,
  "return_dict_in_generate": false,
  "forced_bos_token_id": null,
  "forced_eos_token_id": null,
  "remove_invalid_values": false,
  "exponential_decay_length_penalty": null,
  "suppress_tokens": null,
  "begin_suppress_tokens": null,
  "num_beam_groups": 1,
  "diversity_penalty": 0.0,
  "_name_or_path": "answerdotai/ModernBERT-base",
  "transformers_version": "4.57.6",
  "cls_token_id": 50281,
  "gradient_checkpointing": false,
  "layer_norm_eps": 1e-05,
  "model_type": "modernbert",
  "position_embedding_type": "absolute",
  "tf_legacy_loss": false,
  "use_bfloat16": false,
  "vocab_size": 50368,
  "max_position_embeddings": 8192,
  "hidden_size": 768,
  "intermediate_size": 1152,
  "num_hidden_layers": 22,
  "num_attention_heads": 12,
  "initializer_range": 0.02,
  "initializer_cutoff_factor": 2.0,
  "norm_eps": 1e-05,
  "norm_bias": false,
  "global_rope_theta": 160000.0,
  "attention_bias": false,
  "attention_dropout": 0.0,
  "hidden_activation": "gelu",
  "global_attn_every_n_layers": 3,
  "local_attention": 128,
  "local_rope_theta": 10000.0,
  "embedding_dropout": 0.0,
  "mlp_bias": false,
  "mlp_dropout": 0.0,
  "decoder_bias": true,
  "classifier_pooling": "mean",
  "classifier_dropout": 0.0,
  "classifier_bias": false,
  "classifier_activation": "gelu",
  "deterministic_flash_attn": false,
  "sparse_prediction": false,
  "sparse_pred_ignore_index": -100,
  "repad_logits_with_grad": false,
  "output_attentions": false,
  "num_labels": 2
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:4dad1b717e3e5b47a169d7edca4c359b65e004a537473d956fd65731317f1017
size 600015548
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "backend": "tokenizers",
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "is_local": false,
  "mask_token": "[MASK]",
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 8192,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "tokenizer_class": "TokenizersBackend",
  "unk_token": "[UNK]"
}
training_args.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:a38655a76ccc51a01ef7d311276d42cfa6e09bbcd0b1bdbe6318161bbdb9b26f
size 5201