Upload README.md with huggingface_hub
# Squeez-2B
LLM coding agents spend 80-95% of their context window on irrelevant tool output. Squeez filters it down to the lines that actually matter, compressing tool output by ~91% while keeping 86% of the relevant information.
## What is Squeez?
A tool output pruner for coding agents. When an agent runs a tool (pytest, grep, git log, npm build, kubectl, etc.), the output is often hundreds of lines but only a handful matter for the current task. Squeez sits between the tool and the agent's context window:
```
Tool output (500 lines) → Squeez → Relevant lines (30 lines) → Agent context
```
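The flow above can be sketched as a function contract. Note this is a toy stand-in, not the released API: the `prune` function, its query format, and the keyword-matching selection logic below are all illustrative assumptions standing in for the actual model call.

```python
def prune(tool_output: str, query: str) -> str:
    """Hypothetical stand-in for a Squeez call: return only verbatim
    input lines relevant to `query` (here approximated by naive
    keyword matching; the real model learns this selection)."""
    lines = tool_output.splitlines()
    keywords = query.lower().split()
    kept = [ln for ln in lines if any(k in ln.lower() for k in keywords)]
    return "\n".join(kept)

# Toy tool output: 500 log lines, a handful of which report a failure.
tool_output = "\n".join(
    f"PASS test_{i}" if i % 100 else f"FAIL test_{i}: AssertionError"
    for i in range(500)
)
pruned = prune(tool_output, "FAIL AssertionError")

# The contract: every returned line is verbatim from the input.
assert set(pruned.splitlines()) <= set(tool_output.splitlines())
compression = 1 - len(pruned.splitlines()) / 500
print(f"kept {len(pruned.splitlines())} of 500 lines ({compression:.0%} compressed)")
```

Whatever selects the lines, the output is always a subset of the input lines, which is what makes the result safe to splice back into an agent's context.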
This model is [Qwen 3.5 2B](https://huggingface.co/Qwen/Qwen3.5-2B) fine-tuned to extract verbatim relevant lines from tool output given a task-specific query. It's trained on real software engineering tool output from SWE-bench (test logs, grep results, build errors, git diffs, stack traces, etc.), not generic text.
- 2B parameters, runs on a single GPU, serves via vLLM
- Outperforms Qwen 3.5 35B A3B zero-shot by +13% Span F1
- Returns verbatim lines only, no rewriting or summarization
- Works as a CLI pipe, Python library, or vLLM server
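As one illustration of the vLLM server path (a sketch, not documented usage: the `<org>/Squeez-2B` repo id, port, and prompt layout below are placeholders and assumptions):

```shell
# Serve the model with vLLM's OpenAI-compatible server
# (replace <org>/Squeez-2B with the actual Hub repo id).
vllm serve <org>/Squeez-2B --port 8000

# Query it; the exact prompt template Squeez expects is an assumption here.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<org>/Squeez-2B",
       "prompt": "Query: why does test_login fail?\n\n<tool output here>",
       "max_tokens": 512}'
```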
## Evaluation
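The card does not spell out its exact Span F1 definition; one common line-level formulation (an assumption here) scores the set of predicted relevant line indices against the gold set:

```python
def span_f1(predicted: set[int], gold: set[int]) -> float:
    """Line-level F1: harmonic mean of precision and recall over
    sets of line indices marked relevant."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)  # true positives: lines in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

print(span_f1({1, 2, 3, 4}, {2, 3, 4, 5}))
```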