Upload README.md with huggingface_hub
# Squeez-2B
LLM coding agents spend 80-95% of their context window on irrelevant tool output. Squeez filters it down to the lines that actually matter, compressing tool output by ~91% while keeping 86% of the relevant information.
## What is Squeez?
A tool output pruner for coding agents. When an agent runs a tool (pytest, grep, git log, npm build, kubectl, etc.), the output is often hundreds of lines but only a handful matter for the current task. Squeez sits between the tool and the agent's context window:
```
Tool output (500 lines) → Squeez → Relevant lines (30 lines) → Agent context
```
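The flow above can be sketched as a function contract. Note this is a toy stand-in, not the released API: the `prune` function, its query format, and the keyword-matching selection logic below are all illustrative assumptions standing in for the actual model call.

```python
def prune(tool_output: str, query: str) -> str:
    """Hypothetical stand-in for a Squeez call: return only verbatim
    input lines relevant to `query` (here approximated by naive
    keyword matching; the real model learns this selection)."""
    lines = tool_output.splitlines()
    keywords = query.lower().split()
    kept = [ln for ln in lines if any(k in ln.lower() for k in keywords)]
    return "\n".join(kept)

# Toy tool output: 500 log lines, a handful of which report a failure.
tool_output = "\n".join(
    f"PASS test_{i}" if i % 100 else f"FAIL test_{i}: AssertionError"
    for i in range(500)
)
pruned = prune(tool_output, "FAIL AssertionError")

# The contract: every returned line is verbatim from the input.
assert set(pruned.splitlines()) <= set(tool_output.splitlines())
compression = 1 - len(pruned.splitlines()) / 500
print(f"kept {len(pruned.splitlines())} of 500 lines ({compression:.0%} compressed)")
```

Whatever selects the lines, the output is always a subset of the input lines, which is what makes the result safe to splice back into an agent's context.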
This model is [Qwen 3.5 2B](https://huggingface.co/Qwen/Qwen3.5-2B) fine-tuned to extract verbatim relevant lines from tool output given a task-specific query. It's trained on real software engineering tool output from SWE-bench (test logs, grep results, build errors, git diffs, stack traces, etc.), not generic text.
- 2B parameters, runs on a single GPU, serves via vLLM
- Outperforms Qwen 3.5 35B A3B zero-shot by +13% Span F1
- Returns verbatim lines only, no rewriting or summarization
- Works as a CLI pipe, Python library, or vLLM server
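As one illustration of the vLLM server path (a sketch, not documented usage: the `<org>/Squeez-2B` repo id, port, and prompt layout below are placeholders and assumptions):

```shell
# Serve the model with vLLM's OpenAI-compatible server
# (replace <org>/Squeez-2B with the actual Hub repo id).
vllm serve <org>/Squeez-2B --port 8000

# Query it; the exact prompt template Squeez expects is an assumption here.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<org>/Squeez-2B",
       "prompt": "Query: why does test_login fail?\n\n<tool output here>",
       "max_tokens": 512}'
```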
## Evaluation
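The card does not spell out its exact Span F1 definition; one common line-level formulation (an assumption here) scores the set of predicted relevant line indices against the gold set:

```python
def span_f1(predicted: set[int], gold: set[int]) -> float:
    """Line-level F1: harmonic mean of precision and recall over
    sets of line indices marked relevant."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)  # true positives: lines in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

print(span_f1({1, 2, 3, 4}, {2, 3, 4, 5}))
```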