---
base_model: Qwen/Qwen3.5-2B
datasets:
- KRLabsOrg/tool-output-extraction-swebench
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- code
- tool-output
- pruning
- coding-agents
- extraction
thumbnail: https://raw.githubusercontent.com/KRLabsOrg/squeez/main/assets/squeez_mascot.png
---
<p align="center">
<img src="https://raw.githubusercontent.com/KRLabsOrg/squeez/main/assets/squeez_mascot.png" alt="Squeez mascot" width="180">
</p>
# Squeez-2B
**Squeez-2B** is a 2B-parameter model fine-tuned from Qwen 3.5 2B for task-conditioned tool-output pruning in coding agents. Given a focused query and one raw tool observation, it extracts the smallest verbatim evidence block the agent should inspect next. On the held-out test set it removes **92%** of input tokens while retaining **0.86 recall**.
```
Tool output (500 lines) → Squeez → Relevant lines (30 lines) → Agent context
```
- Outperforms zero-shot Qwen 3.5 35B A3B by **11 recall points** (0.86 vs. 0.75)
- Returns verbatim lines only (no rewriting or summarization)
- Works as a CLI pipe, a Python library, or behind a vLLM server
- Trained on **27 tool types** from real SWE-bench workflows and synthetic multi-ecosystem outputs
**Resources:** [Paper](https://arxiv.org/abs/2604.04979) | [Dataset](https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench) | [Code & CLI](https://github.com/KRLabsOrg/squeez) | [Blog post](https://huggingface.co/blog/KRLabsOrg/squeez)
## Results
Evaluated on 618 manually curated held-out examples spanning 27 tool types.
| Model | Precision | Recall | F1 | Compression (fraction of input tokens removed) |
|-------|-------|--------|-----|-------------|
| **Squeez-2B** | **0.80** | **0.86** | **0.80** | 0.92 |
| Qwen 3.5 35B A3B (zero-shot) | 0.74 | 0.75 | 0.73 | 0.92 |
| Kimi K2 (zero-shot) | 0.61 | 0.53 | 0.68 | 0.94 |
| Qwen 3.5 2B (untrained) | 0.42 | 0.53 | 0.55 | 0.82 |
The fine-tuned 2B model is also the most precise system in the comparison, which suggests it has learned a tool-specific extraction policy rather than relying on generic instruction following.
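To make the columns in the results table concrete, here is a minimal sketch of how precision, recall, F1, and compression could be scored for a single example. It is not the paper's evaluation code: line-level set matching, whitespace tokens instead of model tokens, and scoring an empty prediction on an empty gold answer as perfect are all assumptions made for illustration.

```python
"""Hypothetical per-example scoring for tool-output pruning.

Assumptions (not the paper's protocol): lines are compared as stripped
strings, scores are set-based, and compression uses whitespace tokens.
"""


def line_set(text: str) -> set[str]:
    # Non-blank lines, stripped of surrounding whitespace
    return {line.strip() for line in text.splitlines() if line.strip()}


def score_example(predicted: str, gold: str, tool_output: str) -> dict[str, float]:
    pred, ref = line_set(predicted), line_set(gold)
    if not pred and not ref:
        # Negative example answered with empty output: treated as fully correct
        precision = recall = f1 = 1.0
    else:
        hits = len(pred & ref)
        precision = hits / len(pred) if pred else 0.0
        recall = hits / len(ref) if ref else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
    # Fraction of input tokens removed (1.0 means everything was pruned)
    in_tokens = len(tool_output.split())
    out_tokens = len(predicted.split())
    compression = 1 - out_tokens / in_tokens if in_tokens else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "compression": compression}
```

Dataset-level numbers would come from aggregating these per-example scores; whether the paper averages per example or pools counts is not stated on this card.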
### Qualitative patterns
| Pattern | Example | Squeez-2B | Baseline failure |
|---------|---------|-----------|-----------------|
| Precise selection | `git_log`, 21 lines — find one commit | Selects the single correct entry | Qwen 35B picks a plausible but wrong commit |
| Failure-block extraction | Service log, 176 lines — two similar TLS errors | Returns the correct 5-line block | Qwen 35B picks the wrong TLS error (different timestamp) |
| Correct empty prediction | `docker_logs`, 316 lines — no matching evidence | Returns empty output | Qwen 35B generates "No relevant lines found..." |
| Adjacent over-selection | Build output, 110 lines — Dockerfile error | Finds the right error + nearby noise | Qwen 35B misses the Dockerfile error entirely |
On the 59 negative examples in the test set (tool outputs that contain no evidence matching the query), Squeez-2B correctly returns empty output 80% of the time, while Qwen 35B does so only 7% of the time.
## Quick Start
### CLI (recommended)
```bash
pip install squeez
# With vLLM server
vllm serve KRLabsOrg/squeez-2b --dtype bfloat16 --max-model-len 16384
export SQUEEZ_SERVER_URL=http://localhost:8000/v1
pytest -q 2>&1 | squeez "find the failure block"
git log --oneline -50 | squeez "find the commit that changed CSRF handling"
cat src/auth/middleware.py | squeez "find the referer validation logic"
```
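When an agent harness is written in Python rather than shell, the same pipe can be reproduced with `subprocess`. This is a sketch, not part of the `squeez` package: it assumes the `squeez` CLI is on `PATH` and that `SQUEEZ_SERVER_URL` is already exported as shown above. The helper name `run_and_squeeze` and its `squeez_cmd` parameter are hypothetical.

```python
"""Pipe a command's combined stdout/stderr through the squeez CLI.

Sketch only: mirrors `pytest -q 2>&1 | squeez "find the failure block"`
and assumes `squeez` is on PATH with SQUEEZ_SERVER_URL already set.
"""
import subprocess
from typing import Sequence


def run_and_squeeze(
    cmd: Sequence[str],
    query: str,
    squeez_cmd: Sequence[str] = ("squeez",),
) -> str:
    # Run the tool and merge stderr into stdout, like `2>&1` in the shell.
    # No check=True here: failing test runs exit non-zero but are still useful.
    tool = subprocess.run(
        list(cmd), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # Feed the raw output to squeez on stdin, with the query as its argument
    pruned = subprocess.run(
        [*squeez_cmd, query],
        input=tool.stdout,
        capture_output=True,
        text=True,
        check=True,
    )
    return pruned.stdout
```

For example, `run_and_squeeze(["git", "log", "--oneline", "-50"], "find the commit that changed CSRF handling")` corresponds to the second CLI line above. The `squeez_cmd` parameter only exists so the CLI invocation can be swapped out, for instance in tests.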
### Python API
```python
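import subprocess

from squeez.inference.extractor import ToolOutputExtractor

# Option 1: use a running vLLM server
extractor = ToolOutputExtractor(base_url="http://localhost:8000/v1")
# Option 2: load the model locally instead
# extractor = ToolOutputExtractor(model_path="KRLabsOrg/squeez-2b")

# Capture raw tool output (stdout and stderr), here from a pytest run
result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
raw_output = result.stdout + result.stderr

filtered = extractor.extract(
    task="Find the failing test block",
    tool_output=raw_output,
)
print(filtered)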
```
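If you would rather not install the `squeez` package, a running `vllm serve KRLabsOrg/squeez-2b` server can be called directly over its OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below uses only the standard library. Several things are assumptions: the server address (`localhost:8000`), the reuse of the system prompt from the transformers example on this card, and the exact whitespace of the user turn. The prompt the `squeez` package actually sends may differ. The names `build_payload` and `extract_via_server` are hypothetical.

```python
"""Call a vLLM server hosting squeez-2b via its OpenAI-compatible API.

Sketch under assumptions: server on localhost:8000, system prompt copied
from the transformers example, user-turn layout inferred from that example.
"""
import json
import urllib.request

SYSTEM_PROMPT = (
    "You prune verbose tool output for a coding agent. "
    "Given a focused extraction query and one tool output, return only the "
    "smallest verbatim evidence block(s) the agent should read next. "
    "Return the kept text inside <relevant_lines> tags. "
    "Do not rewrite, summarize, or invent lines."
)


def build_payload(task: str, tool_output: str,
                  model: str = "KRLabsOrg/squeez-2b") -> dict:
    # User turn follows the <query>/<tool_output> layout, one tag per line
    body = tool_output.rstrip("\n")
    user = f"<query>\n{task}\n</query>\n<tool_output>\n{body}\n</tool_output>"
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user},
        ],
        "max_tokens": 512,
        "temperature": 0.1,
    }


def extract_via_server(task: str, tool_output: str,
                       base_url: str = "http://localhost:8000/v1") -> str:
    # POST the chat request and return the assistant message text
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(task, tool_output)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```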
### With transformers directly
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "KRLabsOrg/squeez-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": (
        "You prune verbose tool output for a coding agent. "
        "Given a focused extraction query and one tool output, return only the "
        "smallest verbatim evidence block(s) the agent should read next. "
        "Return the kept text inside <relevant_lines> tags. "
        "Do not rewrite, summarize, or invent lines."
    )},
    {"role": "user", "content": (
        "<query>\nFind the failing authentication test\n</query>\n"
        "<tool_output>\n"
        "PASSED tests/test_login.py::test_valid_credentials\n"
        "FAILED tests/test_login.py::test_token_refresh - AssertionError: expected 200 got 401\n"
        "PASSED tests/test_login.py::test_logout\n"
        "</tool_output>"
    )},
]

# Render the chat template as text, then tokenize it
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# <relevant_lines>
# FAILED tests/test_login.py::test_token_refresh - AssertionError: expected 200 got 401
# </relevant_lines>
```
## Input/Output Format
**Input** — Chat messages with system prompt:
- System: extraction instructions (see above)
- User: `<query>{task}</query>` followed, on a new line, by `<tool_output>{raw_output}</tool_output>`
**Output** — Verbatim lines in XML tags:
```
<relevant_lines>
{only the lines that matter, copied verbatim}
</relevant_lines>
```
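Because the model is meant to return verbatim lines only, and to return nothing for negative examples, a caller may want to parse the tagged block and confirm that every kept line really occurs in the raw tool output. The sketch below does that. Only the `<relevant_lines>` tag comes from the format above. The helper names, the decision to skip blank lines, and the treatment of both an empty string and an empty tag pair as "no evidence" are illustrative assumptions, not part of the `squeez` package.

```python
"""Parse a Squeez response and check that the kept lines are verbatim.

Sketch only: the tag name comes from the documented output format; the
parsing rules and the verbatim check are illustrative assumptions.
"""
import re

# Match each tagged block, dropping only the newline right after the opening
# tag and right before the closing tag so indentation on kept lines survives
_BLOCK = re.compile(r"<relevant_lines>\n?(.*?)\n?</relevant_lines>", re.DOTALL)


def parse_relevant_lines(response: str) -> list[str]:
    # Collect non-blank lines from every block; empty output yields []
    lines: list[str] = []
    for block in _BLOCK.findall(response):
        lines.extend(line for line in block.splitlines() if line.strip())
    return lines


def is_verbatim(kept: list[str], tool_output: str) -> bool:
    # Every kept line must appear unchanged as a line of the raw tool output
    source = set(tool_output.splitlines())
    return all(line in source for line in kept)
```

An empty result from `parse_relevant_lines` is the signal that the tool output holds nothing relevant to the query. A `False` from `is_verbatim` flags a rewritten or invented line, which the model is instructed never to produce.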
## Supported Tool Types (27)
**SWE-bench derived (14):** `read_file` | `grep` | `git_log` | `git_blame` | `git_diff` | `test_output` | `python` | `curl` | `pip_install` | `ls` | `lint_output` | `build_output` | `type_check` | `coverage`
**Synthetic multi-ecosystem (13):** `npm_build` | `tsc` | `npm_install` | `docker_logs` | `docker_build` | `make_cmake` | `kubectl` | `cargo_build` | `go_build` | `mvn_gradle` | `terraform` | `mypy_pyright` | `eslint`
## Training Details
| Setting | Value |
|---|---|
| **Base model** | Qwen/Qwen3.5-2B |
| **Method** | LoRA (r=16, alpha=32) via Unsloth |
| **Training data** | 10,508 examples (SWE-bench + synthetic) |
| **Epochs** | 3 |
| **Max sequence length** | 20,000 tokens |
| **Learning rate** | 2e-4 |
| **Batch size** | 8 (32 effective with 4x gradient accumulation) |
| **Hardware** | Single NVIDIA A100 80GB |
| **Dataset** | [KRLabsOrg/tool-output-extraction-swebench](https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench) |
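The training table implies a few derived quantities. The arithmetic below works them out under stated assumptions: one example per sequence with no packing, a final partial batch that still counts as an optimizer step, and standard LoRA scaling of alpha / r rather than the rank-stabilized variant. The real step count in the training run may differ slightly.

```python
"""Back-of-the-envelope numbers implied by the training table.

Assumes no sequence packing, that the last partial batch still yields an
optimizer step, and standard LoRA scaling (alpha / r).
"""
import math

examples = 10_508
per_device_batch = 8
grad_accum = 4
epochs = 3
lora_r, lora_alpha = 16, 32

effective_batch = per_device_batch * grad_accum          # 8 * 4 = 32
steps_per_epoch = math.ceil(examples / effective_batch)  # ceil(328.375) = 329
total_steps = steps_per_epoch * epochs                   # 329 * 3 = 987
lora_scale = lora_alpha / lora_r                         # 32 / 16 = 2.0

print(effective_batch, steps_per_epoch, total_steps, lora_scale)
```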
## Usage with Coding Agents
Add to your `CLAUDE.md` or agent system prompt:
```
When you invoke a shell command, pipe it through `squeez` and describe what you need.
Examples:
- bun test 2>&1 | squeez "did the tests pass?"
- git log --oneline -50 | squeez "find the commit that broke CSRF"
- cat src/auth/middleware.py | squeez "find the referer validation logic"
```
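A harness that rewrites an agent's shell commands into the piped form recommended above needs to quote the query so that spaces, quotes, or `?` are not split or expanded by the shell. Here is a minimal sketch using `shlex.quote`. The helper `with_squeez` and its `merge_stderr` flag are hypothetical and not part of the `squeez` CLI.

```python
"""Build the `cmd 2>&1 | squeez "query"` pattern for an agent harness.

Hypothetical helper; shlex.quote makes the query a single, literal
shell argument regardless of spaces, quotes, or glob characters.
"""
import shlex


def with_squeez(command: str, query: str, merge_stderr: bool = True) -> str:
    # Optionally merge stderr so test failures and tracebacks reach squeez
    redirect = " 2>&1" if merge_stderr else ""
    return f"{command}{redirect} | squeez {shlex.quote(query)}"
```

For example, `with_squeez("bun test", "did the tests pass?")` yields `bun test 2>&1 | squeez 'did the tests pass?'`.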
## Limitations
- Best on software engineering tool output; not designed for general-purpose summarization
- Synthetic training data was generated by `openai/gpt-oss-120b` and may not fully reflect real-world distributions for all ecosystems
- Evaluated on single tool observations, not on full agent trajectories
- Trained on inputs of up to 20,000 tokens; the serving context length can be set higher, but longer inputs fall outside the training distribution
## License
Apache 2.0
## Citation
```bibtex
@misc{kovács2026squeeztaskconditionedtooloutputpruning,
title={Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents},
author={Ádám Kovács},
year={2026},
eprint={2604.04979},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2604.04979},
}
```