adaamko committed on
Commit
e137a82
·
verified ·
1 Parent(s): 51b4c8b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +7 -11
README.md CHANGED
@@ -23,26 +23,22 @@ datasets:
 
 # Squeez-2B
 
-LLM coding agents spend **80-95% of their context window** on irrelevant tool output — passing test names, boilerplate headers, unchanged files. Squeez reads the raw output alongside a task description and returns **only the lines the agent needs to read next**, compressing tool output by ~91% on average while keeping 86% of the relevant information.
-
-Unlike keyword search (BM25) or generic semantic highlighting, Squeez is trained specifically on tool output from real software engineering workflows — test logs, grep results, build errors, git diffs, stack traces, and more.
+LLM coding agents spend 80-95% of their context window on irrelevant tool output. Squeez filters it down to the lines that actually matter, compressing tool output by ~91% while keeping 86% of the relevant information.
 
 ## What is Squeez?
 
-Squeez is a **tool output pruner for coding agents**. When an agent runs a tool (pytest, grep, git log, npm build, kubectl, etc.), the output is often hundreds of lines — but only a handful matter for the current task. Squeez acts as a filter between the tool and the agent's context window:
+A tool output pruner for coding agents. When an agent runs a tool (pytest, grep, git log, npm build, kubectl, etc.), the output is often hundreds of lines but only a handful matter for the current task. Squeez sits between the tool and the agent's context window:
 
 ```
 Tool output (500 lines) → Squeez → Relevant lines (30 lines) → Agent context
 ```
 
-This model (Squeez-2B) is a generative approach: [Qwen 3.5 2B](https://huggingface.co/Qwen/Qwen3.5-2B) fine-tuned to extract verbatim relevant lines from tool output, given a task-specific query.
-
-### Why a small fine-tuned model?
+This model is [Qwen 3.5 2B](https://huggingface.co/Qwen/Qwen3.5-2B) fine-tuned to extract verbatim relevant lines from tool output given a task-specific query. It's trained on real software engineering tool output from SWE-bench (test logs, grep results, build errors, git diffs, stack traces, etc.), not generic text.
 
-- **Fast**: 2B parameters — runs on a single GPU or even CPU, serves via vLLM at high throughput
-- **Accurate**: Outperforms a 35B MoE model (Qwen 3.5 35B A3B) at zero-shot by **+13% Span F1**
-- **Faithful**: Returns verbatim lines only — no rewriting, no hallucination, no summarization
-- **Drop-in**: Works as a CLI pipe, Python library, or vLLM server — integrates with any agent framework
+- 2B parameters, runs on a single GPU, serves via vLLM
+- Outperforms Qwen 3.5 35B A3B zero-shot by +13% Span F1
+- Returns verbatim lines only, no rewriting or summarization
+- Works as a CLI pipe, Python library, or vLLM server
 
 ## Evaluation
 
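The ~91% compression and Span F1 figures in the README can be made concrete. A minimal sketch of how such line-level metrics are typically computed — the model card does not include its evaluation code, so `span_f1` and `compression` are illustrative names, and reading "Span F1" as F1 over the set of extracted lines is an assumption:

```python
def span_f1(predicted_lines, gold_lines):
    # F1 over sets of extracted lines -- one plausible reading of "Span F1".
    pred, gold = set(predicted_lines), set(gold_lines)
    tp = len(pred & gold)  # lines the pruner kept that were actually relevant
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def compression(n_input_lines, n_kept_lines):
    # Fraction of the tool output that was pruned away.
    return 1.0 - n_kept_lines / n_input_lines


# The README's example: a 500-line tool output pruned to 30 lines.
print(round(compression(500, 30), 2))  # 0.94
```

On this reading, "keeping 86% of the relevant information" corresponds to recall over gold lines, while compression is independent of correctness — a pruner is only useful when both are high at once.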