Get Started
Once python app.py is running, head to http://localhost:7860 in your browser. You'll see two tabs.
Compress tab
This is where the action is.
- Paste your text: a long prompt, meeting notes, an article, anything really
- Use the slider to set your token budget (anywhere from 100 to 1000)
- Hit Compress
As you type or adjust the slider, a status banner updates live:
- Green: the input is over budget, so compression will run
- Red: the input is already within budget, so there's nothing to do
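The banner boils down to one comparison between the input's token count and the slider value. Below is a minimal sketch. It assumes naive whitespace splitting in place of the compression model's real tokenizer; with a Hugging Face tokenizer the count would be something like `len(tokenizer.encode(text))`. `budget_status` is a hypothetical name, not a function from app.py.

```python
def budget_status(text: str, budget: int) -> tuple[str, int]:
    """Return (banner colour, token count) for the live status banner (hypothetical helper)."""
    # Assumption: whitespace splitting stands in for the model tokenizer,
    # which usually reports more (sub-word) tokens for the same text.
    n_tokens = len(text.split())
    if n_tokens > budget:
        return "green", n_tokens  # over budget: compression will run
    return "red", n_tokens        # already within budget: nothing to do


print(budget_status("word " * 150, budget=100))        # ('green', 150)
print(budget_status("short meeting note", budget=100))  # ('red', 3)
```

In this sketch an input exactly at the budget counts as "within budget".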
On the right you'll see:
- The compressed version of your text
- How many tokens went in vs came out
- The compression ratio (how much it shrank)
- A quality score between 0 and 1: the closer to 1, the better the meaning held up
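The guide doesn't say exactly how the compression ratio is defined. This sketch assumes it is tokens in divided by tokens out, so 4.0 means "four times smaller". It also shows the other common reading of "how much it shrank", the percentage of tokens removed. `RunMetrics` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    tokens_in: int
    tokens_out: int

    @property
    def compression_ratio(self) -> float:
        # Assumption: reported as input/output, so 4.0 reads as "4x smaller".
        return self.tokens_in / self.tokens_out if self.tokens_out else float("inf")

    @property
    def percent_saved(self) -> float:
        # Alternative reading: share of the original tokens that were removed.
        return 100.0 * (1 - self.tokens_out / self.tokens_in) if self.tokens_in else 0.0


m = RunMetrics(tokens_in=800, tokens_out=200)
print(m.compression_ratio, m.percent_saved)  # 4.0 75.0
```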
Once the result appears, 👍 Helpful and 👎 Not helpful buttons show up below the metrics. Click either one to rate the result; the feedback is saved instantly. A note field then slides in where you can optionally type what worked well or didn't (e.g. "lost key dates", "too short", "great summary") and hit Save note. Both the rating and the note are stored with the run and visible in the History tab.
Every run saves automatically in the background. You don't need to do anything.
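The save flow has three steps. A run row is written as soon as compression finishes, the rating is written on the button click, and the note is written later by Save note. Here is a sketch of that pattern. It assumes SQLite storage and a hypothetical `runs` schema; the guide doesn't document the app's actual backend or function names.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per run; feedback and note are filled in later.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (id INTEGER PRIMARY KEY AUTOINCREMENT, timestamp TEXT, model TEXT,"
    " compression_ratio REAL, quality_score REAL, feedback TEXT, note TEXT)"
)


def save_run(model: str, ratio: float, score: float) -> int:
    """Called automatically after every compression; returns the new row id."""
    with conn:  # commits the transaction on success
        cur = conn.execute(
            "INSERT INTO runs (timestamp, model, compression_ratio, quality_score)"
            " VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), model, ratio, score),
        )
    return cur.lastrowid


def save_rating(run_id: int, helpful: bool) -> None:
    """Called the moment a feedback button is clicked."""
    with conn:
        conn.execute("UPDATE runs SET feedback = ? WHERE id = ?",
                     ("helpful" if helpful else "not helpful", run_id))


def save_note(run_id: int, note: str) -> None:
    """Called by Save note; a blank note is stored as NULL."""
    with conn:
        conn.execute("UPDATE runs SET note = ? WHERE id = ?", (note.strip() or None, run_id))


run_id = save_run("Qwen2.5-1.5B-Instruct", ratio=4.0, score=0.91)  # sample values
save_rating(run_id, helpful=False)
save_note(run_id, "lost key dates")
print(conn.execute("SELECT feedback, note FROM runs WHERE id = ?", (run_id,)).fetchone())
# ('not helpful', 'lost key dates')
```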
Token Highlights
Below the input box there's a Show Token Highlights button. Click it and each token in your input is rendered as a colour-coded chip, which makes it easy to see exactly where your budget is going. The panel updates live as you type. Click again to hide it.
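One way to build a chip view is to tokenize the input and wrap each token in an inline `<span>`. Each span's background colour cycles through a small palette, so neighbouring tokens are easy to tell apart. The palette, the `token_chips` name and the hand-written token list are all assumptions. The app would use the compression model's own tokenizer, which is why sub-word pieces like "Comp" and "ress" appear.

```python
import html
from itertools import cycle

# Assumed palette; the app's actual colours aren't documented.
PALETTE = ["#fde68a", "#bfdbfe", "#bbf7d0", "#fbcfe8", "#ddd6fe"]


def token_chips(tokens: list[str]) -> str:
    """Render each token as an inline coloured <span>, cycling through PALETTE."""
    chips = []
    for tok, colour in zip(tokens, cycle(PALETTE)):
        chips.append(
            f'<span style="background:{colour};border-radius:4px;padding:0 2px">'
            f"{html.escape(tok)}</span>"  # escape so tokens like "<text>" render literally
        )
    return "".join(chips)


# Stand-in for tokenizer output (sub-word pieces).
print(token_chips(["Comp", "ress", " this", " <text>"]))
```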
Switching the compression model
Click Model Settings at the top of the tab to expand the accordion. Pick a model from the dropdown (or type a custom Hugging Face model ID) and hit Load Model. The current model is unloaded from memory first and then the new one loads, so no restart is needed. The status box confirms when it's ready.
Available presets: Qwen2.5-1.5B-Instruct (default), Qwen2.5-0.5B-Instruct, SmolLM2-1.7B-Instruct, Phi-3.5-mini-instruct, Llama-3.2-1B-Instruct.
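Swapping models without a restart comes down to releasing the old model before allocating the new one, so two models never sit in memory at the same time. Here is a minimal sketch of that pattern. `ModelSlot` and the stand-in loader are hypothetical. In a real app the loader would presumably wrap something like transformers' `AutoModelForCausalLM.from_pretrained`.

```python
import gc
from typing import Any, Callable, Optional


class ModelSlot:
    """Holds at most one loaded model; swap() frees the old one before loading the next."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader  # hypothetical: any function that turns a model ID into a model
        self.model_id: Optional[str] = None
        self.model: Any = None

    def swap(self, model_id: str) -> str:
        # 1. Drop this slot's reference to the current model and collect garbage,
        #    so its memory can be reclaimed before the next model is allocated.
        #    (With PyTorch on GPU you would typically also call torch.cuda.empty_cache().)
        self.model, self.model_id = None, None
        gc.collect()
        # 2. Load the replacement and report status.
        self.model = self._loader(model_id)
        self.model_id = model_id
        return f"Loaded {model_id}"


slot = ModelSlot(loader=lambda mid: {"name": mid})  # stand-in loader for illustration
print(slot.swap("Qwen2.5-1.5B-Instruct"))
print(slot.swap("SmolLM2-1.7B-Instruct"))
```

Unloading first keeps peak memory at one model's worth. The trade-off is that a failed load leaves the slot empty instead of falling back to the previous model.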
Switching the scoring embedder
Below the compression model section in the same accordion, there's a separate Embedder Model dropdown. The embedder computes the quality score, so changing it affects how accurately that score reflects meaning retention.
When you select a model from the dropdown, an info panel updates immediately to explain the trade-off:
- ⚡ Fast models (MiniLM, bge-small): low overhead, good baseline scores, CPU-friendly
- ⚖️ Balanced models (mpnet, bge-base): more discriminating scores, small speed cost
- High quality models (mxbai-large): most accurate scores, GPU recommended
- 🔬 Best quality models (gte-Qwen2-1.5B): catches subtle meaning loss, requires significant RAM/VRAM
Hit Load Embedder to apply the selection. The previous embedder is unloaded from memory before the new one loads.
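The guide only says the embedder's score reflects meaning retention. A common way to get such a score is the cosine similarity between the embeddings of the original and compressed text. This sketch assumes exactly that, clipped to [0, 1], with the embedder passed in as a function. A toy bag-of-words embedder stands in so the example runs without a model. In practice `embed` could be something like `SentenceTransformer("all-MiniLM-L6-v2").encode` from the sentence-transformers library.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def quality_score(original: str, compressed: str,
                  embed: Callable[[str], Sequence[float]]) -> float:
    """Assumed scoring rule: cosine similarity of the two embeddings, clipped into [0, 1]."""
    return max(0.0, min(1.0, cosine(embed(original), embed(compressed))))


# Toy stand-in embedder: word counts over a fixed vocabulary.
VOCAB = ["meeting", "moved", "to", "friday", "budget", "approved"]


def toy_embed(text: str) -> list[float]:
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]


score = quality_score("Meeting moved to Friday budget approved",
                      "meeting moved to friday", toy_embed)
print(round(score, 3))  # 0.816
```

Swapping embedders only changes the `embed` function. That is why a stronger model can be more discriminating about meaning loss without changing the scoring logic.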
History tab
Open this tab to see everything that's been compressed so far.
The table loads automatically when you open the tab. Hit Refresh to pull in the latest runs. At the top you'll find the average quality score and compression ratio across all sessions, a quick way to see how the tool is performing over time.
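If runs are kept in a SQLite table, the two header figures come from a single aggregate query. SQLite is an assumption here, since the guide doesn't name the storage backend, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, compression_ratio REAL, quality_score REAL)")
conn.executemany(
    "INSERT INTO runs (compression_ratio, quality_score) VALUES (?, ?)",
    [(4.0, 0.90), (2.5, 0.95), (6.0, 0.70)],  # sample rows for illustration only
)


def aggregate_stats(conn: sqlite3.Connection) -> tuple:
    """Return (average quality score, average compression ratio) over all stored runs."""
    # AVG() yields NULL (None in Python) when the table is empty.
    return conn.execute(
        "SELECT AVG(quality_score), AVG(compression_ratio) FROM runs"
    ).fetchone()


avg_quality, avg_ratio = aggregate_stats(conn)
print(f"avg quality {avg_quality:.2f}, avg ratio {avg_ratio:.2f}x")
```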
Column visibility
By default the table shows: id, timestamp, model, compression_ratio, quality_score, feedback. Open the Column visibility accordion above the table to toggle any additional columns on or off β changes apply instantly without a refresh.
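Toggles can apply "without a refresh" if the table is re-projected from rows already in memory instead of being queried again. Here is a sketch of that idea. The default column list comes from this guide; `tokens_in` and `note` are assumed extra columns, and `visible_table` is a hypothetical helper.

```python
# Default History columns as listed above; "tokens_in" and "note" are assumed extras.
DEFAULT_COLUMNS = ["id", "timestamp", "model", "compression_ratio", "quality_score", "feedback"]


def visible_table(rows: list[dict], columns: list[str]) -> list[list]:
    """Project cached history rows onto the ticked columns, in the given order."""
    return [[row.get(col) for col in columns] for row in rows]


rows = [{"id": 1, "timestamp": "2024-01-01T09:00:00Z", "model": "Qwen2.5-1.5B-Instruct",
         "compression_ratio": 4.0, "quality_score": 0.91, "feedback": "helpful",
         "tokens_in": 800, "note": None}]  # sample row

print(visible_table(rows, DEFAULT_COLUMNS))
print(visible_table(rows, DEFAULT_COLUMNS + ["tokens_in"]))
```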
Side-by-side diff
Click any row in the table and a word-level diff panel opens below it. Words are colour-coded:
- Red strikethrough: dropped from the original
- Amber: rewritten by the model
- Green: inserted (rare, usually connector words)
- Plain: survived unchanged
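Python's standard `difflib.SequenceMatcher` can produce a word-level diff like this. Its `get_opcodes()` method labels each span as `equal`, `delete`, `replace` or `insert`. This sketch assumes those opcodes map onto the plain, red, amber and green styles above, and for a rewrite it emits the struck-out original followed by the amber replacement. It's an illustration, not necessarily how the app computes its diff.

```python
from difflib import SequenceMatcher

# Assumed mapping from difflib opcodes to the colour styles described above.
STYLE = {"delete": "red-strike", "replace": "amber", "insert": "green", "equal": "plain"}


def word_diff(original: str, compressed: str) -> list[tuple[str, str]]:
    """Return (style, words) segments describing how `original` became `compressed`."""
    a, b = original.split(), compressed.split()
    segments = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag == "delete":
            segments.append((STYLE[tag], " ".join(a[i1:i2])))
        elif tag == "replace":
            # Show the dropped original words, then the model's rewrite.
            segments.append((STYLE["delete"], " ".join(a[i1:i2])))
            segments.append((STYLE[tag], " ".join(b[j1:j2])))
        else:  # "insert" and "equal" both take their words from the compressed side
            segments.append((STYLE[tag], " ".join(b[j1:j2])))
    return segments


for style, words in word_diff("the meeting was moved to friday", "meeting moved to fri"):
    print(f"{style:10} {words}")
```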
Deleting a run
Click a row to select it, then hit Delete Selected Row. The table refreshes and the aggregate stats update automatically.
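Assuming the same kind of SQLite storage (not confirmed by this guide), deleting a run and recomputing the header figures might look like the sketch below. `delete_run` is a hypothetical helper, and the rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, compression_ratio REAL, quality_score REAL)")
conn.executemany("INSERT INTO runs (id, compression_ratio, quality_score) VALUES (?, ?, ?)",
                 [(1, 4.0, 0.9), (2, 2.0, 0.5)])  # sample rows


def delete_run(conn: sqlite3.Connection, run_id: int) -> tuple:
    """Delete one run, then return (remaining runs, avg quality, avg ratio) for the header."""
    with conn:  # commits on success, rolls back if the DELETE raises
        conn.execute("DELETE FROM runs WHERE id = ?", (run_id,))
    return conn.execute(
        "SELECT COUNT(*), AVG(quality_score), AVG(compression_ratio) FROM runs"
    ).fetchone()


print(delete_run(conn, 2))  # (1, 0.9, 4.0)
```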