# Get Started
Once `python app.py` is running, head to `http://localhost:7860` in your browser. You'll see two tabs.
## Compress tab
This is where the action is.
1. Paste your text: a long prompt, meeting notes, an article, anything really
2. Use the slider to set your token budget (anywhere from 100 to 1000)
3. Hit **Compress**
As you type or adjust the slider, a status banner updates live:
- **Green**: the input is over budget, so compression will run
- **Red**: the input is already within budget, so there is nothing to do
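The banner rule above can be sketched in a few lines. This is only an illustration, not the app's actual code: `budget_status` and `count_tokens` are hypothetical names, and whitespace splitting stands in for whatever tokenizer the app really uses.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting approximates the real tokenizer.
    return len(text.split())


def budget_status(token_count: int, budget: int) -> str:
    """Return the banner colour for an input size and a token budget."""
    # The slider in the UI only allows budgets from 100 to 1000.
    if not 100 <= budget <= 1000:
        raise ValueError("budget must be between 100 and 1000")
    # Over budget: compression will run (green).
    # At or under budget: nothing to do (red).
    return "green" if token_count > budget else "red"


if __name__ == "__main__":
    sample = "word " * 1200
    print(budget_status(count_tokens(sample), budget=500))
```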
On the right you'll see:
- The compressed version of your text
- How many tokens went in vs came out
- The compression ratio (how much it shrank)
- A quality score between 0 and 1; closer to 1 means the meaning held up well
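To make the two numeric metrics concrete, here is a rough sketch. Both definitions are assumptions: the ratio is taken to be the fraction of tokens removed, and the quality score is taken to be the cosine similarity between embeddings of the original and compressed text, clipped to the 0 to 1 range. The function names are hypothetical, and the app may compute either value differently.

```python
import math


def compression_ratio(tokens_in: int, tokens_out: int) -> float:
    """Fraction of tokens removed (assumed definition of 'how much it shrank')."""
    if tokens_in <= 0:
        raise ValueError("tokens_in must be positive")
    return 1.0 - tokens_out / tokens_in


def cosine_quality(original_vec, compressed_vec) -> float:
    """Cosine similarity of two embedding vectors, clipped to [0, 1]."""
    dot = sum(x * y for x, y in zip(original_vec, compressed_vec))
    norm = math.sqrt(sum(x * x for x in original_vec)) * math.sqrt(
        sum(y * y for y in compressed_vec)
    )
    if norm == 0:
        return 0.0  # A zero vector carries no meaning to compare.
    return max(0.0, min(1.0, dot / norm))


if __name__ == "__main__":
    # Toy vectors stand in for real embedder output.
    print(compression_ratio(800, 200))
    print(cosine_quality([0.2, 0.9, 0.1], [0.25, 0.85, 0.05]))
```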
Once the result appears, **👍 Helpful** and **👎 Not helpful** buttons show up below the metrics. Click either one to rate the result; the feedback is saved instantly. A note field then slides in where you can optionally type what worked well or didn't (e.g. "lost key dates", "too short", "great summary") and hit **Save note**. Both the rating and the note are stored with the run and visible in the History tab.
Every run saves automatically in the background. You don't need to do anything.
### Token Highlights
Below the input box there's a **Show Token Highlights** button. Click it and each token in your input gets rendered as a colour-coded chip, which is useful for seeing exactly where your budget is going. The panel updates live as you type. Click again to hide it.
### Switching the compression model
Click **Model Settings** at the top of the tab to expand the accordion. Pick a model from the dropdown (or type a custom HuggingFace model ID) and hit **Load Model**. The current model is unloaded from memory first, then the new one loads, so no restart is needed. The status box confirms when it's ready.
Available presets: Qwen2.5-1.5B-Instruct (default), Qwen2.5-0.5B-Instruct, SmolLM2-1.7B-Instruct, Phi-3.5-mini-instruct, Llama-3.2-1B-Instruct.
### Switching the scoring embedder
Below the compression model section in the same accordion, there's a separate **Embedder Model** dropdown. The embedder is what computes the quality score, so changing it affects how accurately that score reflects meaning retention.
When you select a model from the dropdown, an info panel updates immediately to explain the trade-off:
- ⚡ **Fast** models (MiniLM, bge-small): low overhead, good baseline scores, CPU-friendly
- ⚖️ **Balanced** models (mpnet, bge-base): more discriminating scores, small speed cost
- 🏆 **High quality** models (mxbai-large): most accurate scores, GPU recommended
- 🔬 **Best quality** models (gte-Qwen2-1.5B): catches subtle meaning loss, requires significant RAM/VRAM
Hit **Load Embedder** to apply the selection. The previous embedder is unloaded from memory before the new one loads.
## History tab
Open this tab to see everything that's been compressed so far.
The table loads automatically when you open the tab. Hit **Refresh** to pull in the latest runs. At the top you'll find the average quality score and compression ratio across all sessions, a quick way to see how the tool is performing over time.
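The aggregate figures at the top are plain averages over the saved runs. A minimal sketch, assuming each run is available as a dict with `quality_score` and `compression_ratio` keys (the function name and the storage shape are hypothetical; the app's storage layer isn't described here):

```python
from statistics import mean


def aggregate_stats(runs):
    """Average quality score and compression ratio over a list of run records."""
    if not runs:
        # No history yet: there is nothing to average.
        return {"avg_quality_score": None, "avg_compression_ratio": None}
    return {
        "avg_quality_score": mean(r["quality_score"] for r in runs),
        "avg_compression_ratio": mean(r["compression_ratio"] for r in runs),
    }


if __name__ == "__main__":
    # Made-up records, for illustration only.
    history = [
        {"quality_score": 0.5, "compression_ratio": 0.25},
        {"quality_score": 1.0, "compression_ratio": 0.75},
    ]
    print(aggregate_stats(history))
```

Deleting a run simply removes its record from the list, which is why the averages shift after a delete.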
### Column visibility
By default the table shows: `id`, `timestamp`, `model`, `compression_ratio`, `quality_score`, `feedback`. Open the **Column visibility** accordion above the table to toggle any additional columns on or off; changes apply instantly without a refresh.
### Side-by-side diff
Click any row in the table and a word-level diff panel opens below it. Words are colour-coded:
- Red strikethrough: dropped from the original
- Amber: rewritten by the model
- Green: inserted (rare connector words)
- Plain: survived unchanged
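The four categories map naturally onto the opcodes of Python's standard `difflib.SequenceMatcher`. The sketch below shows one way a word-level diff like this could be produced. It is an assumption about the approach, not the app's actual implementation, and `word_diff` is a hypothetical name.

```python
from difflib import SequenceMatcher

# Map difflib opcodes to the categories shown in the diff panel.
LABELS = {
    "equal": "unchanged",    # plain
    "delete": "dropped",     # red strikethrough
    "replace": "rewritten",  # amber
    "insert": "inserted",    # green
}


def word_diff(original: str, compressed: str):
    """Return (label, words) pairs describing a word-level diff."""
    a, b = original.split(), compressed.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Dropped words exist only in the original; every other
        # category is shown using the compressed side.
        words = a[i1:i2] if tag == "delete" else b[j1:j2]
        result.append((LABELS[tag], " ".join(words)))
    return result


if __name__ == "__main__":
    for label, words in word_diff(
        "the meeting is on Monday at noon", "the meeting is Monday"
    ):
        print(f"{label:>10}: {words}")
```

Comparing whole words rather than characters keeps the diff readable on long inputs, at the cost of flagging a word as rewritten when only its punctuation changed.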
### Deleting a run
Click a row to select it, then hit **Delete Selected Row**. The table refreshes and the aggregate stats update automatically.
🏠 [README.md](../README.md)