# Get Started

Once `python app.py` is running, head to `http://localhost:7860` in your browser. You'll see two tabs.

## Compress tab

This is where the action is.

1. Paste your text: could be a long prompt, meeting notes, an article, anything really
2. Use the slider to set your token budget (anywhere from 100 to 1000)
3. Hit **Compress**

As you type or adjust the slider, a status banner updates live:

- **Green**: the input is over budget, compression will run
- **Red**: the input is already within budget, nothing to do

On the right you'll see:

- The compressed version of your text
- How many tokens went in vs. came out
- The compression ratio (how much it shrank)
- A quality score between 0 and 1; closer to 1 means the meaning held up well

Once the result appears, **👍 Helpful** and **👎 Not helpful** buttons show up below the metrics. Click either one to rate the result; the feedback is saved instantly. A note field then slides in where you can optionally type what worked well or didn't (e.g. "lost key dates", "too short", "great summary") and hit **Save note**. Both the rating and the note are stored with the run and visible in the History tab.

Every run saves automatically in the background. You don't need to do anything.

### Token Highlights

Below the input box there's a **Show Token Highlights** button. Click it and each token in your input gets rendered as a colour-coded chip, which is useful for seeing exactly where your budget is going. The panel updates live as you type. Click again to hide it.

### Switching the compression model

Click **Model Settings** at the top of the tab to expand the accordion. Pick a model from the dropdown (or type a custom HuggingFace model ID) and hit **Load Model**. The current model is unloaded from memory first, then the new one loads, so no restart is needed. The status box confirms when it's ready.
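
If you're curious what an unload-then-load swap can look like, here is a minimal sketch. It is not the app's actual code: `ModelSlot`, `swap`, and `fake_loader` are hypothetical names made up for illustration, and the loader is a stand-in so the example runs without downloading anything.

```python
import gc
from typing import Any, Callable, Optional


class ModelSlot:
    """Holds at most one loaded model; swapping unloads the old one first."""

    def __init__(self, loader: Callable[[str], Any]):
        # Hypothetical loader: maps a model ID to a loaded model object.
        self._loader = loader
        self.model_id: Optional[str] = None
        self.model: Any = None

    def unload(self) -> None:
        # Drop the only reference so the old weights can be reclaimed.
        self.model = None
        self.model_id = None
        gc.collect()
        # With PyTorch on a GPU, torch.cuda.empty_cache() would typically
        # follow here; it is left out to keep the sketch dependency-free.

    def swap(self, model_id: str) -> str:
        # Free the old model before loading, so both never sit in memory.
        self.unload()
        self.model = self._loader(model_id)
        self.model_id = model_id
        return f"Loaded {model_id}"


def fake_loader(model_id: str) -> dict:
    # Stand-in for a real model loader (assumption for this example).
    return {"id": model_id}


slot = ModelSlot(fake_loader)
status = slot.swap("Qwen2.5-0.5B-Instruct")
```

Unloading before loading keeps peak memory at roughly one model's worth, which is presumably why the swap works without a restart.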

Available presets: Qwen2.5-1.5B-Instruct (default), Qwen2.5-0.5B-Instruct, SmolLM2-1.7B-Instruct, Phi-3.5-mini-instruct, Llama-3.2-1B-Instruct.

### Switching the scoring embedder

Below the compression model section in the same accordion, there's a separate **Embedder Model** dropdown. The embedder is what computes the quality score, so changing it affects how accurately that score reflects meaning retention.

When you select a model from the dropdown, an info panel updates immediately to explain the trade-off:

- ⚡ **Fast** models (MiniLM, bge-small): low overhead, good baseline scores, CPU-friendly
- ⚖️ **Balanced** models (mpnet, bge-base): more discriminating scores, small speed cost
- 🏆 **High quality** models (mxbai-large): most accurate scores, GPU recommended
- 🔬 **Best quality** models (gte-Qwen2-1.5B): catches subtle meaning loss, requires significant RAM/VRAM

Hit **Load Embedder** to apply the selection. The previous embedder is unloaded from memory before the new one loads.

## History tab

Click over here to see everything that's been compressed so far. The table loads automatically when you open the tab. Hit **Refresh** to pull in the latest runs.

At the top you'll find the average quality score and compression ratio across all sessions, a quick way to see how the tool is performing over time.

### Column visibility

By default the table shows: `id`, `timestamp`, `model`, `compression_ratio`, `quality_score`, `feedback`. Open the **Column visibility** accordion above the table to toggle any additional columns on or off. Changes apply instantly without a refresh.

### Side-by-side diff

Click any row in the table and a word-level diff panel opens below it. Words are colour-coded:

- Red strikethrough: dropped from the original
- Amber: rewritten by the model
- Green: inserted (rare connector words)
- Plain: survived unchanged

### Deleting a run

Click a row to select it, then hit **Delete Selected Row**.
The table refreshes and the aggregate stats update automatically.

🏠 [README.md](../README.md)