Space: build-small-hackathon/tiny-press (Running on Zero)

Commit ebc3bf5 (parent f349c7e), committed by sriharsha-cr 6 days ago

Project files

Files changed (20)

.gitignore +16 -0
README.md +123 -2
app.py +26 -0
config.py +68 -0
core/compressor.py +48 -0
core/diff.py +160 -0
core/scorer.py +11 -0
core/tokenizer_utils.py +13 -0
db/schema.sql +16 -0
db/store.py +87 -0
docs/architecture.md +55 -0
docs/enhancements.md +27 -0
docs/folder-structure.md +44 -0
docs/get-started.md +72 -0
docs/setup.md +95 -0
models/model_loader.py +98 -0
requirements.txt +8 -0
tinypress_colab.ipynb +336 -0
ui/compress_tab.py +327 -0
ui/history_tab.py +96 -0

.gitignore ADDED

	@@ -0,0 +1,16 @@

+# I store my findings here, not public 😂
+my-notes
+# You don't need my credentials 🫣
+.env
+.venv
+# Yes, I used vibe coding 😉
+.claude
+CLAUDE.md
+AGENTS.md
+claude*
+# Caches
+__pycache__
+*.db

README.md CHANGED

@@ -4,10 +4,131 @@ emoji: 📊
 colorFrom: indigo
 colorTo: gray
 sdk: gradio
-sdk_version: 6.18.0
+sdk_version: "5.0"
 python_version: '3.12'
 app_file: app.py
 pinned: false
+license: mit
+short_description: Compress any text to a token budget — scored and diffed, fully local.
+models:
+    - Qwen/Qwen2.5-1.5B-Instruct
+tags:
+  - gradio
+  - build-small-hackathon
+  - thousand-token-wood
+  - text-compression
+  - prompt-optimization
+  - local-inference
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# TinyPress — Prompt Compression Engine
+> **HuggingFace Build Small Hackathon · Track: Thousand Token Wood**
+The constraint *is* the feature. Give TinyPress a long piece of text, set a token budget, and get back a compressed version that still carries the meaning — scored, saved, and diffed so you can see exactly what was kept and what was shed.
+No cloud. No API bill. Two small models running quietly on your machine.
+---
+## Why this fits Thousand Token Wood
+Working inside a tight token budget is not a limitation to work around — it is the problem worth solving. LLM context windows are finite, prompt costs are real, and bloated inputs degrade output quality. TinyPress treats the token count as a hard constraint and makes compression the primary interaction: you set the budget, the model meets it, and a quality score tells you how much meaning survived.
+---
+## Features
+| | |
+|---|---|
+| 🗜️ **Token-budget compression** | Set a target (100–1000 tokens) and compress to at most that budget; the output is hard-trimmed if the model overshoots |
+| 📊 **Quality score** | Cosine similarity between sentence embeddings of the original and compressed text, clamped to 0–1; higher is better |
+| 🟢🔴 **Live readiness banner** | Green when input is over budget and compression will run; red when already within budget |
+| 🔍 **Token highlight panel** | Every token rendered as a colour-coded chip so you can see where your budget is going |
+| 🔀 **Model hot-swap** | Switch the compression LLM mid-session without a restart (5 curated models, or any HF model ID) |
+| 🎯 **Embedder hot-swap** | Switch the scoring embedder with per-model trade-off info (speed vs quality vs RAM) |
+| 👍👎 **Feedback capture** | Rate every result, add an optional text note — saved instantly to SQLite |
+| 📜 **Run history** | Every compression persisted locally with full metrics and configurable column visibility |
+| 🔎 **Side-by-side diff** | Word-level colour diff — dropped (red), rewritten (amber), inserted (green), unchanged (plain) |
+---
+## Models
+| Role | Default | Alternatives |
+|---|---|---|
+| Compression LLM | `Qwen/Qwen2.5-1.5B-Instruct` | Qwen2.5-0.5B, SmolLM2-1.7B, Phi-3.5-mini, Llama-3.2-1B |
+| Quality scorer | `sentence-transformers/all-MiniLM-L6-v2` | mpnet-base, bge-small, bge-base, mxbai-large, gte-Qwen2-1.5B |
+All models are open-weight and under 32B. Everything runs locally — no API calls, no data leaves your machine.
+---
+## Get started
+```bash
+python -m venv .venv
+# Windows
+.venv\Scripts\activate
+# macOS / Linux
+source .venv/bin/activate
+pip install -r requirements.txt
+python app.py
+```
+Open `http://localhost:7860`. That's it.
+**Run it in Colab:** open `tinypress_colab.ipynb` — it installs dependencies, loads the models, and launches a public Gradio share URL. GPU runtime recommended for faster inference.
+Optional environment overrides:
+| Variable | Default | Description |
+|---|---|---|
+| `LLM_MODEL` | `Qwen/Qwen2.5-1.5B-Instruct` | Compression model |
+| `EMBEDDER_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | Scoring embedder |
+| `DB_PATH` | `tinypress.db` | SQLite database path |
+| `PORT` | `7860` | Gradio server port |
+---
+## Hardware
+| | Minimum | Recommended |
+|---|---|---|
+| RAM | 8 GB | 16 GB |
+| VRAM | CPU-only works | 4 GB GPU speeds up inference |
+| Disk | ~4 GB | ~4 GB |
+---
+## Architecture
+```
+Input text + token budget
+        │
+  core/compressor.py     — builds prompt, calls LLM, hard-trims if it overshoots
+        │
+  models/model_loader.py — Qwen2.5-1.5B (or swapped model), loaded once, reused
+        │
+  core/scorer.py         — cosine similarity via sentence-transformer embedder
+        │
+  db/store.py            — saves run to SQLite
+        │
+  ui/compress_tab.py     — shows result, metrics, feedback UI
+```
+Thin UI layer: Gradio handlers pass inputs to `core/` and return outputs. All logic lives in `core/` and `db/`.
+Full docs: [Architecture](docs/architecture.md) · [Setup](docs/setup.md) · [Get Started](docs/get-started.md) · [Folder Structure](docs/folder-structure.md)
+---
+## About
+Built by **[Sriharsha C R](https://www.linkedin.com/in/sriharsha-cr)** — AI Engineer and Cloud Native developer.
+[![LinkedIn](https://img.shields.io/badge/LinkedIn-sriharsha--cr-0a66c2?logo=linkedin&logoColor=white)](https://www.linkedin.com/in/sriharsha-cr)
+[![X / Twitter](https://img.shields.io/badge/X-@sriharsha__cr-000000?logo=x&logoColor=white)](https://x.com/sriharsha_cr)
+[![HuggingFace](https://img.shields.io/badge/HuggingFace-sriharsha--cr-ff9d00?logo=huggingface&logoColor=white)](https://huggingface.co/sriharsha-cr)
+[![GitHub](https://img.shields.io/badge/GitHub-SriharshaCR-181717?logo=github&logoColor=white)](https://github.com/SriharshaCR)

app.py ADDED

	@@ -0,0 +1,26 @@

+import gradio as gr
+import config
+from db.store import init_db
+from models.model_loader import get_llm, get_embedder
+from ui.compress_tab import build_compress_tab
+from ui.history_tab import build_history_tab
+def build_app() -> gr.Blocks:
+    with gr.Blocks(title=config.APP_TITLE) as app:
+        build_compress_tab()
+        build_history_tab()
+    return app
+if __name__ == "__main__":
+    print("Initialising database...")
+    init_db()
+    print("Loading models (first run may download weights)...")
+    get_llm()
+    get_embedder()
+    print("Starting TinyPress...")
+    app = build_app()
+    app.launch(server_port=config.SERVER_PORT)

config.py ADDED

	@@ -0,0 +1,68 @@

+import os
+# Model settings
+LLM_MODEL = os.getenv("LLM_MODEL", "Qwen/Qwen2.5-1.5B-Instruct")
+EMBEDDER_MODEL = os.getenv("EMBEDDER_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
+# Curated <32B open-weight causal LMs for local inference (shown in the UI dropdown).
+AVAILABLE_MODELS = [
+    "Qwen/Qwen2.5-1.5B-Instruct",
+    "Qwen/Qwen2.5-0.5B-Instruct",
+    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
+    "microsoft/Phi-3.5-mini-instruct",
+    "meta-llama/Llama-3.2-1B-Instruct",
+]
+# Curated sentence-transformer embedding models for quality scoring.
+AVAILABLE_EMBEDDER_MODELS = [
+    "sentence-transformers/all-MiniLM-L6-v2",
+    "sentence-transformers/all-mpnet-base-v2",
+    "BAAI/bge-small-en-v1.5",
+    "BAAI/bge-base-en-v1.5",
+    "mixedbread-ai/mxbai-embed-large-v1",
+    "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
+]
+EMBEDDER_INFO = {
+    "sentence-transformers/all-MiniLM-L6-v2": (
+        "⚡ **Fast · 22M params · Default**  \n"
+        "Great baseline. Scores are reliable for typical compression ratios. "
+        "Runs comfortably on CPU — minimal overhead."
+    ),
+    "sentence-transformers/all-mpnet-base-v2": (
+        "⚖️ **Balanced · 110M params**  \n"
+        "Noticeably sharper quality scores than MiniLM, especially on longer texts. "
+        "Small speed trade-off; fine on CPU."
+    ),
+    "BAAI/bge-small-en-v1.5": (
+        "⚡ **Fast · 33M params**  \n"
+        "Strong quality-to-size ratio — often matches MiniLM on accuracy while being "
+        "slightly more sensitive to meaning shifts. Good CPU option."
+    ),
+    "BAAI/bge-base-en-v1.5": (
+        "⚖️ **Balanced · 109M params**  \n"
+        "Consistently strong on semantic similarity benchmarks. "
+        "Scores will be more discriminating — small differences in compression quality show up more clearly."
+    ),
+    "mixedbread-ai/mxbai-embed-large-v1": (
+        "🏆 **High quality · 335M params**  \n"
+        "Top-tier similarity scores. Quality readings will be the most accurate here, "
+        "but slower to load and run. GPU recommended."
+    ),
+    "Alibaba-NLP/gte-Qwen2-1.5B-instruct": (
+        "🔬 **Best quality · 1.5B params**  \n"
+        "Strongest semantic understanding in this list. Scores will reflect subtle meaning loss "
+        "that smaller models miss. Requires significant RAM/VRAM — GPU strongly recommended."
+    ),
+}
+# Compression settings
+DEFAULT_TARGET_TOKENS = 500
+MAX_NEW_TOKENS = 1024
+# Database
+DB_PATH = os.getenv("DB_PATH", "tinypress.db")
+# Gradio
+APP_TITLE = "TinyPress"
+SERVER_PORT = int(os.getenv("PORT", 7860))

core/compressor.py ADDED

	@@ -0,0 +1,48 @@

+import torch
+import config
+from core.tokenizer_utils import count_tokens
+from models.model_loader import get_llm
+_PROMPT_TEMPLATE = """You are a lossless compression assistant. Compress the following text to at most {target} tokens.
+Preserve all key facts, decisions, and intent. Do not add commentary. Output only the compressed text.
+TEXT:
+{text}
+COMPRESSED:"""
+def _generate(prompt: str) -> str:
+    model, tokenizer = get_llm()
+    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+    with torch.no_grad():
+        output_ids = model.generate(
+            **inputs,
+            max_new_tokens=config.MAX_NEW_TOKENS,
+            do_sample=False,
+            pad_token_id=tokenizer.eos_token_id,
+        )
+    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
+    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+def compress(text: str, target_tokens: int) -> tuple[str, int, int]:
+    """Returns (compressed_text, input_token_count, output_token_count)."""
+    input_tokens = count_tokens(text)
+    # If already within budget, return as-is
+    if input_tokens <= target_tokens:
+        return text, input_tokens, input_tokens
+    prompt = _PROMPT_TEMPLATE.format(target=target_tokens, text=text)
+    compressed = _generate(prompt)
+    # Trim to hard token limit if model overshoots
+    _, tokenizer = get_llm()
+    ids = tokenizer.encode(compressed, add_special_tokens=False)
+    if len(ids) > target_tokens:
+        compressed = tokenizer.decode(ids[:target_tokens], skip_special_tokens=True)
+    output_tokens = count_tokens(compressed)
+    return compressed, input_tokens, output_tokens
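The budget check and hard-trim fallback in `compress()` can be exercised without loading a model. This is a minimal sketch, assuming a toy whitespace tokenizer stands in for the Hugging Face tokenizer; `Tokenizer`, `ToyTokenizer` and `enforce_budget` are hypothetical names, not part of the repo. It differs from the committed code in one deliberate way: it re-encodes after decoding and shrinks the cut until the recount fits, since a decode/encode round trip is not guaranteed to preserve the token count for real subword tokenizers.

```python
from typing import Protocol


class Tokenizer(Protocol):
    """The two tokenizer methods the trim step in compress() relies on."""

    def encode(self, text: str, add_special_tokens: bool) -> list[int]: ...

    def decode(self, ids: list[int], skip_special_tokens: bool) -> str: ...


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for a real subword tokenizer."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}
        self.inverse: dict[int, str] = {}

    def encode(self, text: str, add_special_tokens: bool = False) -> list[int]:
        ids = []
        for word in text.split():
            if word not in self.vocab:  # grow the vocabulary on the fly
                self.vocab[word] = len(self.vocab)
                self.inverse[self.vocab[word]] = word
            ids.append(self.vocab[word])
        return ids

    def decode(self, ids: list[int], skip_special_tokens: bool = True) -> str:
        return " ".join(self.inverse[i] for i in ids)


def enforce_budget(text: str, tokenizer: Tokenizer, target_tokens: int) -> tuple[str, int]:
    """Hard-trim text to at most target_tokens tokens (target_tokens >= 1).

    Returns (text, token_count). Shrinks the cut until the re-encoded text fits.
    """
    original_ids = tokenizer.encode(text, add_special_tokens=False)
    ids = original_ids
    cut = target_tokens
    while len(ids) > target_tokens and cut > 0:
        text = tokenizer.decode(original_ids[:cut], skip_special_tokens=True)
        ids = tokenizer.encode(text, add_special_tokens=False)
        cut -= 1
    return text, len(ids)


tok = ToyTokenizer()
print(enforce_budget("alpha beta gamma delta epsilon", tok, 3))  # ('alpha beta gamma', 3)
```

In `compress()` itself the trim is a single slice followed by `count_tokens`, so a round-trip mismatch would surface as an `output_tokens` value slightly above the target rather than being trimmed again.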

core/diff.py ADDED

	@@ -0,0 +1,160 @@

+import difflib
+import html as _h
+def _word_diff(original: str, compressed: str) -> tuple[str, str]:
+    """
+    Word-level SequenceMatcher diff.
+    Returns (annotated_original_html, annotated_compressed_html).
+    Colour key:
+      original  — red strikethrough  = dropped
+      original  — plain              = survived unchanged
+      compressed — amber             = rewritten (replaced)
+      compressed — green             = inserted (rare; model added a connector word)
+      compressed — plain             = survived unchanged
+    """
+    orig_words = original.split()
+    comp_words = compressed.split()
+    matcher = difflib.SequenceMatcher(None, orig_words, comp_words, autojunk=False)
+    orig_parts: list[str] = []
+    comp_parts: list[str] = []
+    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
+        ow = _h.escape(" ".join(orig_words[i1:i2]))
+        cw = _h.escape(" ".join(comp_words[j1:j2]))
+        if tag == "equal":
+            orig_parts.append(ow)
+            comp_parts.append(cw)
+        elif tag == "delete":
+            orig_parts.append(
+                f'<mark style="background:#fee2e2;color:#b91c1c;'
+                f'text-decoration:line-through;padding:1px 3px;border-radius:3px">{ow}</mark>'
+            )
+        elif tag == "insert":
+            comp_parts.append(
+                f'<mark style="background:#dcfce7;color:#15803d;'
+                f'padding:1px 3px;border-radius:3px">{cw}</mark>'
+            )
+        elif tag == "replace":
+            orig_parts.append(
+                f'<mark style="background:#fee2e2;color:#b91c1c;'
+                f'text-decoration:line-through;padding:1px 3px;border-radius:3px">{ow}</mark>'
+            )
+            comp_parts.append(
+                f'<mark style="background:#fef9c3;color:#92400e;'
+                f'padding:1px 3px;border-radius:3px">{cw}</mark>'
+            )
+    return " ".join(orig_parts), " ".join(comp_parts)
+def render_diff_html(record: dict) -> str:
+    """Build a self-contained side-by-side diff HTML block for a compression run."""
+    original   = record.get("input_text", "")
+    compressed = record.get("output_text", "")
+    if not original or not compressed:
+        return ""
+    orig_html, comp_html = _word_diff(original, compressed)
+    model      = _h.escape(record.get("model", "—"))
+    tokenizer  = _h.escape(record.get("tokenizer", "—"))
+    ts         = _h.escape(record.get("timestamp", "—"))
+    in_tok     = record.get("input_tokens", "—")
+    out_tok    = record.get("output_tokens", "—")
+    target_tok = record.get("target_tokens", "—")
+    ratio      = record.get("compression_ratio", 0)
+    quality    = record.get("quality_score", 0)
+    duration   = record.get("duration_ms", "—")
+    run_id     = record.get("id", "—")
+    feedback_val  = record.get("feedback")
+    feedback_note = _h.escape(record.get("feedback_comment") or "")
+    # Build optional feedback block
+    if feedback_val is not None:
+        badge_bg    = "#f0fdf4" if feedback_val == 1 else "#fef2f2"
+        badge_color = "#15803d" if feedback_val == 1 else "#b91c1c"
+        badge_text  = "👍 Helpful" if feedback_val == 1 else "👎 Not helpful"
+        feedback_block = (
+            f'<div style="display:flex;flex-wrap:wrap;align-items:center;gap:8px;'
+            f'margin-top:10px;padding:8px 12px;border-radius:6px;background:{badge_bg}">'
+            f'<span style="font-weight:600;font-size:0.8rem;color:{badge_color}">{badge_text}</span>'
+        )
+        if feedback_note:
+            feedback_block += (
+                f'<span style="font-size:0.8rem;color:#374151;font-style:italic">'
+                f'"{feedback_note}"</span>'
+            )
+        feedback_block += "</div>"
+    else:
+        feedback_block = ""
+    return f"""
+<div style="font-family:system-ui,sans-serif;margin-top:4px">
+  <!-- Primary meta chips -->
+  <div style="display:flex;flex-wrap:wrap;gap:6px;margin-bottom:6px;font-size:0.78rem">
+    <span style="background:#f3f4f6;padding:3px 9px;border-radius:12px;color:#374151">Run #{run_id}</span>
+    <span style="background:#f3f4f6;padding:3px 9px;border-radius:12px;color:#374151">{ts}</span>
+    <span style="background:#eff6ff;padding:3px 9px;border-radius:12px;color:#1d4ed8">{model}</span>
+    <span style="background:#f0fdf4;padding:3px 9px;border-radius:12px;color:#15803d">Quality {quality:.4f}</span>
+    <span style="background:#fff7ed;padding:3px 9px;border-radius:12px;color:#c2410c">Ratio {ratio:.4f}</span>
+    <span style="background:#faf5ff;padding:3px 9px;border-radius:12px;color:#7e22ce">⏱ {duration} ms</span>
+  </div>
+  <!-- Secondary meta chips -->
+  <div style="display:flex;flex-wrap:wrap;gap:6px;margin-bottom:12px;font-size:0.78rem">
+    <span style="background:#f3f4f6;padding:3px 9px;border-radius:12px;color:#374151">{in_tok} in → {out_tok} out (target {target_tok})</span>
+    <span style="background:#f3f4f6;padding:3px 9px;border-radius:12px;color:#374151">tokenizer: {tokenizer}</span>
+  </div>
+  <!-- Side-by-side panels -->
+  <div style="display:grid;grid-template-columns:1fr 1fr;gap:12px">
+    <!-- Original -->
+    <div style="border:1px solid #fecaca;border-radius:8px;overflow:hidden">
+      <div style="background:#fef2f2;padding:8px 14px;border-bottom:1px solid #fecaca;
+                  display:flex;justify-content:space-between;align-items:center">
+        <span style="font-weight:700;font-size:0.8rem;color:#b91c1c;letter-spacing:.04em">ORIGINAL</span>
+        <span style="font-size:0.75rem;color:#6b7280">{in_tok} tokens</span>
+      </div>
+      <div style="padding:14px;line-height:1.8;font-size:0.875rem;color:#1a1a1a;
+                  max-height:340px;overflow-y:auto;word-break:break-word">
+        {orig_html}
+      </div>
+    </div>
+    <!-- Compressed -->
+    <div style="border:1px solid #bbf7d0;border-radius:8px;overflow:hidden">
+      <div style="background:#f0fdf4;padding:8px 14px;border-bottom:1px solid #bbf7d0;
+                  display:flex;justify-content:space-between;align-items:center">
+        <span style="font-weight:700;font-size:0.8rem;color:#15803d;letter-spacing:.04em">COMPRESSED</span>
+        <span style="font-size:0.75rem;color:#6b7280">{out_tok} tokens</span>
+      </div>
+      <div style="padding:14px;line-height:1.8;font-size:0.875rem;color:#1a1a1a;
+                  max-height:340px;overflow-y:auto;word-break:break-word">
+        {comp_html}
+      </div>
+    </div>
+  </div>
+  {feedback_block}
+  <!-- Legend -->
+  <div style="display:flex;flex-wrap:wrap;gap:14px;margin-top:10px;font-size:0.75rem;color:#6b7280;align-items:center">
+    <mark style="background:#fee2e2;color:#b91c1c;text-decoration:line-through;padding:2px 7px;border-radius:3px">dropped</mark>
+    <mark style="background:#fef9c3;color:#92400e;padding:2px 7px;border-radius:3px">rewritten</mark>
+    <mark style="background:#dcfce7;color:#15803d;padding:2px 7px;border-radius:3px">inserted</mark>
+    <span>plain = unchanged</span>
+  </div>
+</div>
+"""

core/scorer.py ADDED

	@@ -0,0 +1,11 @@

+import numpy as np
+from models.model_loader import get_embedder
+def semantic_score(original: str, compressed: str) -> float:
+    embedder = get_embedder()
+    vecs = embedder.encode([original, compressed], convert_to_numpy=True)
+    cos = float(
+        np.dot(vecs[0], vecs[1]) / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1]))
+    )
+    return round(max(0.0, min(1.0, cos)), 4)
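`semantic_score` is plain cosine similarity between two embeddings, clamped to [0, 1] and rounded to four decimals. A dependency-free sketch of the same arithmetic follows; the `cosine_score` name and the zero-vector guard are additions for illustration (the committed scorer assumes non-zero embeddings and uses NumPy).

```python
import math


def cosine_score(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors, clamped to [0, 1] and rounded like semantic_score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # guard added for this sketch; not in the original scorer
    cos = dot / (norm_a * norm_b)
    # Negative similarities are floored at 0 rather than rescaled, as in the original.
    return round(max(0.0, min(1.0, cos)), 4)


print(cosine_score([1.0, 1.0], [1.0, 0.0]))  # 0.7071
```

Because of the clamp, a score of 0 covers both "unrelated" and "pointing the opposite way"; only the upper part of the range carries fine-grained information.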

core/tokenizer_utils.py ADDED

	@@ -0,0 +1,13 @@

+from models.model_loader import get_llm
+def count_tokens(text: str) -> int:
+    _, tokenizer = get_llm()
+    return len(tokenizer.encode(text, add_special_tokens=False))
+def get_token_strings(text: str) -> list[str]:
+    """Return the decoded surface string for every token in text."""
+    _, tokenizer = get_llm()
+    ids = tokenizer.encode(text, add_special_tokens=False)
+    return [tokenizer.decode([i]) for i in ids]
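The Token Highlights panel is built in `ui/compress_tab.py`, which this excerpt does not show. As a sketch of how the strings returned by `get_token_strings` could become colour-coded chips: the `render_token_chips` name, the palette and the middle-dot convention for spaces are all hypothetical choices, not the app's actual rendering.

```python
import html

# Hypothetical palette; consecutive tokens cycle through it so boundaries stay visible.
PALETTE = ["#dbeafe", "#dcfce7", "#fef9c3", "#fce7f3", "#ede9fe"]


def render_token_chips(tokens: list[str]) -> str:
    """Render one <span> chip per token string, HTML-escaped."""
    chips = []
    for i, tok in enumerate(tokens):
        # Replace spaces with a middle dot so whitespace inside tokens stays visible.
        visible = html.escape(tok).replace(" ", "·")
        colour = PALETTE[i % len(PALETTE)]
        chips.append(
            f'<span style="background:{colour};padding:1px 3px;'
            f'border-radius:3px;margin:1px">{visible}</span>'
        )
    return "".join(chips)


print(render_token_chips(["Hello", " world", "<b>"]))
```

Escaping matters here because token strings come straight from user input; without it, pasted markup such as `<b>` would be interpreted by the browser instead of shown as text.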

db/schema.sql ADDED

	@@ -0,0 +1,16 @@

+CREATE TABLE IF NOT EXISTS compression_runs (
+    id                INTEGER PRIMARY KEY AUTOINCREMENT,
+    timestamp         TEXT    NOT NULL,
+    model             TEXT    NOT NULL,
+    tokenizer         TEXT    NOT NULL,
+    input_tokens      INTEGER NOT NULL,
+    output_tokens     INTEGER NOT NULL,
+    target_tokens     INTEGER NOT NULL,
+    compression_ratio REAL    NOT NULL,
+    quality_score     REAL    NOT NULL,
+    duration_ms       REAL    NOT NULL,
+    input_text        TEXT    NOT NULL,
+    output_text       TEXT    NOT NULL,
+    feedback          INTEGER,
+    feedback_comment  TEXT
+);

db/store.py ADDED

	@@ -0,0 +1,87 @@

+import sqlite3
+import config
+from pathlib import Path
+def _connect():
+    conn = sqlite3.connect(config.DB_PATH)
+    conn.row_factory = sqlite3.Row
+    return conn
+def init_db():
+    schema = Path(__file__).parent / "schema.sql"
+    conn = _connect()
+    conn.executescript(schema.read_text())
+    # Migrate existing databases that pre-date new columns.
+    for col, typedef in [("tokenizer", "TEXT NOT NULL DEFAULT ''"), ("duration_ms", "REAL NOT NULL DEFAULT 0"), ("feedback", "INTEGER"), ("feedback_comment", "TEXT")]:
+        try:
+            conn.execute(f"ALTER TABLE compression_runs ADD COLUMN {col} {typedef}")
+        except sqlite3.OperationalError:
+            pass  # column already exists
+    conn.commit()
+    conn.close()
+def save_run(record: dict) -> int:
+    conn = _connect()
+    cursor = conn.execute(
+        """
+        INSERT INTO compression_runs
+            (timestamp, model, tokenizer, input_tokens, output_tokens, target_tokens,
+             compression_ratio, quality_score, duration_ms, input_text, output_text)
+        VALUES
+            (:timestamp, :model, :tokenizer, :input_tokens, :output_tokens, :target_tokens,
+             :compression_ratio, :quality_score, :duration_ms, :input_text, :output_text)
+        """,
+        record,
+    )
+    run_id = cursor.lastrowid
+    conn.commit()
+    conn.close()
+    return run_id
+def update_feedback(run_id: int, value: int):
+    conn = _connect()
+    conn.execute(
+        "UPDATE compression_runs SET feedback = ? WHERE id = ?",
+        (value, run_id),
+    )
+    conn.commit()
+    conn.close()
+def update_feedback_comment(run_id: int, comment: str):
+    conn = _connect()
+    conn.execute(
+        "UPDATE compression_runs SET feedback_comment = ? WHERE id = ?",
+        (comment, run_id),
+    )
+    conn.commit()
+    conn.close()
+def delete_run(run_id: int):
+    conn = _connect()
+    conn.execute("DELETE FROM compression_runs WHERE id = ?", (run_id,))
+    conn.commit()
+    conn.close()
+def get_run(run_id: int) -> dict | None:
+    conn = _connect()
+    row = conn.execute(
+        "SELECT * FROM compression_runs WHERE id = ?", (run_id,)
+    ).fetchone()
+    conn.close()
+    return dict(row) if row else None
+def get_runs(limit: int = 100) -> list[dict]:
+    conn = _connect()
+    rows = conn.execute(
+        "SELECT * FROM compression_runs ORDER BY id DESC LIMIT ?", (limit,)
+    ).fetchall()
+    conn.close()
+    return [dict(r) for r in rows]
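`init_db` relies on `ALTER TABLE ... ADD COLUMN` raising `sqlite3.OperationalError` when a column already exists. That works, but the same `except` would also swallow unrelated operational errors. A sketch of an alternative that checks `PRAGMA table_info` first, run against an in-memory database; `migrate` and `MIGRATIONS` are hypothetical names, and the column definitions are copied from `init_db`.

```python
import sqlite3

# Column definitions copied from init_db() in db/store.py.
MIGRATIONS = [
    ("tokenizer", "TEXT NOT NULL DEFAULT ''"),
    ("duration_ms", "REAL NOT NULL DEFAULT 0"),
    ("feedback", "INTEGER"),
    ("feedback_comment", "TEXT"),
]


def migrate(conn: sqlite3.Connection, table: str = "compression_runs") -> list[str]:
    """Add any missing columns and return the names that were added."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for col, typedef in MIGRATIONS:
        if col not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {col} {typedef}")
            added.append(col)
    conn.commit()
    return added


# An "old" table that predates the four migrated columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compression_runs (id INTEGER PRIMARY KEY, input_text TEXT NOT NULL)")
print(migrate(conn))  # ['tokenizer', 'duration_ms', 'feedback', 'feedback_comment']
print(migrate(conn))  # []
```

The table name is interpolated into the SQL, so it must come from trusted code, never from user input. SQLite only accepts `NOT NULL` on an added column when a non-null default is supplied, which is why `tokenizer` and `duration_ms` carry defaults.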

docs/architecture.md ADDED

	@@ -0,0 +1,55 @@

+# Architecture
+TinyPress is modular: each concern lives in its own module, and nothing bleeds into code where it doesn't belong.
+## How a compression request flows
+```
+User Input (Gradio UI)
+        │
+        ▼
+  core/compressor.py       ← builds the prompt, calls the model, trims if it overshoots
+        │
+        ▼
+  models/model_loader.py   ← Qwen2.5-1.5B-Instruct, loaded once and reused
+        │
+        ▼
+  core/scorer.py           ← checks how much meaning survived using all-MiniLM-L6-v2
+        │
+        ▼
+  db/store.py              ← saves the run to SQLite
+        │
+        ▼
+  ui/compress_tab.py       ← shows the result and metrics back to the user
+```
+## What each module does
+| Module | Responsibility |
+|---|---|
+| `app.py` | Starts everything — DB init, model load, Gradio launch |
+| `config.py` | One place for all settings — model names, token limits, DB path, port |
+| `ui/compress_tab.py` | The compression interface — input, slider, output, metrics |
+| `ui/history_tab.py` | History view — past runs, averages, trends |
+| `core/compressor.py` | Builds the compression prompt, runs generation, hard-trims if needed |
+| `core/scorer.py` | Cosine similarity between original and compressed text |
+| `core/tokenizer_utils.py` | Token counting and per-token string extraction using the LLM's own tokenizer |
+| `core/diff.py` | Word-level SequenceMatcher diff — produces annotated HTML for the history side-by-side view |
+| `models/model_loader.py` | Singleton model store — loads LLM + embedder on demand, supports hot-swapping both via `switch_llm` / `switch_embedder` |
+| `db/store.py` | SQLite operations: init, save a run, record feedback ratings and notes, fetch history, delete a run; auto-migrates older DBs |
+| `db/schema.sql` | The `compression_runs` table definition |
+## A few decisions worth knowing
+**Models load once at startup.** This matters on a laptop — you don't want to reload a 1.5B model on every request. Both the LLM and the embedder are held in memory after the first load.
+**Model hot-swapping without a restart.** The Model Settings accordion in the UI lets you pick a different compression model or scoring embedder mid-session. Both `switch_llm` and `switch_embedder` in `model_loader.py` unload the current model (deleting the references, calling `gc.collect()`, and flushing the CUDA cache if a GPU is present) before loading the new one, so you don't end up with two large models in memory at once.
+**Hard token trim as a safety net.** If the model overshoots the target budget, the output gets trimmed at the tokenizer level. It's a fallback, not the primary path — the prompt already asks the model to stay within budget.
+**Thin UI layer.** The Gradio handlers in `ui/` don't contain logic. They take inputs, call into `core/`, and return outputs. All the real work happens in `core/` and `db/`.
+**DB auto-migration.** `store.py` runs `ALTER TABLE … ADD COLUMN` for `tokenizer`, `duration_ms`, `feedback`, and `feedback_comment` on startup — so existing databases from earlier builds upgrade silently rather than crashing. `feedback` is nullable (`INTEGER`): `NULL` = no rating, `1` = 👍, `-1` = 👎. `feedback_comment` holds the optional text note.
+🏠 [README.md](../README.md)
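`models/model_loader.py` is not part of this excerpt, so the following is only a sketch of the pattern the hot-swapping paragraph describes: load lazily on first use, and on a switch, drop the held reference and collect garbage before loading the replacement. The `ModelSlot` class and the injected `load_fn` are hypothetical; a real loader would call into `transformers` or `sentence-transformers` and would typically also call `torch.cuda.empty_cache()` when a GPU is present.

```python
import gc
from typing import Any, Callable


class ModelSlot:
    """Holds at most one loaded model; loads lazily and unloads before switching."""

    def __init__(self, name: str, load_fn: Callable[[str], Any]) -> None:
        self.name = name
        self._load_fn = load_fn
        self._model: Any = None
        self.load_count = 0

    def get(self) -> Any:
        # Load once on first use, then reuse the same object on every call.
        if self._model is None:
            self._model = self._load_fn(self.name)
            self.load_count += 1
        return self._model

    def switch(self, new_name: str) -> Any:
        # Drop our reference and collect so the old weights can be freed
        # before the new ones are allocated.
        self._model = None
        gc.collect()
        self.name = new_name
        return self.get()


# Usage with a stand-in loader; a real one would return a model object.
slot = ModelSlot("model-a", load_fn=lambda name: {"name": name})
first = slot.get()
assert slot.get() is first  # cached after the first load
print(slot.switch("model-b"))  # {'name': 'model-b'}
```

Memory is only actually released if nothing else still references the old model; in the usage above, `first` would keep it alive. That is the reason to keep the loader as the single long-lived holder and have handlers call `get()` per request.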

docs/enhancements.md ADDED

	@@ -0,0 +1,27 @@

+# Enhancements
+The hackathon MVP covers the core loop well. Here's where it could go next.
+## Quick wins
+- **Batch compression** — let users paste multiple texts and compress them all at once
+- **Export history** — download past runs as a CSV straight from the History tab
+- **Named presets** — save favourite token budget + model combinations and reuse them
+- **`.env` support** — load config from a `.env` file instead of setting env vars manually
+## Worth building next
+- **Iterative compression** — if the quality score drops below a threshold, automatically retry with a slightly relaxed budget
+- **Custom focus instructions** — let the user say "keep all numbers" or "preserve action items only" before compressing
+- **Chunked compression** — handle inputs that exceed the model's context window by splitting, compressing each chunk, then merging
+- **REST API** — a simple `/compress` endpoint via Flask so other tools can call TinyPress programmatically
+## Longer term
+- **VS Code extension** — compress selected text without leaving the editor
+- **CLI tool** — `tinypress compress --budget 500 input.txt` for terminal users
+- **Hosted version** — a SaaS wrapper with usage tracking and team history
+- **Domain-specific fine-tuning** — train a compressor specialised for legal, medical, or code content
+🏠 [README.md](../README.md)
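The iterative-compression idea is not implemented in this commit. A sketch of one way it could work, assuming `compress` and `score` callables shaped like `core.compressor.compress` and `core.scorer.semantic_score`; the threshold, the 25% step, and the `compress_with_retry` name are hypothetical.

```python
from typing import Callable, Optional

# Shapes copied from core.compressor.compress and core.scorer.semantic_score.
CompressFn = Callable[[str, int], tuple[str, int, int]]
ScoreFn = Callable[[str, str], float]


def compress_with_retry(
    text: str,
    budget: int,
    compress: CompressFn,
    score: ScoreFn,
    min_quality: float = 0.8,  # hypothetical threshold
    step: float = 1.25,  # relax the budget by 25% per retry
    max_budget: int = 1000,  # upper end of the UI slider
    max_attempts: int = 4,
) -> tuple[str, int, float]:
    """Retry with a larger budget until the quality score reaches min_quality.

    Returns (compressed_text, budget_used, quality). If no attempt reaches the
    threshold, the best-scoring attempt is returned instead.
    """
    best: Optional[tuple[str, int, float]] = None
    for _ in range(max_attempts):
        compressed, _, _ = compress(text, budget)
        quality = score(text, compressed)
        if best is None or quality > best[2]:
            best = (compressed, budget, quality)
        if quality >= min_quality or budget >= max_budget:
            break
        # Always grow by at least one token so tiny budgets still make progress.
        budget = min(max_budget, max(budget + 1, int(budget * step)))
    if best is None:
        raise ValueError("max_attempts must be at least 1")
    return best
```

Because `compress()` returns the input unchanged once the budget reaches the input's token count, a relaxed budget eventually scores close to 1.0; `max_budget` and `max_attempts` keep the loop bounded either way.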

docs/folder-structure.md ADDED

	@@ -0,0 +1,44 @@

+# Folder Structure
+```
+app/
+├── app.py                         # Entry point
+├── config.py                      # All tunable settings
+├── requirements.txt               # Pinned Python dependencies
+├── tinypress.db                   # SQLite DB (auto-created on first run)
+├── tinypress_colab.ipynb          # Colab notebook: installs deps, launches a share URL
+│
+├── ui/
+│   ├── compress_tab.py            # Compression UI tab
+│   └── history_tab.py             # Metrics history tab
+│
+├── core/
+│   ├── compressor.py              # Compression pipeline logic
+│   ├── scorer.py                  # Semantic similarity scoring
+│   ├── tokenizer_utils.py         # Token counting helpers
+│   └── diff.py                    # Word-level diff + HTML renderer for history view
+│
+├── models/
+│   └── model_loader.py            # Lazy model + embedder loading
+│
+├── db/
+│   ├── schema.sql                 # SQLite table definitions
+│   └── store.py                   # DB read/write operations
+│
+├── docs/                          # Project documentation
+│   ├── architecture.md
+│   ├── folder-structure.md
+│   ├── setup.md
+│   ├── get-started.md
+│   └── enhancements.md
+│
+├── my-notes/                      # Planning notes (not part of the app)
+│   └── overall-idea.md
+│
+└── claude-grounding/              # Context files for Claude (not part of the app)
+    ├── hackathon.md
+    ├── tech-stack.md
+    └── about-me.md
+```
+🏠 [README.md](../README.md)

docs/get-started.md ADDED

	@@ -0,0 +1,72 @@

+# Get Started
+Once `python app.py` is running, head to `http://localhost:7860` in your browser. You'll see two tabs.
+## Compress tab
+This is where the action is.
+1. Paste your text — could be a long prompt, meeting notes, an article, anything really
+2. Use the slider to set your token budget (anywhere from 100 to 1000)
+3. Hit **Compress**
+As you type or adjust the slider, a status banner updates live:
+- **Green** — the input is over budget, compression will run
+- **Red** — the input is already within budget, nothing to do
+On the right you'll see:
+- The compressed version of your text
+- How many tokens went in vs came out
+- The compression ratio (how much it shrank)
+- A quality score between 0 and 1 — closer to 1 means the meaning held up well
+Once the result appears, **👍 Helpful** and **👎 Not helpful** buttons show up below the metrics. Click either one to rate the result — the feedback is saved instantly. A note field then slides in where you can optionally type what worked well or didn't (e.g. "lost key dates", "too short", "great summary") and hit **Save note**. Both the rating and the note are stored with the run and visible in the History tab.
+Every run saves automatically in the background. You don't need to do anything.
+### Token Highlights
+Below the input box there's a **Show Token Highlights** button. Click it and each token in your input gets rendered as a colour-coded chip — useful for seeing exactly where your budget is going. The panel updates live as you type. Click again to hide it.
+### Switching the compression model
+Click **Model Settings** at the top of the tab to expand the accordion. Pick a model from the dropdown (or type a custom HuggingFace model ID) and hit **Load Model**. The current model is unloaded from memory first, then the new one loads — no restart needed. The status box confirms when it's ready.
+Available presets: Qwen2.5-1.5B-Instruct (default), Qwen2.5-0.5B-Instruct, SmolLM2-1.7B-Instruct, Phi-3.5-mini-instruct, Llama-3.2-1B-Instruct.
+### Switching the scoring embedder
+Below the compression model section in the same accordion, there's a separate **Embedder Model** dropdown. The embedder is what computes the quality score — changing it affects how accurately that score reflects meaning retention.
+When you select a model from the dropdown, an info panel updates immediately to explain the trade-off:
+- ⚡ **Fast** models (MiniLM, bge-small) — low overhead, good baseline scores, CPU-friendly
+- ⚖️ **Balanced** models (mpnet, bge-base) — more discriminating scores, small speed cost
+- 🏆 **High quality** models (mxbai-large) — most accurate scores, GPU recommended
+- 🔬 **Best quality** models (gte-Qwen2-1.5B) — catches subtle meaning loss, requires significant RAM/VRAM
+Hit **Load Embedder** to apply the selection. The previous embedder is unloaded from memory before the new one loads.
+## History tab
+Switch to this tab to see every run compressed so far.
+The table loads automatically when you open the tab. Hit **Refresh** to pull in the latest runs. At the top you'll find the average quality score and compression ratio across all sessions — a quick way to see how the tool is performing over time.
+### Column visibility
+By default the table shows: `id`, `timestamp`, `model`, `compression_ratio`, `quality_score`, `feedback`. Open the **Column visibility** accordion above the table to toggle any additional columns on or off — changes apply instantly without a refresh.
+### Side-by-side diff
+Click any row in the table and a word-level diff panel opens below it. Words are colour-coded:
+- Red strikethrough — dropped from the original
+- Amber — rewritten by the model
+- Green — inserted (rare connector words)
+- Plain — survived unchanged
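The colour key corresponds to the opcodes that Python's `difflib.SequenceMatcher` produces when it runs over word lists. The notebook's diff renderer works the same way. The sketch below returns labels instead of HTML. A `replace` opcode yields two spans: a dropped span on the original side and a rewritten span on the compressed side.

```python
import difflib


def classify_words(original: str, compressed: str) -> list[tuple[str, str]]:
    """Label word spans as dropped, rewritten, inserted, or unchanged."""
    orig_words = original.split()
    comp_words = compressed.split()
    matcher = difflib.SequenceMatcher(None, orig_words, comp_words, autojunk=False)
    labelled = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            labelled.append(("unchanged", " ".join(orig_words[i1:i2])))
        elif tag == "delete":
            labelled.append(("dropped", " ".join(orig_words[i1:i2])))
        elif tag == "insert":
            labelled.append(("inserted", " ".join(comp_words[j1:j2])))
        else:  # "replace": old words are dropped, new words are shown as rewritten
            labelled.append(("dropped", " ".join(orig_words[i1:i2])))
            labelled.append(("rewritten", " ".join(comp_words[j1:j2])))
    return labelled


for label, words in classify_words(
    "the meeting was moved to next Friday afternoon",
    "meeting moved to Friday",
):
    print(f"{label:>9}: {words}")
```

Passing `autojunk=False` turns off difflib's popular-element heuristic. Without it, common words in long inputs could be treated as junk and would not match.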
+### Deleting a run
+Click a row to select it, then hit **Delete Selected Row**. The table refreshes and the aggregate stats update automatically.
+🏠 [README.md](../README.md)

docs/setup.md ADDED

	@@ -0,0 +1,95 @@

+# Setup
+## What you need
+**Hardware**
+| | Minimum (CPU) | Recommended (GPU) |
+|---|---|---|
+| RAM | 8 GB | 8 GB+ |
+| VRAM | — | 4 GB (e.g. NVIDIA T4) |
+| Disk | ~4 GB free | ~4 GB free |
+| Inference speed | Slow (float32) | Fast (float16, auto device map) |
+The default model (Qwen2.5-1.5B-Instruct) fits in 4 GB VRAM when loaded in float16 (about 3 GB of weights). Larger models from the dropdown (e.g. Phi-3.5-mini) need more headroom.
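The GPU figures follow from bytes per parameter. At float32 each weight takes 4 bytes, and at float16 it takes 2. The rough estimate below counts weights only. Activations, the KV cache and CUDA overhead come on top, so treat the result as a lower bound. The parameter count is approximate.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Rough size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Qwen2.5-1.5B-Instruct has roughly 1.5 billion parameters.
for dtype in ("float32", "float16"):
    print(dtype, round(weight_memory_gb(1.5e9, dtype), 1), "GB")
# float32 6.0 GB
# float16 3.0 GB
```

At float32 the default model alone would exceed a 4 GB card, which is why half precision is used on GPU.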
+**Software**
+- Python 3.10 or above
+**Network**
+- Internet required on first run only — model weights (~3.5 GB total) download from HuggingFace and are cached locally
+- Fully offline after that
+## Steps
+**1. Navigate to the project folder**
+```bash
+cd app
+```
+**2. Create and activate the virtual environment**
+```bash
+python -m venv .venv
+.venv\Scripts\activate        # Windows
+# source .venv/bin/activate   # macOS/Linux
+```
+**3. Install dependencies**
+```bash
+pip install -r requirements.txt
+```
+**4. Optionally tweak the defaults**
+You can override any of these via environment variables if needed:
+| Variable | Default | What it does |
+|---|---|---|
+| `LLM_MODEL` | `Qwen/Qwen2.5-1.5B-Instruct` | The model used for compression |
+| `EMBEDDER_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | Used to score compression quality |
+| `DB_PATH` | `tinypress.db` | Where the SQLite database lives |
+| `PORT` | `7860` | Port the Gradio app listens on |
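In Python, each override takes one `os.getenv` call with a fallback default, and the Colab notebook's configuration cell reads them this way. `config.py` itself is not shown here, so the sketch below is an assumption about its shape rather than a copy of it.

```python
import os

# Defaults match the table above; any of them can be overridden via the environment.
LLM_MODEL = os.getenv("LLM_MODEL", "Qwen/Qwen2.5-1.5B-Instruct")
EMBEDDER_MODEL = os.getenv("EMBEDDER_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
DB_PATH = os.getenv("DB_PATH", "tinypress.db")
# Environment variables are always strings, so the port must be converted to int.
SERVER_PORT = int(os.getenv("PORT", "7860"))

print(LLM_MODEL, EMBEDDER_MODEL, DB_PATH, SERVER_PORT)
```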
+**5. Run it**
+```bash
+python app.py
+```
+The first time you run it, model weights will download from HuggingFace automatically. After that, everything runs from local cache.
+## Managing dependencies
+**Installing a new package**
+```bash
+pip install <package-name>
+pip freeze > requirements.txt
+```
+**Removing a package**
+```bash
+pip uninstall <package-name>
+pip freeze > requirements.txt
+```
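+Always run `pip freeze > requirements.txt` after any install or uninstall so the file stays in sync with your environment. Note that `pip freeze` pins every installed package, including transitive dependencies, to an exact version, replacing the `>=` ranges currently in the file.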
+## Deactivating the virtual environment
+When you're done, just run:
+```bash
+deactivate
+```
+That drops you back to your system Python. Next time, reactivate it with `.venv\Scripts\activate` (Windows) or `source .venv/bin/activate` (macOS/Linux) before working on the project.
+🏠 [README.md](../README.md)

models/model_loader.py ADDED

	@@ -0,0 +1,98 @@

+from transformers import AutoTokenizer, AutoModelForCausalLM
+from sentence_transformers import SentenceTransformer
+import torch
+import gc
+import config
+_llm = None
+_tokenizer = None
+_embedder = None
+_current_model_id = None
+_current_embedder_id = None
+def get_current_model_id() -> str | None:
+    return _current_model_id
+def get_current_tokenizer_id() -> str | None:
+    # Tokenizer is always loaded from the same HF repo as the model.
+    return _current_model_id
+def get_current_embedder_id() -> str | None:
+    return _current_embedder_id
+def get_llm():
+    global _llm, _tokenizer
+    if _llm is None:
+        _load_llm(config.LLM_MODEL)
+    return _llm, _tokenizer
+def switch_llm(model_id: str) -> str:
+    global _current_model_id
+    if _current_model_id == model_id:
+        return f"Already using {model_id}"
+    _unload_llm()
+    _load_llm(model_id)
+    return f"Loaded: {model_id}"
+def _load_llm(model_id: str):
+    """Load model + its paired tokenizer. Both come from the same model_id."""
+    global _llm, _tokenizer, _current_model_id
+    _tokenizer = AutoTokenizer.from_pretrained(model_id)
+    _llm = AutoModelForCausalLM.from_pretrained(
+        model_id,
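+        # float16 halves weight memory on GPU (the setup docs assume this);
+        # float32 is the safe choice on CPU.
+        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,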
+        device_map="auto",
+    )
+    _llm.eval()
+    _current_model_id = model_id
+def _unload_llm():
+    """Free GPU/CPU memory before loading a different model."""
+    global _llm, _tokenizer, _current_model_id
+    del _llm
+    del _tokenizer
+    _llm = None
+    _tokenizer = None
+    _current_model_id = None
+    gc.collect()
+    if torch.cuda.is_available():
+        torch.cuda.empty_cache()
+def get_embedder():
+    global _embedder, _current_embedder_id
+    if _embedder is None:
+        _load_embedder(config.EMBEDDER_MODEL)
+    return _embedder
+def switch_embedder(model_id: str) -> str:
+    global _current_embedder_id
+    if _current_embedder_id == model_id:
+        return f"Already using {model_id}"
+    _unload_embedder()
+    _load_embedder(model_id)
+    return f"Loaded: {model_id}"
+def _load_embedder(model_id: str):
+    global _embedder, _current_embedder_id
+    _embedder = SentenceTransformer(model_id)
+    _current_embedder_id = model_id
+def _unload_embedder():
+    global _embedder, _current_embedder_id
+    del _embedder
+    _embedder = None
+    _current_embedder_id = None
+    gc.collect()
+    if torch.cuda.is_available():
+        torch.cuda.empty_cache()
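This module keeps one global slot per model type, loads it lazily and swaps it on demand. The same lazy-load/switch pattern can be wrapped in a small class that can be tested without downloading weights. In the sketch below, the `SingleModelSlot` class and its injected `loader` callable are hypothetical. In the app, the loader would wrap `AutoModelForCausalLM.from_pretrained` or `SentenceTransformer`. The sketch also omits the `torch.cuda.empty_cache()` call.

```python
import gc
from typing import Any, Callable


class SingleModelSlot:
    """Holds at most one loaded model; switching unloads the old one first."""

    def __init__(self, loader: Callable[[str], Any], default_id: str):
        self._loader = loader  # hypothetical: maps a model ID to a loaded model
        self._default_id = default_id
        self.model: Any = None
        self.model_id: str | None = None

    def get(self) -> Any:
        # Lazy load: nothing is loaded until the first request.
        if self.model is None:
            self._load(self._default_id)
        return self.model

    def switch(self, model_id: str) -> str:
        if self.model_id == model_id:
            return f"Already using {model_id}"
        self._unload()
        self._load(model_id)
        return f"Loaded: {model_id}"

    def _load(self, model_id: str) -> None:
        self.model = self._loader(model_id)
        self.model_id = model_id

    def _unload(self) -> None:
        # Drop the only reference so the garbage collector can reclaim the memory.
        self.model = None
        self.model_id = None
        gc.collect()


loads = []
slot = SingleModelSlot(lambda mid: loads.append(mid) or f"<model {mid}>", "model-a")
print(slot.get(), slot.switch("model-a"), slot.switch("model-b"))
```

Because the old model is unloaded before the new one loads, peak memory stays at one model's worth. The trade-off is that a failed load leaves the slot empty.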

requirements.txt ADDED

	@@ -0,0 +1,8 @@

+gradio==5.0
+transformers>=4.40.0
+sentence-transformers>=3.0.0
+torch>=2.2.0
+numpy>=1.26.0
+pandas>=2.0.0
+accelerate>=0.30.0
+huggingface_hub==0.25.2

tinypress_colab.ipynb ADDED

	@@ -0,0 +1,336 @@

+{
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.0"
+  },
+  "colab": {
+   "provenance": [],
+   "gpuType": "T4"
+  },
+  "accelerator": "GPU"
+ },
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "cell-title",
+   "metadata": {},
+   "source": "# TinyPress — Prompt Compression Engine\n\n**HuggingFace Build Small Hackathon · Track: Thousand Token Wood**\n\n| Layer | Detail |\n|-------|--------|\n| Compression | `Qwen/Qwen2.5-1.5B-Instruct` (default, switchable) |\n| Scoring | `sentence-transformers/all-MiniLM-L6-v2` (default, switchable) |\n| UI | Gradio 5 — public share URL |\n| Storage | SQLite at `/content/tinypress.db` |\n\n**Features**\n- Compress text to a user-defined token budget\n- Live 🔴 / 🟢 compression readiness banner\n- Per-token colour highlight panel (toggle on/off)\n- Dynamic compression model switching (5 curated <32B models)\n- Dynamic scoring embedder switching (6 models, with per-model impact info)\n- 👍 / 👎 feedback on every compression result, with optional text comment\n- Compression run history persisted to SQLite\n- Column picker in History tab — compact default view, expandable to all fields\n- Per-row delete in history\n- Side-by-side word-level diff viewer with feedback badge and token detail\n\n> **Recommended runtime:** GPU  →  Runtime → Change runtime type → T4 GPU\n\n---\n\n### About the author\n\nBuilt by **Sriharsha C R** — AI Engineer, Cloud Native developer, and knowledge sharer.\nIf this was useful, feel free to connect — always happy to chat about AI, LLMs, or anything in between.\n\n[![LinkedIn](https://img.shields.io/badge/LinkedIn-sriharsha--cr-0a66c2?logo=linkedin&logoColor=white)](https://www.linkedin.com/in/sriharsha-cr)\n[![X / Twitter](https://img.shields.io/badge/X-@sriharsha__cr-000000?logo=x&logoColor=white)](https://x.com/sriharsha_cr)\n[![HuggingFace](https://img.shields.io/badge/HuggingFace-sriharsha--cr-ff9d00?logo=huggingface&logoColor=white)](https://huggingface.co/sriharsha-cr)\n[![GitHub](https://img.shields.io/badge/GitHub-SriharshaCR-181717?logo=github&logoColor=white)](https://github.com/SriharshaCR)"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s1-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 1 — Install dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-install",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install -q \\\n",
+    "    \"gradio==5.0\" \\\n",
+    "    \"transformers>=4.40.0\" \\\n",
+    "    \"sentence-transformers>=3.0.0\" \\\n",
+    "    \"torch>=2.2.0\" \\\n",
+    "    \"numpy>=1.26.0\" \\\n",
+    "    \"pandas>=2.0.0\" \\\n",
+    "    \"accelerate>=0.30.0\" \\\n",
+    "    \"huggingface_hub==0.25.2\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s2-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 2 — Runtime check"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-runtime",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
+    "dtype  = torch.float16 if device == 'cuda' else torch.float32\n",
+    "\n",
+    "print(f'Device : {device}')\n",
+    "if device == 'cuda':\n",
+    "    print(f'GPU    : {torch.cuda.get_device_name(0)}')\n",
+    "    print(f'VRAM   : {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')\n",
+    "print(f'dtype  : {dtype}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s3-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 3 — Configuration"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-config",
+   "metadata": {},
+   "outputs": [],
+   "source": "import os\n\nLLM_MODEL      = os.getenv('LLM_MODEL',      'Qwen/Qwen2.5-1.5B-Instruct')\nEMBEDDER_MODEL = os.getenv('EMBEDDER_MODEL', 'sentence-transformers/all-MiniLM-L6-v2')\nDB_PATH        = os.getenv('DB_PATH',        '/content/tinypress.db')\nSERVER_PORT    = int(os.getenv('PORT', 7860))\n\nDEFAULT_TARGET_TOKENS = 500\nMAX_NEW_TOKENS        = 1024\nAPP_TITLE             = 'TinyPress'\n\n# Curated <32B open-weight causal LMs for local / Colab inference.\nAVAILABLE_MODELS = [\n    'Qwen/Qwen2.5-1.5B-Instruct',\n    'Qwen/Qwen2.5-0.5B-Instruct',\n    'HuggingFaceTB/SmolLM2-1.7B-Instruct',\n    'microsoft/Phi-3.5-mini-instruct',\n    'meta-llama/Llama-3.2-1B-Instruct',\n]\n\n# Curated sentence-transformer embedding models for quality scoring.\nAVAILABLE_EMBEDDER_MODELS = [\n    'sentence-transformers/all-MiniLM-L6-v2',\n    'sentence-transformers/all-mpnet-base-v2',\n    'BAAI/bge-small-en-v1.5',\n    'BAAI/bge-base-en-v1.5',\n    'mixedbread-ai/mxbai-embed-large-v1',\n    'Alibaba-NLP/gte-Qwen2-1.5B-instruct',\n]\n\nEMBEDDER_INFO = {\n    'sentence-transformers/all-MiniLM-L6-v2': (\n        '⚡ **Fast · 22M params · Default**  \\n'\n        'Great baseline. Scores are reliable for typical compression ratios. '\n        'Runs comfortably on CPU — minimal overhead.'\n    ),\n    'sentence-transformers/all-mpnet-base-v2': (\n        '⚖️ **Balanced · 110M params**  \\n'\n        'Noticeably sharper quality scores than MiniLM, especially on longer texts. '\n        'Small speed trade-off; fine on CPU.'\n    ),\n    'BAAI/bge-small-en-v1.5': (\n        '⚡ **Fast · 33M params**  \\n'\n        'Strong quality-to-size ratio — often matches MiniLM on accuracy while being '\n        'slightly more sensitive to meaning shifts. Good CPU option.'\n    ),\n    'BAAI/bge-base-en-v1.5': (\n        '⚖️ **Balanced · 109M params**  \\n'\n        'Consistently strong on semantic similarity benchmarks. 
'\n        'Scores will be more discriminating — small differences in compression quality show up more clearly.'\n    ),\n    'mixedbread-ai/mxbai-embed-large-v1': (\n        '🏆 **High quality · 335M params**  \\n'\n        'Top-tier similarity scores. Quality readings will be the most accurate here, '\n        'but slower to load and run. GPU recommended.'\n    ),\n    'Alibaba-NLP/gte-Qwen2-1.5B-instruct': (\n        '🔬 **Best quality · 1.5B params**  \\n'\n        'Strongest semantic understanding in this list. Scores will reflect subtle meaning loss '\n        'that smaller models miss. Requires significant RAM/VRAM — GPU strongly recommended.'\n    ),\n}\n\nprint(f'LLM      : {LLM_MODEL}')\nprint(f'Embedder : {EMBEDDER_MODEL}')\nprint(f'DB       : {DB_PATH}')"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s4-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 4 — Model loader"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-model-loader",
+   "metadata": {},
+   "outputs": [],
+   "source": "from transformers import AutoTokenizer, AutoModelForCausalLM\nfrom sentence_transformers import SentenceTransformer\nimport gc\n\n_llm               = None\n_tokenizer         = None\n_embedder          = None\n_current_model_id    = None\n_current_embedder_id = None\n\n\ndef get_current_model_id():\n    return _current_model_id\n\n\ndef get_current_tokenizer_id():\n    # Tokenizer is always loaded from the same HF repo as the model.\n    return _current_model_id\n\n\ndef get_current_embedder_id():\n    return _current_embedder_id\n\n\ndef get_llm():\n    global _llm, _tokenizer\n    if _llm is None:\n        _load_llm(LLM_MODEL)\n    return _llm, _tokenizer\n\n\ndef switch_llm(model_id: str) -> str:\n    global _current_model_id\n    if _current_model_id == model_id:\n        return f'Already using {model_id}'\n    _unload_llm()\n    _load_llm(model_id)\n    return f'Loaded: {model_id}'\n\n\ndef _load_llm(model_id: str):\n    \"\"\"Load model + its paired tokenizer. Both come from the same model_id.\"\"\"\n    global _llm, _tokenizer, _current_model_id\n    print(f'Loading LLM: {model_id} ...')\n    _tokenizer = AutoTokenizer.from_pretrained(model_id)\n    _llm = AutoModelForCausalLM.from_pretrained(\n        model_id,\n        torch_dtype=dtype,\n        device_map='auto',\n    )\n    _llm.eval()\n    _current_model_id = model_id\n    print(f'LLM ready: {model_id}')\n\n\ndef _unload_llm():\n    \"\"\"Free GPU/CPU memory before loading a different model.\"\"\"\n    global _llm, _tokenizer, _current_model_id\n    del _llm, _tokenizer\n    _llm = None\n    _tokenizer = None\n    _current_model_id = None\n    gc.collect()\n    if torch.cuda.is_available():\n        torch.cuda.empty_cache()\n\n\ndef get_embedder():\n    global _embedder, _current_embedder_id\n    if _embedder is None:\n        _load_embedder(EMBEDDER_MODEL)\n    return _embedder\n\n\ndef switch_embedder(model_id: str) -> str:\n    global _current_embedder_id\n    if _current_embedder_id 
== model_id:\n        return f'Already using {model_id}'\n    _unload_embedder()\n    _load_embedder(model_id)\n    return f'Loaded: {model_id}'\n\n\ndef _load_embedder(model_id: str):\n    global _embedder, _current_embedder_id\n    print(f'Loading embedder: {model_id} ...')\n    _embedder = SentenceTransformer(model_id)\n    _current_embedder_id = model_id\n    print(f'Embedder ready: {model_id}')\n\n\ndef _unload_embedder():\n    global _embedder, _current_embedder_id\n    del _embedder\n    _embedder = None\n    _current_embedder_id = None\n    gc.collect()\n    if torch.cuda.is_available():\n        torch.cuda.empty_cache()\n\n\nprint('Model loader defined.')"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s5-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 5 — Core pipeline"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-core",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# ── tokenizer utils ───────────────────────────────────────────────────────────\n",
+    "\n",
+    "def count_tokens(text: str) -> int:\n",
+    "    _, tokenizer = get_llm()\n",
+    "    return len(tokenizer.encode(text, add_special_tokens=False))\n",
+    "\n",
+    "\n",
+    "def get_token_strings(text: str) -> list:\n",
+    "    \"\"\"Return the decoded surface string for every token in text.\"\"\"\n",
+    "    _, tokenizer = get_llm()\n",
+    "    ids = tokenizer.encode(text, add_special_tokens=False)\n",
+    "    return [tokenizer.decode([i]) for i in ids]\n",
+    "\n",
+    "\n",
+    "# ── compressor ────────────────────────────────────────────────────────────────\n",
+    "\n",
+    "_PROMPT_TEMPLATE = (\n",
+    "    'You are a lossless compression assistant. '\n",
+    "    'Compress the following text to at most {target} tokens.\\n'\n",
+    "    'Preserve all key facts, decisions, and intent. '\n",
+    "    'Do not add commentary. Output only the compressed text.\\n\\n'\n",
+    "    'TEXT:\\n{text}\\n\\nCOMPRESSED:'\n",
+    ")\n",
+    "\n",
+    "\n",
+    "def _generate(prompt: str) -> str:\n",
+    "    model, tokenizer = get_llm()\n",
+    "    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)\n",
+    "    with torch.no_grad():\n",
+    "        output_ids = model.generate(\n",
+    "            **inputs,\n",
+    "            max_new_tokens=MAX_NEW_TOKENS,\n",
+    "            do_sample=False,\n",
+    "            pad_token_id=tokenizer.eos_token_id,\n",
+    "        )\n",
+    "    new_tokens = output_ids[0][inputs['input_ids'].shape[1]:]\n",
+    "    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()\n",
+    "\n",
+    "\n",
+    "def compress(text: str, target_tokens: int) -> tuple:\n",
+    "    \"\"\"Returns (compressed_text, input_token_count, output_token_count).\"\"\"\n",
+    "    input_tokens = count_tokens(text)\n",
+    "    if input_tokens <= target_tokens:\n",
+    "        return text, input_tokens, input_tokens\n",
+    "\n",
+    "    prompt     = _PROMPT_TEMPLATE.format(target=target_tokens, text=text)\n",
+    "    compressed = _generate(prompt)\n",
+    "\n",
+    "    # Hard-trim if model overshoots.\n",
+    "    _, tokenizer = get_llm()\n",
+    "    ids = tokenizer.encode(compressed, add_special_tokens=False)\n",
+    "    if len(ids) > target_tokens:\n",
+    "        compressed = tokenizer.decode(ids[:target_tokens], skip_special_tokens=True)\n",
+    "\n",
+    "    output_tokens = count_tokens(compressed)\n",
+    "    return compressed, input_tokens, output_tokens\n",
+    "\n",
+    "\n",
+    "# ── scorer ────────────────────────────────────────────────────────────────────\n",
+    "\n",
+    "def semantic_score(original: str, compressed: str) -> float:\n",
+    "    embedder = get_embedder()\n",
+    "    vecs = embedder.encode([original, compressed], convert_to_numpy=True)\n",
+    "    cos  = float(\n",
+    "        np.dot(vecs[0], vecs[1]) / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1]))\n",
+    "    )\n",
+    "    return round(max(0.0, min(1.0, cos)), 4)\n",
+    "\n",
+    "\n",
+    "print('Core pipeline defined.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s6-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 6 — Diff renderer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-diff",
+   "metadata": {},
+   "outputs": [],
+   "source": "import difflib\nimport html as _h\n\n\ndef _word_diff(original: str, compressed: str) -> tuple:\n    \"\"\"\n    Word-level SequenceMatcher diff.\n    Returns (annotated_original_html, annotated_compressed_html).\n    Colour key:\n      original  — red strikethrough = dropped\n      compressed — amber            = rewritten\n      compressed — green            = inserted\n      plain                         = unchanged\n    \"\"\"\n    orig_words = original.split()\n    comp_words = compressed.split()\n    matcher = difflib.SequenceMatcher(None, orig_words, comp_words, autojunk=False)\n\n    orig_parts, comp_parts = [], []\n\n    for tag, i1, i2, j1, j2 in matcher.get_opcodes():\n        ow = _h.escape(' '.join(orig_words[i1:i2]))\n        cw = _h.escape(' '.join(comp_words[j1:j2]))\n\n        if tag == 'equal':\n            orig_parts.append(ow)\n            comp_parts.append(cw)\n\n        elif tag == 'delete':\n            orig_parts.append(\n                f'<mark style=\"background:#fee2e2;color:#b91c1c;'\n                f'text-decoration:line-through;padding:1px 3px;border-radius:3px\">{ow}</mark>'\n            )\n\n        elif tag == 'insert':\n            comp_parts.append(\n                f'<mark style=\"background:#dcfce7;color:#15803d;'\n                f'padding:1px 3px;border-radius:3px\">{cw}</mark>'\n            )\n\n        elif tag == 'replace':\n            orig_parts.append(\n                f'<mark style=\"background:#fee2e2;color:#b91c1c;'\n                f'text-decoration:line-through;padding:1px 3px;border-radius:3px\">{ow}</mark>'\n            )\n            comp_parts.append(\n                f'<mark style=\"background:#fef9c3;color:#92400e;'\n                f'padding:1px 3px;border-radius:3px\">{cw}</mark>'\n            )\n\n    return ' '.join(orig_parts), ' '.join(comp_parts)\n\n\ndef render_diff_html(record: dict) -> str:\n    \"\"\"Build a self-contained side-by-side diff HTML block for a compression run.\"\"\"\n   
 original   = record.get('input_text', '')\n    compressed = record.get('output_text', '')\n    if not original or not compressed:\n        return ''\n\n    orig_html, comp_html = _word_diff(original, compressed)\n\n    model      = _h.escape(record.get('model', '—'))\n    tokenizer  = _h.escape(record.get('tokenizer', '—'))\n    ts         = _h.escape(record.get('timestamp', '—'))\n    in_tok     = record.get('input_tokens',  '—')\n    out_tok    = record.get('output_tokens', '—')\n    target_tok = record.get('target_tokens', '—')\n    ratio      = record.get('compression_ratio', 0)\n    quality    = record.get('quality_score', 0)\n    duration   = record.get('duration_ms', '—')\n    run_id     = record.get('id', '—')\n\n    feedback_val  = record.get('feedback')\n    feedback_note = _h.escape(record.get('feedback_comment') or '')\n\n    # Build optional feedback block\n    if feedback_val is not None:\n        badge_bg    = '#f0fdf4' if feedback_val == 1 else '#fef2f2'\n        badge_color = '#15803d' if feedback_val == 1 else '#b91c1c'\n        badge_text  = '👍 Helpful' if feedback_val == 1 else '👎 Not helpful'\n        feedback_block = (\n            f'<div style=\"display:flex;flex-wrap:wrap;align-items:center;gap:8px;'\n            f'margin-top:10px;padding:8px 12px;border-radius:6px;background:{badge_bg}\">'\n            f'<span style=\"font-weight:600;font-size:0.8rem;color:{badge_color}\">{badge_text}</span>'\n        )\n        if feedback_note:\n            feedback_block += (\n                f'<span style=\"font-size:0.8rem;color:#374151;font-style:italic\">'\n                f'\"{feedback_note}\"</span>'\n            )\n        feedback_block += '</div>'\n    else:\n        feedback_block = ''\n\n    return f\"\"\"\n<div style=\"font-family:system-ui,sans-serif;margin-top:4px\">\n\n  <!-- Primary meta chips -->\n  <div style=\"display:flex;flex-wrap:wrap;gap:6px;margin-bottom:6px;font-size:0.78rem\">\n    <span style=\"background:#f3f4f6;padding:3px 
9px;border-radius:12px;color:#374151\">Run #{run_id}</span>\n    <span style=\"background:#f3f4f6;padding:3px 9px;border-radius:12px;color:#374151\">{ts}</span>\n    <span style=\"background:#eff6ff;padding:3px 9px;border-radius:12px;color:#1d4ed8\">{model}</span>\n    <span style=\"background:#f0fdf4;padding:3px 9px;border-radius:12px;color:#15803d\">Quality {quality:.4f}</span>\n    <span style=\"background:#fff7ed;padding:3px 9px;border-radius:12px;color:#c2410c\">Ratio {ratio:.4f}</span>\n    <span style=\"background:#faf5ff;padding:3px 9px;border-radius:12px;color:#7e22ce\">&#9201; {duration} ms</span>\n  </div>\n\n  <!-- Secondary meta chips -->\n  <div style=\"display:flex;flex-wrap:wrap;gap:6px;margin-bottom:12px;font-size:0.78rem\">\n    <span style=\"background:#f3f4f6;padding:3px 9px;border-radius:12px;color:#374151\">{in_tok} in → {out_tok} out (target {target_tok})</span>\n    <span style=\"background:#f3f4f6;padding:3px 9px;border-radius:12px;color:#374151\">tokenizer: {tokenizer}</span>\n  </div>\n\n  <!-- Side-by-side panels -->\n  <div style=\"display:grid;grid-template-columns:1fr 1fr;gap:12px\">\n    <div style=\"border:1px solid #fecaca;border-radius:8px;overflow:hidden\">\n      <div style=\"background:#fef2f2;padding:8px 14px;border-bottom:1px solid #fecaca;\n                  display:flex;justify-content:space-between;align-items:center\">\n        <span style=\"font-weight:700;font-size:0.8rem;color:#b91c1c\">ORIGINAL</span>\n        <span style=\"font-size:0.75rem;color:#6b7280\">{in_tok} tokens</span>\n      </div>\n      <div style=\"padding:14px;line-height:1.8;font-size:0.875rem;color:#1a1a1a;\n                  max-height:340px;overflow-y:auto;word-break:break-word\">{orig_html}</div>\n    </div>\n    <div style=\"border:1px solid #bbf7d0;border-radius:8px;overflow:hidden\">\n      <div style=\"background:#f0fdf4;padding:8px 14px;border-bottom:1px solid #bbf7d0;\n                  
display:flex;justify-content:space-between;align-items:center\">\n        <span style=\"font-weight:700;font-size:0.8rem;color:#15803d\">COMPRESSED</span>\n        <span style=\"font-size:0.75rem;color:#6b7280\">{out_tok} tokens</span>\n      </div>\n      <div style=\"padding:14px;line-height:1.8;font-size:0.875rem;color:#1a1a1a;\n                  max-height:340px;overflow-y:auto;word-break:break-word\">{comp_html}</div>\n    </div>\n  </div>\n\n  {feedback_block}\n\n  <!-- Legend -->\n  <div style=\"display:flex;flex-wrap:wrap;gap:14px;margin-top:10px;font-size:0.75rem;color:#6b7280;align-items:center\">\n    <mark style=\"background:#fee2e2;color:#b91c1c;text-decoration:line-through;padding:2px 7px;border-radius:3px\">dropped</mark>\n    <mark style=\"background:#fef9c3;color:#92400e;padding:2px 7px;border-radius:3px\">rewritten</mark>\n    <mark style=\"background:#dcfce7;color:#15803d;padding:2px 7px;border-radius:3px\">inserted</mark>\n    <span>plain = unchanged</span>\n  </div>\n\n</div>\n\"\"\"\n\n\nprint('Diff renderer defined.')"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s7-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 7 — Database"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-db",
+   "metadata": {},
+   "outputs": [],
+   "source": "import sqlite3\n\n_SCHEMA = \"\"\"\nCREATE TABLE IF NOT EXISTS compression_runs (\n    id                INTEGER PRIMARY KEY AUTOINCREMENT,\n    timestamp         TEXT    NOT NULL,\n    model             TEXT    NOT NULL,\n    tokenizer         TEXT    NOT NULL,\n    input_tokens      INTEGER NOT NULL,\n    output_tokens     INTEGER NOT NULL,\n    target_tokens     INTEGER NOT NULL,\n    compression_ratio REAL    NOT NULL,\n    quality_score     REAL    NOT NULL,\n    duration_ms       REAL    NOT NULL,\n    input_text        TEXT    NOT NULL,\n    output_text       TEXT    NOT NULL,\n    feedback          INTEGER,\n    feedback_comment  TEXT\n);\n\"\"\"\n\n\ndef _connect():\n    conn = sqlite3.connect(DB_PATH)\n    conn.row_factory = sqlite3.Row\n    return conn\n\n\ndef init_db():\n    conn = _connect()\n    conn.executescript(_SCHEMA)\n    for col, typedef in [\n        ('tokenizer',        'TEXT NOT NULL DEFAULT \"\"'),\n        ('duration_ms',      'REAL NOT NULL DEFAULT 0'),\n        ('feedback',         'INTEGER'),\n        ('feedback_comment', 'TEXT'),\n    ]:\n        try:\n            conn.execute(f'ALTER TABLE compression_runs ADD COLUMN {col} {typedef}')\n        except sqlite3.OperationalError:\n            pass\n    conn.commit()\n    conn.close()\n\n\ndef save_run(record: dict) -> int:\n    conn = _connect()\n    cursor = conn.execute(\n        '''\n        INSERT INTO compression_runs\n            (timestamp, model, tokenizer, input_tokens, output_tokens, target_tokens,\n             compression_ratio, quality_score, duration_ms, input_text, output_text)\n        VALUES\n            (:timestamp, :model, :tokenizer, :input_tokens, :output_tokens, :target_tokens,\n             :compression_ratio, :quality_score, :duration_ms, :input_text, :output_text)\n        ''',\n        record,\n    )\n    run_id = cursor.lastrowid\n    conn.commit()\n    conn.close()\n    return run_id\n\n\ndef update_feedback(run_id: int, value: int):\n    conn = 
_connect()\n    conn.execute('UPDATE compression_runs SET feedback = ? WHERE id = ?', (value, run_id))\n    conn.commit()\n    conn.close()\n\n\ndef update_feedback_comment(run_id: int, comment: str):\n    conn = _connect()\n    conn.execute('UPDATE compression_runs SET feedback_comment = ? WHERE id = ?', (comment, run_id))\n    conn.commit()\n    conn.close()\n\n\ndef delete_run(run_id: int):\n    conn = _connect()\n    conn.execute('DELETE FROM compression_runs WHERE id = ?', (run_id,))\n    conn.commit()\n    conn.close()\n\n\ndef get_run(run_id: int):\n    conn = _connect()\n    row = conn.execute('SELECT * FROM compression_runs WHERE id = ?', (run_id,)).fetchone()\n    conn.close()\n    return dict(row) if row else None\n\n\ndef get_runs(limit: int = 100) -> list:\n    conn = _connect()\n    rows = conn.execute(\n        'SELECT * FROM compression_runs ORDER BY id DESC LIMIT ?', (limit,)\n    ).fetchall()\n    conn.close()\n    return [dict(r) for r in rows]\n\n\ninit_db()\nprint(f'Database ready at {DB_PATH}')"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s8-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 8 — Load models\n",
+    "\n",
+    "Downloads and caches weights. GPU warm-cache: ~30 s. First run: a few minutes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-load-models",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "get_llm()\n",
+    "get_embedder()\n",
+    "print('\\nAll models loaded and ready.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s9-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 9 — Launch Gradio UI\n",
+    "\n",
+    "Prints a **public share URL** when ready. All features are live in the UI."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-gradio",
+   "metadata": {},
+   "outputs": [],
+   "source": "import html as _h\nimport time\nfrom datetime import datetime, timezone\n\nimport gradio as gr\nimport pandas as pd\n\n\n# ══════════════════════════════════════════════════════════════════════════════\n# COMPRESS TAB — handlers\n# ══════════════════════════════════════════════════���═══════════════════════════\n\n_PALETTE = [\n    '#fde68a', '#bbf7d0', '#bfdbfe', '#fecaca', '#e9d5ff',\n    '#fed7aa', '#99f6e4', '#e0e7ff', '#fce7f3', '#d1fae5',\n]\n_BTN_SHOW = '🔍  Show Token Highlights'\n_BTN_HIDE = '🙈  Hide Token Highlights'\n\n\ndef _render_token_html(text: str) -> str:\n    if not text.strip():\n        return ''\n    tokens = get_token_strings(text)\n    if not tokens:\n        return ''\n    spans = []\n    for i, tok in enumerate(tokens):\n        color   = _PALETTE[i % len(_PALETTE)]\n        display = _h.escape(tok).replace(' ', '<span style=\"opacity:0.35;font-size:0.7em\">·</span>')\n        spans.append(\n            f'<span title=\"token {i+1}\" '\n            f'style=\"background:{color};border-radius:4px;padding:2px 5px;'\n            f'font-family:\\'Courier New\\',monospace;font-size:0.8rem;'\n            f'line-height:2.2;margin:2px 1px;display:inline-block;'\n            f'cursor:default;border:1px solid rgba(0,0,0,0.06)\">{display}</span>'\n        )\n    return (\n        '<div style=\"font-family:system-ui,sans-serif;padding:10px 12px;'\n        'border:1px solid #e5e7eb;border-radius:8px;background:#fafafa\">'\n        f'<div style=\"font-size:0.75rem;color:#6b7280;margin-bottom:8px;font-weight:500\">'\n        f'{len(tokens)} tokens — each chip = one token, hover for index</div>'\n        '<div style=\"line-height:2.6;word-break:break-all;max-height:200px;overflow-y:auto\">'\n        + ''.join(spans) + '</div></div>'\n    )\n\n\ndef toggle_token_panel(is_visible: bool, text: str):\n    new_visible  = not is_visible\n    html_content = _render_token_html(text) if new_visible else ''\n    btn_label    = _BTN_HIDE if new_visible 
else _BTN_SHOW\n    return new_visible, html_content, gr.update(value=btn_label)\n\n\ndef update_token_panel(text: str, is_visible: bool) -> str:\n    return _render_token_html(text) if is_visible else ''\n\n\n_STATUS_EMPTY = '<span></span>'\n_STATUS_RED = (\n    '<div style=\"background:#fee2e2;border:1px solid #ef4444;color:#b91c1c;'\n    'padding:8px 12px;border-radius:6px;font-size:0.9rem;\">'\n    '🔴 <strong>Compression not needed</strong> — input ({input_tok} tokens) '\n    'is already within the {budget}-token budget.</div>'\n)\n_STATUS_GREEN = (\n    '<div style=\"background:#dcfce7;border:1px solid #22c55e;color:#15803d;'\n    'padding:8px 12px;border-radius:6px;font-size:0.9rem;\">'\n    '🟢 <strong>Ready to compress</strong> — {input_tok} tokens → {budget} token budget '\n    '({delta} tokens to shed).</div>'\n)\n\n\ndef compression_status(text: str, target_tokens: int) -> str:\n    if not text.strip():\n        return _STATUS_EMPTY\n    n = count_tokens(text)\n    if n <= int(target_tokens):\n        return _STATUS_RED.format(input_tok=n, budget=int(target_tokens))\n    return _STATUS_GREEN.format(input_tok=n, budget=int(target_tokens), delta=n - int(target_tokens))\n\n\ndef run_compression(text: str, target_tokens: int):\n    _hidden = gr.update(visible=False)\n    if not text.strip():\n        return ('', 0, 0, 0, 0.0, None,\n                _hidden, _hidden, gr.update(value='', visible=False),\n                gr.update(value='', visible=False), _hidden, gr.update(value='', visible=False))\n\n    t0 = time.perf_counter()\n    compressed, input_tokens, output_tokens = compress(text, int(target_tokens))\n    duration_ms = round((time.perf_counter() - t0) * 1000, 1)\n\n    ratio   = round(output_tokens / input_tokens, 4) if input_tokens else 0.0\n    quality = semantic_score(text, compressed)\n\n    run_id = save_run({\n        'timestamp':         datetime.now(timezone.utc).isoformat(),\n        'model':             get_current_model_id() or 
LLM_MODEL,\n        'tokenizer':         get_current_tokenizer_id() or LLM_MODEL,\n        'input_tokens':      input_tokens,\n        'output_tokens':     output_tokens,\n        'target_tokens':     int(target_tokens),\n        'compression_ratio': ratio,\n        'quality_score':     quality,\n        'duration_ms':       duration_ms,\n        'input_text':        text,\n        'output_text':       compressed,\n    })\n\n    return (\n        compressed, input_tokens, output_tokens, ratio, quality,\n        run_id,\n        gr.update(visible=True), gr.update(visible=True),\n        gr.update(value='', visible=True),\n        gr.update(value='', visible=False),\n        gr.update(visible=False),\n        gr.update(value='', visible=False),\n    )\n\n\ndef load_model(model_id: str) -> str:\n    if not model_id:\n        return 'No model selected.'\n    try:\n        return switch_llm(model_id)\n    except Exception as exc:\n        return f'Error loading {model_id}: {exc}'\n\n\ndef load_embedder(model_id: str) -> str:\n    if not model_id:\n        return 'No model selected.'\n    try:\n        return switch_embedder(model_id)\n    except Exception as exc:\n        return f'Error loading {model_id}: {exc}'\n\n\ndef on_embedder_change(model_id: str) -> str:\n    return EMBEDDER_INFO.get(model_id, '')\n\n\ndef submit_feedback(run_id, value: int):\n    if run_id is None:\n        return 'Run a compression first.', gr.update(visible=False), gr.update(visible=False), gr.update(value='', visible=False)\n    update_feedback(run_id, value)\n    msg = '👍 Marked as helpful — thanks!' 
if value == 1 else '👎 Noted — thanks for the feedback!'\n    return msg, gr.update(visible=True), gr.update(visible=True), gr.update(value='', visible=False)\n\n\ndef save_comment(run_id, comment: str):\n    if run_id is None:\n        return gr.update(value='Run a compression first.', visible=True)\n    if not comment.strip():\n        return gr.update(value='Type a note first.', visible=True)\n    update_feedback_comment(run_id, comment.strip())\n    return gr.update(value='✓ Note saved.', visible=True)\n\n\n# ══════════════════════════════════════════════════════════════════════════════\n# HISTORY TAB — handlers\n# ══════════════════════════════════════════════════════════════════════════════\n\n_DEFAULT_COLS = ['id', 'timestamp', 'model', 'compression_ratio', 'quality_score', 'feedback']\n_ALL_COLS = [\n    'id', 'timestamp', 'model', 'tokenizer',\n    'input_tokens', 'output_tokens', 'target_tokens',\n    'compression_ratio', 'quality_score', 'duration_ms',\n    'feedback', 'feedback_comment',\n]\n\n\ndef load_history(selected_cols=None):\n    cols = selected_cols if selected_cols else _DEFAULT_COLS\n    runs = get_runs(limit=100)\n    if not runs:\n        return pd.DataFrame(columns=cols), '', '', ''\n    df       = pd.DataFrame(runs)\n    existing = [c for c in cols if c in df.columns]\n    df       = df[existing]\n    avg_q = f\"{df['quality_score'].mean():.4f}\" if 'quality_score' in df.columns else '—'\n    avg_r = f\"{df['compression_ratio'].mean():.4f}\" if 'compression_ratio' in df.columns else '—'\n    return df, avg_q, avg_r, ''\n\n\ndef on_row_select(evt: gr.SelectData, df: pd.DataFrame):\n    if df is None or df.empty:\n        return None, '', 'No rows available.'\n    row_idx = evt.index[0]\n    run_id  = int(df.iloc[row_idx]['id'])\n    record  = get_run(run_id)\n    if not record:\n        return None, '', f'Row {run_id} not found in database.'\n    return run_id, render_diff_html(record), f'Row {run_id} selected — click Delete to 
remove.'\n\n\ndef delete_selected(run_id, selected_cols):\n    if run_id is None:\n        df, avg_q, avg_r, _ = load_history(selected_cols)\n        return df, avg_q, avg_r, None, '', 'No row selected.'\n    delete_run(run_id)\n    df, avg_q, avg_r, _ = load_history(selected_cols)\n    return df, avg_q, avg_r, None, '', f'Row {run_id} deleted.'\n\n\n# ══════════════════════════════════════════════════════════════════════════════\n# BUILD APP\n# ══════════════════════════════════════════════════════════════════════════════\n\ndef build_app() -> gr.Blocks:\n    with gr.Blocks(title=APP_TITLE) as app:\n\n        # ── Compress tab ──────────────────────────────────────────────────\n        with gr.Tab('Compress'):\n            gr.Markdown('## TinyPress — Prompt Compression Engine')\n            gr.Markdown(\n                'Paste any long text. Set your token budget. Get a compressed version '\n                'that preserves intent — scored for quality.'\n            )\n\n            with gr.Accordion('Model Settings', open=False):\n                gr.Markdown('**Compression Model**')\n                model_dropdown = gr.Dropdown(\n                    choices=AVAILABLE_MODELS, value=LLM_MODEL,\n                    label='Compression Model', allow_custom_value=True,\n                )\n                load_model_btn = gr.Button('Load Model', variant='secondary')\n                model_status   = gr.Textbox(label='Model Status', value=f'Active: {LLM_MODEL}', interactive=False)\n\n                gr.Divider()\n\n                gr.Markdown('**Scoring Embedder**')\n                embedder_dropdown = gr.Dropdown(\n                    choices=AVAILABLE_EMBEDDER_MODELS, value=EMBEDDER_MODEL,\n                    label='Embedder Model', allow_custom_value=True,\n                )\n                embedder_info_panel = gr.Markdown(value=EMBEDDER_INFO.get(EMBEDDER_MODEL, ''))\n                load_embedder_btn = gr.Button('Load Embedder', variant='secondary')\n               
 embedder_status   = gr.Textbox(label='Embedder Status', value=f'Active: {EMBEDDER_MODEL}', interactive=False)\n\n            with gr.Row():\n                with gr.Column():\n                    input_text = gr.Textbox(label='Input Text', lines=12, placeholder='Paste your text here...')\n                    token_toggle_btn = gr.Button(_BTN_SHOW, variant='secondary', size='sm')\n                    token_panel      = gr.HTML(value='')\n                    tokens_visible   = gr.State(value=False)\n                    target_slider    = gr.Slider(minimum=100, maximum=1000, value=DEFAULT_TARGET_TOKENS, step=50, label='Target Token Budget')\n                    status_banner    = gr.HTML(value=_STATUS_EMPTY)\n                    compress_btn     = gr.Button('Compress', variant='primary')\n\n                with gr.Column():\n                    output_text = gr.Textbox(label='Compressed Output', lines=12)\n                    with gr.Row():\n                        input_tok  = gr.Number(label='Input Tokens',  interactive=False)\n                        output_tok = gr.Number(label='Output Tokens', interactive=False)\n                    with gr.Row():\n                        ratio   = gr.Number(label='Compression Ratio',   interactive=False)\n                        quality = gr.Number(label='Quality Score (0–1)', interactive=False)\n                    gr.Markdown('**Was this compression helpful?**')\n                    with gr.Row():\n                        thumbs_up_btn   = gr.Button('👍  Helpful',     variant='secondary', visible=False, scale=1)\n                        thumbs_down_btn = gr.Button('👎  Not helpful', variant='secondary', visible=False, scale=1)\n                    feedback_status  = gr.Markdown('', visible=False)\n                    comment_box      = gr.Textbox(\n                        label='Add a note (optional)',\n                        placeholder=\"e.g. 
'lost key dates', 'too short', 'great summary'\",\n                        lines=2, visible=False,\n                    )\n                    save_comment_btn = gr.Button('Save note', variant='secondary', size='sm', visible=False)\n                    comment_saved    = gr.Markdown('', visible=False)\n\n            last_run_id = gr.State(value=None)\n\n            token_toggle_btn.click(fn=toggle_token_panel, inputs=[tokens_visible, input_text], outputs=[tokens_visible, token_panel, token_toggle_btn])\n            input_text.change(fn=update_token_panel, inputs=[input_text, tokens_visible], outputs=[token_panel])\n            _sa = dict(inputs=[input_text, target_slider], outputs=[status_banner])\n            input_text.change(fn=compression_status, **_sa)\n            target_slider.change(fn=compression_status, **_sa)\n            load_model_btn.click(fn=load_model, inputs=[model_dropdown], outputs=[model_status])\n            embedder_dropdown.change(fn=on_embedder_change, inputs=[embedder_dropdown], outputs=[embedder_info_panel])\n            load_embedder_btn.click(fn=load_embedder, inputs=[embedder_dropdown], outputs=[embedder_status])\n            compress_btn.click(\n                fn=run_compression,\n                inputs=[input_text, target_slider],\n                outputs=[output_text, input_tok, output_tok, ratio, quality,\n                         last_run_id, thumbs_up_btn, thumbs_down_btn, feedback_status,\n                         comment_box, save_comment_btn, comment_saved],\n            )\n            thumbs_up_btn.click(\n                fn=lambda run_id: submit_feedback(run_id, 1),\n                inputs=[last_run_id],\n                outputs=[feedback_status, comment_box, save_comment_btn, comment_saved],\n            )\n            thumbs_down_btn.click(\n                fn=lambda run_id: submit_feedback(run_id, -1),\n                inputs=[last_run_id],\n                outputs=[feedback_status, comment_box, save_comment_btn, 
comment_saved],\n            )\n            save_comment_btn.click(fn=save_comment, inputs=[last_run_id, comment_box], outputs=[comment_saved])\n\n        # ── History tab ───────────────────────────────────────────────────\n        with gr.Tab('History') as history_tab:\n            gr.Markdown('## Compression Run History')\n            with gr.Row():\n                refresh_btn = gr.Button('Refresh', variant='secondary')\n                delete_btn  = gr.Button('Delete Selected Row', variant='stop')\n\n            with gr.Accordion('Column visibility', open=False):\n                col_picker = gr.CheckboxGroup(choices=_ALL_COLS, value=_DEFAULT_COLS, label=None)\n\n            with gr.Row():\n                avg_quality = gr.Textbox(label='Avg Quality Score',     interactive=False)\n                avg_ratio   = gr.Textbox(label='Avg Compression Ratio', interactive=False)\n            history_table = gr.DataFrame(label='Past Runs — click a row to see its diff', interactive=False)\n            delete_status = gr.Textbox(label='Status', value='Click a row to select it.', interactive=False)\n            gr.Markdown('### Side-by-side Diff')\n            diff_panel  = gr.HTML(value='')\n            selected_id = gr.State(value=None)\n\n            _outputs = [history_table, avg_quality, avg_ratio, diff_panel]\n            refresh_btn.click(fn=load_history, inputs=[col_picker], outputs=_outputs)\n            history_tab.select(fn=load_history, inputs=[col_picker], outputs=_outputs)\n            col_picker.change(fn=load_history, inputs=[col_picker], outputs=_outputs)\n            history_table.select(fn=on_row_select, inputs=[history_table], outputs=[selected_id, diff_panel, delete_status])\n            delete_btn.click(\n                fn=delete_selected,\n                inputs=[selected_id, col_picker],\n                outputs=[history_table, avg_quality, avg_ratio, selected_id, diff_panel, delete_status],\n            )\n\n    return app\n\n\napp = 
build_app()\napp.launch(share=True, server_port=SERVER_PORT)"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cell-s10-hdr",
+   "metadata": {},
+   "source": [
+    "## Step 10 — Programmatic demo (no UI needed)\n",
+    "\n",
+    "Run this cell to compress a sample text directly and inspect all metrics inline."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-demo",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "SAMPLE_TEXT = \"\"\"\n",
+    "The transformer architecture, introduced in the seminal paper Attention Is All You Need by Vaswani et al.\n",
+    "in 2017, fundamentally changed how we approach sequence modelling tasks in natural language processing.\n",
+    "Prior to transformers, recurrent neural networks (RNNs) and long short-term memory (LSTM) networks were\n",
+    "the dominant architectures for tasks such as machine translation, text summarisation, and question answering.\n",
+    "However, these models suffered from several limitations: they processed tokens sequentially, making\n",
+    "parallelisation difficult; they struggled to capture long-range dependencies due to vanishing gradients;\n",
+    "and training was slow even on modern hardware. The transformer addressed all of these issues through\n",
+    "its self-attention mechanism, which allows every token in a sequence to directly attend to every other\n",
+    "token in a single operation. Multi-head attention further extends this by running several attention\n",
+    "functions in parallel, capturing different types of relationships between tokens simultaneously.\n",
+    "Position encodings are added to token embeddings to give the model a sense of sequence order, since\n",
+    "unlike RNNs the architecture has no inherent notion of position. Feed-forward sub-layers, layer\n",
+    "normalisation, and residual connections complete each transformer block. The result is a model that\n",
+    "trains faster, scales better with data and compute, and generalises more effectively than its\n",
+    "predecessors, setting the stage for large language models like GPT, BERT, and the entire modern\n",
+    "LLM ecosystem.\n",
+    "\"\"\".strip()\n",
+    "\n",
+    "TARGET = 150  # token budget\n",
+    "\n",
+    "input_tok_count = count_tokens(SAMPLE_TEXT)\n",
+    "print(f'Input tokens : {input_tok_count}')\n",
+    "print(f'Target tokens: {TARGET}')\n",
+    "print(f'Status       : {\"ready to compress\" if input_tok_count > TARGET else \"already within budget\"}')\n",
+    "print()\n",
+    "\n",
+    "t0 = time.perf_counter()\n",
+    "compressed, in_tok, out_tok = compress(SAMPLE_TEXT, TARGET)\n",
+    "elapsed = round((time.perf_counter() - t0) * 1000, 1)\n",
+    "\n",
+    "score = semantic_score(SAMPLE_TEXT, compressed)\n",
+    "ratio = round(out_tok / in_tok, 4)\n",
+    "\n",
+    "print('─' * 60)\n",
+    "print(compressed)\n",
+    "print('─' * 60)\n",
+    "print(f'Output tokens    : {out_tok}')\n",
+    "print(f'Compression ratio: {ratio}')\n",
+    "print(f'Quality score    : {score}')\n",
+    "print(f'Duration         : {elapsed} ms')\n",
+    "print(f'Model            : {get_current_model_id()}')\n",
+    "print(f'Tokenizer        : {get_current_tokenizer_id()}')"
+   ]
+  }
+ ]
+}
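The demo cell's printed metrics reduce to simple arithmetic: the compression ratio is output tokens divided by input tokens (so lower means more aggressive compression), and compression is only "needed" when the input exceeds the budget. A minimal, model-free sketch of that arithmetic, assuming a whitespace split as a hypothetical stand-in for `count_tokens` (the real Qwen tokenizer will report different counts); `count_tokens_stub` and `compression_report` are illustrative names, not part of the project:

```python
def count_tokens_stub(text: str) -> int:
    """Hypothetical stand-in for core.tokenizer_utils.count_tokens: whitespace split."""
    return len(text.split())


def compression_report(original: str, compressed: str, target: int) -> dict:
    """Recompute the metrics the demo cell prints, without loading any model."""
    n_in = count_tokens_stub(original)
    n_out = count_tokens_stub(compressed)
    return {
        "input_tokens": n_in,
        "output_tokens": n_out,
        # Mirrors the status check: only inputs above the budget need compressing.
        "needs_compression": n_in > target,
        # Same zero guard as run_compression, so empty input does not divide by zero.
        "compression_ratio": round(n_out / n_in, 4) if n_in else 0.0,
        "within_budget": n_out <= target,
    }


report = compression_report("one two three four five six seven eight", "one two three", target=4)
print(report)
```

Here 3 of 8 stub tokens survive, giving a ratio of 0.375; the semantic quality score is a separate, embedding-based measure and is deliberately left out of this sketch.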

ui/compress_tab.py ADDED

	@@ -0,0 +1,327 @@

+import html as _h
+import time
+from datetime import datetime, timezone
+import gradio as gr
+import config
+from core.compressor import compress
+from core.scorer import semantic_score
+from core.tokenizer_utils import count_tokens, get_token_strings
+from db.store import save_run, update_feedback, update_feedback_comment
+from models.model_loader import get_current_model_id, get_current_tokenizer_id, switch_llm, switch_embedder
+# ── token colour palette (10 soft pastels, cycles) ───────────────────────────
+_PALETTE = [
+    "#fde68a",  # amber
+    "#bbf7d0",  # emerald
+    "#bfdbfe",  # sky-blue
+    "#fecaca",  # rose
+    "#e9d5ff",  # violet
+    "#fed7aa",  # orange
+    "#99f6e4",  # teal
+    "#e0e7ff",  # indigo
+    "#fce7f3",  # pink
+    "#d1fae5",  # green
+]
+_BTN_SHOW = "🔍  Show Token Highlights"
+_BTN_HIDE = "🙈  Hide Token Highlights"
+# ── token visualiser ─────────────────────────────────────────────────────────
+def _render_token_html(text: str) -> str:
+    if not text.strip():
+        return ""
+    tokens = get_token_strings(text)
+    if not tokens:
+        return ""
+    spans = []
+    for i, tok in enumerate(tokens):
+        color = _PALETTE[i % len(_PALETTE)]
+        # Render every space as a faint mid-dot so whitespace is visible; HTML-escape everything else.
+        display = _h.escape(tok).replace(
+            " ", '<span style="opacity:0.35;font-size:0.7em">·</span>'
+        )
+        spans.append(
+            f'<span title="token {i + 1}" '
+            f'style="background:{color};border-radius:4px;padding:2px 5px;'
+            f'font-family:\'Courier New\',monospace;font-size:0.8rem;'
+            f'line-height:2.2;margin:2px 1px;display:inline-block;'
+            f'cursor:default;border:1px solid rgba(0,0,0,0.06)">{display}</span>'
+        )
+    return (
+        '<div style="font-family:system-ui,sans-serif;padding:10px 12px;'
+        'border:1px solid #e5e7eb;border-radius:8px;background:#fafafa">'
+        f'<div style="font-size:0.75rem;color:#6b7280;margin-bottom:8px;font-weight:500">'
+        f'{len(tokens)} tokens — each chip = one token, hover for index</div>'
+        '<div style="line-height:2.6;word-break:break-all;'
+        'max-height:200px;overflow-y:auto">'
+        + "".join(spans)
+        + "</div></div>"
+    )
+# ── toggle handler ────────────────────────────────────────────────────────────
+def toggle_token_panel(is_visible: bool, text: str):
+    new_visible = not is_visible
+    html_content = _render_token_html(text) if new_visible else ""
+    btn_label = _BTN_HIDE if new_visible else _BTN_SHOW
+    return new_visible, html_content, gr.update(value=btn_label)
+def update_token_panel(text: str, is_visible: bool) -> str:
+    """Called on every keystroke — only re-renders when the panel is open."""
+    if not is_visible:
+        return ""
+    return _render_token_html(text)
+# ── compression status banner ─────────────────────────────────────────────────
+_STATUS_EMPTY = "<span></span>"
+_STATUS_RED = (
+    '<div style="background:#fee2e2;border:1px solid #ef4444;color:#b91c1c;'
+    'padding:8px 12px;border-radius:6px;font-size:0.9rem;">'
+    "🔴 <strong>Compression not needed</strong> — input ({input_tok} tokens) "
+    "is already within the {budget}-token budget."
+    "</div>"
+)
+_STATUS_GREEN = (
+    '<div style="background:#dcfce7;border:1px solid #22c55e;color:#15803d;'
+    'padding:8px 12px;border-radius:6px;font-size:0.9rem;">'
+    "🟢 <strong>Ready to compress</strong> — {input_tok} tokens → {budget} token budget "
+    "({delta} tokens to shed)."
+    "</div>"
+)
+def compression_status(text: str, target_tokens: int) -> str:
+    if not text.strip():
+        return _STATUS_EMPTY
+    n = count_tokens(text)
+    if n <= int(target_tokens):
+        return _STATUS_RED.format(input_tok=n, budget=int(target_tokens))
+    return _STATUS_GREEN.format(input_tok=n, budget=int(target_tokens), delta=n - int(target_tokens))
+# ── core handlers ─────────────────────────────────────────────────────────────
+def run_compression(text: str, target_tokens: int):
+    _hidden = gr.update(visible=False)
+    if not text.strip():
+        return ("", 0, 0, 0, 0.0, None,
+                _hidden, _hidden, gr.update(value="", visible=False),
+                gr.update(value="", visible=False), _hidden, gr.update(value="", visible=False))
+    t0 = time.perf_counter()
+    compressed, input_tokens, output_tokens = compress(text, int(target_tokens))
+    duration_ms = round((time.perf_counter() - t0) * 1000, 1)
+    ratio = round(output_tokens / input_tokens, 4) if input_tokens else 0.0
+    quality = semantic_score(text, compressed)
+    run_id = save_run({
+        "timestamp": datetime.now(timezone.utc).isoformat(),
+        "model": get_current_model_id() or config.LLM_MODEL,
+        "tokenizer": get_current_tokenizer_id() or config.LLM_MODEL,
+        "input_tokens": input_tokens,
+        "output_tokens": output_tokens,
+        "target_tokens": int(target_tokens),
+        "compression_ratio": ratio,
+        "quality_score": quality,
+        "duration_ms": duration_ms,
+        "input_text": text,
+        "output_text": compressed,
+    })
+    return (
+        compressed, input_tokens, output_tokens, ratio, quality,
+        run_id,
+        gr.update(visible=True), gr.update(visible=True),    # thumbs buttons
+        gr.update(value="", visible=True),                    # feedback_status
+        gr.update(value="", visible=False),                   # comment_box reset
+        gr.update(visible=False),                             # save_comment_btn reset
+        gr.update(value="", visible=False),                   # comment_saved reset
+    )
+def load_model(model_id: str) -> str:
+    if not model_id:
+        return "No model selected."
+    try:
+        return switch_llm(model_id)
+    except Exception as exc:
+        return f"Error loading {model_id}: {exc}"
+def load_embedder(model_id: str) -> str:
+    if not model_id:
+        return "No model selected."
+    try:
+        return switch_embedder(model_id)
+    except Exception as exc:
+        return f"Error loading {model_id}: {exc}"
+def on_embedder_change(model_id: str) -> str:
+    return config.EMBEDDER_INFO.get(model_id, "")
+def submit_feedback(run_id, value: int):
+    if run_id is None:
+        return "Run a compression first.", gr.update(visible=False), gr.update(visible=False), gr.update(value="", visible=False)
+    update_feedback(run_id, value)
+    msg = "👍 Marked as helpful — thanks!" if value == 1 else "👎 Noted — thanks for the feedback!"
+    return msg, gr.update(visible=True), gr.update(visible=True), gr.update(value="", visible=False)
+def save_comment(run_id, comment: str):
+    if run_id is None:
+        return gr.update(value="Run a compression first.", visible=True)
+    if not comment.strip():
+        return gr.update(value="Type a note first.", visible=True)
+    update_feedback_comment(run_id, comment.strip())
+    return gr.update(value="✓ Note saved.", visible=True)
+# ── UI ────────────────────────────────────────────────────────────────────────
+def build_compress_tab() -> gr.Tab:
+    with gr.Tab("Compress") as tab:
+        gr.Markdown("## TinyPress — Prompt Compression Engine")
+        gr.Markdown(
+            "Paste any long text. Set your token budget. Get a compressed version "
+            "that preserves intent — scored for quality."
+        )
+        with gr.Accordion("Model Settings", open=False):
+            gr.Markdown("**Compression Model**")
+            model_dropdown = gr.Dropdown(
+                choices=config.AVAILABLE_MODELS,
+                value=config.LLM_MODEL,
+                label="Compression Model",
+                allow_custom_value=True,
+            )
+            load_model_btn = gr.Button("Load Model", variant="secondary")
+            model_status = gr.Textbox(
+                label="Model Status",
+                value=f"Active: {config.LLM_MODEL}",
+                interactive=False,
+            )
+            gr.Divider()
+            gr.Markdown("**Scoring Embedder**")
+            embedder_dropdown = gr.Dropdown(
+                choices=config.AVAILABLE_EMBEDDER_MODELS,
+                value=config.EMBEDDER_MODEL,
+                label="Embedder Model",
+                allow_custom_value=True,
+            )
+            embedder_info_panel = gr.Markdown(
+                value=config.EMBEDDER_INFO.get(config.EMBEDDER_MODEL, "")
+            )
+            load_embedder_btn = gr.Button("Load Embedder", variant="secondary")
+            embedder_status = gr.Textbox(
+                label="Embedder Status",
+                value=f"Active: {config.EMBEDDER_MODEL}",
+                interactive=False,
+            )
+        with gr.Row():
+            with gr.Column():
+                input_text = gr.Textbox(
+                    label="Input Text",
+                    lines=12,
+                    placeholder="Paste your text here...",
+                )
+                # ── token highlight panel ──────────────────────────────────
+                token_toggle_btn = gr.Button(_BTN_SHOW, variant="secondary", size="sm")
+                token_panel = gr.HTML(value="")
+                tokens_visible = gr.State(value=False)
+                # ──────────────────────────────────────────────────────────
+                target_slider = gr.Slider(
+                    minimum=100,
+                    maximum=1000,
+                    value=config.DEFAULT_TARGET_TOKENS,
+                    step=50,
+                    label="Target Token Budget",
+                )
+                status_banner = gr.HTML(value=_STATUS_EMPTY)
+                compress_btn = gr.Button("Compress", variant="primary")
+            with gr.Column():
+                output_text = gr.Textbox(label="Compressed Output", lines=12)
+                with gr.Row():
+                    input_tok = gr.Number(label="Input Tokens", interactive=False)
+                    output_tok = gr.Number(label="Output Tokens", interactive=False)
+                with gr.Row():
+                    ratio = gr.Number(label="Compression Ratio", interactive=False)
+                    quality = gr.Number(label="Quality Score (0–1)", interactive=False)
+                gr.Markdown("**Was this compression helpful?**")
+                with gr.Row():
+                    thumbs_up_btn   = gr.Button("👍  Helpful",      variant="secondary", visible=False, scale=1)
+                    thumbs_down_btn = gr.Button("👎  Not helpful",  variant="secondary", visible=False, scale=1)
+                feedback_status = gr.Markdown("", visible=False)
+                comment_box = gr.Textbox(
+                    label="Add a note (optional)",
+                    placeholder="e.g. 'lost key dates', 'too short', 'great summary'",
+                    lines=2,
+                    visible=False,
+                )
+                save_comment_btn = gr.Button("Save note", variant="secondary", size="sm", visible=False)
+                comment_saved = gr.Markdown("", visible=False)
+        last_run_id = gr.State(value=None)
+        # ── event wiring ──────────────────────────────────────────────────
+        token_toggle_btn.click(
+            fn=toggle_token_panel,
+            inputs=[tokens_visible, input_text],
+            outputs=[tokens_visible, token_panel, token_toggle_btn],
+        )
+        input_text.change(
+            fn=update_token_panel,
+            inputs=[input_text, tokens_visible],
+            outputs=[token_panel],
+        )
+        _status_args = dict(inputs=[input_text, target_slider], outputs=[status_banner])
+        input_text.change(fn=compression_status, **_status_args)
+        target_slider.change(fn=compression_status, **_status_args)
+        load_model_btn.click(fn=load_model, inputs=[model_dropdown], outputs=[model_status])
+        embedder_dropdown.change(fn=on_embedder_change, inputs=[embedder_dropdown], outputs=[embedder_info_panel])
+        load_embedder_btn.click(fn=load_embedder, inputs=[embedder_dropdown], outputs=[embedder_status])
+        compress_btn.click(
+            fn=run_compression,
+            inputs=[input_text, target_slider],
+            outputs=[output_text, input_tok, output_tok, ratio, quality,
+                     last_run_id, thumbs_up_btn, thumbs_down_btn, feedback_status,
+                     comment_box, save_comment_btn, comment_saved],
+        )
+        thumbs_up_btn.click(
+            fn=lambda run_id: submit_feedback(run_id, 1),
+            inputs=[last_run_id],
+            outputs=[feedback_status, comment_box, save_comment_btn, comment_saved],
+        )
+        thumbs_down_btn.click(
+            fn=lambda run_id: submit_feedback(run_id, -1),
+            inputs=[last_run_id],
+            outputs=[feedback_status, comment_box, save_comment_btn, comment_saved],
+        )
+        save_comment_btn.click(
+            fn=save_comment,
+            inputs=[last_run_id, comment_box],
+            outputs=[comment_saved],
+        )
+    return tab
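The token visualiser in `_render_token_html` colours each chip by cycling a fixed palette on the token index, and it HTML-escapes each token before swapping spaces for a visible marker. A stripped-down sketch of that idea that runs without Gradio or a model, assuming `toy_tokens` (hypothetical) as a regex stand-in for `get_token_strings` that keeps leading whitespace the way BPE pieces often do; the palette and styling are trimmed for illustration:

```python
import html
import re

PALETTE = ["#fde68a", "#bbf7d0", "#bfdbfe"]  # shortened palette, for illustration only
DOT = '<span style="opacity:0.35">·</span>'  # visible marker for a space


def toy_tokens(text: str) -> list[str]:
    """Hypothetical tokenizer: each word keeps its leading whitespace."""
    return re.findall(r"\s*\S+", text)


def render_chips(text: str) -> str:
    chips = []
    for i, tok in enumerate(toy_tokens(text)):
        color = PALETTE[i % len(PALETTE)]  # colours repeat once the palette runs out
        # Escape first, then substitute spaces; escaping never introduces spaces.
        shown = html.escape(tok).replace(" ", DOT)
        chips.append(f'<span title="token {i + 1}" style="background:{color}">{shown}</span>')
    return "".join(chips)


out = render_chips("a <b> c d")
print(out)
```

The order matters: because the marker is inserted after `html.escape`, it renders as markup while any `<` or `>` in the user's text is neutralised. `str.replace` makes a single pass, so the space inside the inserted marker is not itself replaced.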

ui/history_tab.py ADDED

	@@ -0,0 +1,96 @@

+import gradio as gr
+import pandas as pd
+from db.store import get_runs, delete_run, get_run
+from core.diff import render_diff_html
+_DEFAULT_COLS = ["id", "timestamp", "model", "compression_ratio", "quality_score", "feedback"]
+_ALL_COLS = [
+    "id", "timestamp", "model", "tokenizer",
+    "input_tokens", "output_tokens", "target_tokens",
+    "compression_ratio", "quality_score", "duration_ms",
+    "feedback", "feedback_comment",
+]
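+def load_history(selected_cols=None):
+    """Return (table, avg quality, avg ratio, cleared diff) for the latest 100 runs."""
+    cols = list(selected_cols) if selected_cols else list(_DEFAULT_COLS)
+    if "id" not in cols:
+        cols.insert(0, "id")  # on_row_select reads the id column, so always keep it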
+    runs = get_runs(limit=100)
+    if not runs:
+        return pd.DataFrame(columns=cols), "", "", ""
+    df = pd.DataFrame(runs)
+    existing = [c for c in cols if c in df.columns]
+    df = df[existing]
+    avg_quality = f"{df['quality_score'].mean():.4f}" if "quality_score" in df.columns else "—"
+    avg_ratio = f"{df['compression_ratio'].mean():.4f}" if "compression_ratio" in df.columns else "—"
+    return df, avg_quality, avg_ratio, ""
+def on_row_select(evt: gr.SelectData, df: pd.DataFrame):
+    """Resolve the clicked row to its run id and render that run's input/output diff."""
+    if df is None or df.empty:
+        return None, "", "No rows available."
+    row_idx = evt.index[0]
+    run_id = int(df.iloc[row_idx]["id"])
+    record = get_run(run_id)
+    if not record:
+        return None, "", f"Row {run_id} not found in database."
+    diff_html = render_diff_html(record)
+    return run_id, diff_html, f"Row {run_id} selected — click Delete to remove."
+def delete_selected(run_id, selected_cols):
+    """Delete the selected run (if any), reload the table and clear the selection."""
+    if run_id is None:
+        df, avg_q, avg_r, _ = load_history(selected_cols)
+        return df, avg_q, avg_r, None, "", "No row selected."
+    delete_run(run_id)
+    df, avg_q, avg_r, _ = load_history(selected_cols)
+    return df, avg_q, avg_r, None, "", f"Row {run_id} deleted."
+def build_history_tab() -> gr.Tab:
+    with gr.Tab("History") as tab:
+        gr.Markdown("## Compression Run History")
+        with gr.Row():
+            refresh_btn = gr.Button("Refresh", variant="secondary")
+            delete_btn  = gr.Button("Delete Selected Row", variant="stop")
+        with gr.Accordion("Column visibility", open=False):
+            col_picker = gr.CheckboxGroup(
+                choices=_ALL_COLS,
+                value=_DEFAULT_COLS,
+                label=None,
+            )
+        with gr.Row():
+            avg_quality = gr.Textbox(label="Avg Quality Score",     interactive=False)
+            avg_ratio   = gr.Textbox(label="Avg Compression Ratio", interactive=False)
+        history_table = gr.DataFrame(
+            label="Past Runs — click a row to see its diff",
+            interactive=False,
+        )
+        delete_status = gr.Textbox(
+            label="Status", value="Click a row to select it.", interactive=False
+        )
+        gr.Markdown("### Side-by-side Diff")
+        diff_panel  = gr.HTML(value="")
+        selected_id = gr.State(value=None)
+        _outputs = [history_table, avg_quality, avg_ratio, diff_panel]
+        refresh_btn.click(fn=load_history, inputs=[col_picker], outputs=_outputs)
+        tab.select(fn=load_history, inputs=[col_picker], outputs=_outputs)
+        col_picker.change(fn=load_history, inputs=[col_picker], outputs=_outputs)
+        history_table.select(
+            fn=on_row_select,
+            inputs=[history_table],
+            outputs=[selected_id, diff_panel, delete_status],
+        )
+        delete_btn.click(
+            fn=delete_selected,
+            inputs=[selected_id, col_picker],
+            outputs=[history_table, avg_quality, avg_ratio, selected_id, diff_panel, delete_status],
+        )
+    return tab
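`load_history` projects the stored runs onto the checked columns and averages two metrics over what remains. A pure-Python sketch of the same projection over plain dicts (no pandas, no SQLite), assuming each run is a dict keyed by column name (an assumption about what `get_runs` returns); `project_runs` is a hypothetical helper. It pins `id` because row selection reads `df.iloc[row_idx]["id"]`, and it skips `None` where pandas' `mean` would skip `NaN`:

```python
from statistics import mean
from typing import Optional

DEFAULT_COLS = ["id", "timestamp", "compression_ratio", "quality_score"]


def project_runs(runs: list, selected_cols: Optional[list] = None):
    """Keep only the chosen columns and average two metrics, like load_history."""
    cols = list(selected_cols) if selected_cols else list(DEFAULT_COLS)
    if "id" not in cols:
        cols.insert(0, "id")  # row selection needs the id to fetch the full record
    # Drop columns no stored run actually has (mirrors the `existing` filter).
    present = [c for c in cols if any(c in run for run in runs)]
    table = [{c: run.get(c) for c in present} for run in runs]

    def avg(col: str) -> str:
        # As in the original, a metric is only averaged if it is a visible column.
        if col not in present:
            return "n/a"
        values = [row[col] for row in table if row[col] is not None]
        return f"{mean(values):.4f}" if values else "n/a"

    return table, avg("quality_score"), avg("compression_ratio")


runs = [
    {"id": 2, "timestamp": "t2", "compression_ratio": 0.4, "quality_score": 0.9},
    {"id": 1, "timestamp": "t1", "compression_ratio": 0.6, "quality_score": 0.8},
]
table, avg_q, avg_r = project_runs(runs, ["quality_score"])
print(table, avg_q, avg_r)
```

Tying the averages to the visible columns matches the tab's behaviour: unchecking `compression_ratio` blanks its average instead of silently reporting a metric the user has hidden.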