Architecture

TinyPress is modular: each concern lives in its own module, and nothing bleeds into code where it doesn't belong.

How a compression request flows

User Input (Gradio UI)
        │
        ▼
  core/compressor.py       ← builds the prompt, calls the model, trims if it overshoots
        │
        ▼
  models/model_loader.py   ← Qwen2.5-1.5B-Instruct, loaded once and reused
        │
        ▼
  core/scorer.py           ← checks how much meaning survived using all-MiniLM-L6-v2
        │
        ▼
  db/store.py              ← saves the run to SQLite
        │
        ▼
  ui/compress_tab.py       ← shows the result and metrics back to the user
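
The scoring step in this flow comes down to a cosine similarity between two embedding vectors. Here is a minimal sketch, assuming the embeddings have already been computed (in TinyPress they would come from all-MiniLM-L6-v2). The function name `cosine_similarity` and the toy vectors are illustrative and not taken from the project's code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional stand-ins for real sentence embeddings.
original_vec = [1.0, 2.0, 3.0]
compressed_vec = [2.0, 4.0, 6.0]   # same direction, different magnitude
unrelated_vec = [3.0, 0.0, -1.0]   # orthogonal to original_vec

same_direction = cosine_similarity(original_vec, compressed_vec)
orthogonal = cosine_similarity(original_vec, unrelated_vec)
```

A score near 1 means the compressed text points in almost the same direction as the original in embedding space; a score near 0 means little of the meaning survived.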

What each module does

| Module | Responsibility |
| --- | --- |
| app.py | Starts everything: DB init, model load, Gradio launch |
| config.py | One place for all settings: model names, token limits, DB path, port |
| ui/compress_tab.py | The compression interface: input, slider, output, metrics |
| ui/history_tab.py | History view: past runs, averages, trends |
| core/compressor.py | Builds the compression prompt, runs generation, hard-trims if needed |
| core/scorer.py | Cosine similarity between original and compressed text |
| core/tokenizer_utils.py | Token counting and per-token string extraction using the LLM's own tokenizer |
| core/diff.py | Word-level SequenceMatcher diff; produces annotated HTML for the history side-by-side view |
| models/model_loader.py | Singleton model store; loads LLM + embedder on demand, supports hot-swapping both via switch_llm / switch_embedder |
| db/store.py | SQLite operations: init, save a run, fetch history, delete a run; auto-migrates older DBs |
| db/schema.sql | The compression_runs table definition |
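
As an illustration of the word-level diff that core/diff.py is described as producing, here is a minimal sketch built on the standard library's `difflib.SequenceMatcher`. The function name `word_diff_html` and the choice of `<del>` / `<ins>` tags are assumptions; the project's actual markup may differ.

```python
import html
from difflib import SequenceMatcher


def word_diff_html(original: str, compressed: str) -> str:
    """Wrap words dropped from `original` in <del> and words added in <ins>."""
    a, b = original.split(), compressed.split()
    parts = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(html.escape(" ".join(a[i1:i2])))
            continue
        if tag in ("delete", "replace"):
            parts.append("<del>" + html.escape(" ".join(a[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            parts.append("<ins>" + html.escape(" ".join(b[j1:j2])) + "</ins>")
    return " ".join(parts)


annotated = word_diff_html("the quick brown fox jumps", "the fox leaps")
# annotated == "the <del>quick brown</del> fox <del>jumps</del> <ins>leaps</ins>"
```

Comparing lists of words rather than raw characters keeps the highlighted spans aligned with whole words, which reads better in a side-by-side view.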

A few decisions worth knowing

Models load once at startup. This matters on a laptop: you don't want to reload a 1.5B-parameter model on every request. Both the LLM and the embedder stay in memory after the first load.
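
The load-once behaviour can be sketched with a module-level cache. This is only an illustration of the pattern: `get_model`, `_MODELS`, and the stand-in loader are hypothetical names, and a real loader would construct the LLM or the embedder instead of a placeholder object.

```python
from typing import Any, Callable, Dict, List

# Module-level cache: one entry per model name, filled on first request.
_MODELS: Dict[str, Any] = {}


def get_model(name: str, loader: Callable[[str], Any]) -> Any:
    """Return the cached model for `name`, calling `loader` only on a cache miss."""
    if name not in _MODELS:
        _MODELS[name] = loader(name)
    return _MODELS[name]


# Stand-in loader that records how often it runs.
load_calls: List[str] = []


def fake_loader(name: str) -> object:
    load_calls.append(name)
    return object()


first = get_model("llm", fake_loader)
second = get_model("llm", fake_loader)   # served from the cache, loader not called
```

After both calls, `first` and `second` are the same object and the loader has run exactly once.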

Model hot-swapping without a restart. The Model Settings accordion in the UI lets you pick a different compression model or scoring embedder mid-session. Both switch_llm and switch_embedder in model_loader.py unload the current model before loading the new one: they delete the references, call gc.collect, and flush the CUDA cache if a GPU is present. That way you never end up with two large models in memory at once.
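
The unload-then-load order can be sketched as follows. This is not the code from model_loader.py: `ModelSlot`, `switch`, and `DummyModel` are hypothetical names, the early return for an unchanged model name is an assumption, and `torch` is imported optionally so the sketch also runs where PyTorch is not installed.

```python
import gc
from typing import Any, Callable, Optional

try:  # optional: the sketch still runs without PyTorch
    import torch
except ImportError:
    torch = None


class ModelSlot:
    """Holds at most one model; swapping frees the old one before loading the new."""

    def __init__(self) -> None:
        self.name: Optional[str] = None
        self.model: Any = None

    def unload(self) -> None:
        self.model = None   # drop our reference so the object can be collected
        self.name = None
        gc.collect()        # reclaim it now rather than at some later point
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()   # release cached GPU memory

    def switch(self, name: str, loader: Callable[[str], Any]) -> Any:
        if name == self.name:
            return self.model          # already loaded, nothing to do
        self.unload()                  # free the old model first ...
        self.model = loader(name)      # ... then load the replacement
        self.name = name
        return self.model


class DummyModel:
    """Placeholder for a real LLM or embedder."""

    def __init__(self, name: str) -> None:
        self.name = name


slot = ModelSlot()
slot.switch("model-a", DummyModel)
slot.switch("model-b", DummyModel)   # model-a is released before model-b loads
```

Unloading before loading keeps peak memory at roughly one model's worth. The cost is that if the new model fails to load, the slot is left empty rather than falling back to the old one.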

Hard token trim as a safety net. If the model overshoots the target budget, the output gets trimmed at the tokenizer level. It's a fallback, not the primary path: the prompt already asks the model to stay within budget.
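
A token-level trim of this kind might look like the sketch below. The function name `hard_trim` and its `encode` / `decode` parameters are illustrative. The toy whitespace tokenizer only stands in for the LLM's tokenizer so the example runs without downloading a model.

```python
from typing import Callable, List


def hard_trim(
    text: str,
    max_tokens: int,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
) -> str:
    """Return `text` unchanged if it fits the budget, else its first `max_tokens` tokens."""
    token_ids = encode(text)
    if len(token_ids) <= max_tokens:
        return text
    return decode(token_ids[:max_tokens])


# Toy whitespace "tokenizer": each distinct word gets the next integer id.
vocab: List[str] = []


def toy_encode(text: str) -> List[int]:
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab.append(word)
        ids.append(vocab.index(word))
    return ids


def toy_decode(ids: List[int]) -> str:
    return " ".join(vocab[i] for i in ids)


trimmed = hard_trim("keep only the first three words", 3, toy_encode, toy_decode)
untouched = hard_trim("short enough", 3, toy_encode, toy_decode)
```

Here `trimmed` is "keep only the" and `untouched` comes back as is. With a real subword tokenizer the cut can land mid-word or mid-sentence, which is one reason to treat the trim as a fallback rather than the main mechanism.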

Thin UI layer. The Gradio handlers in ui/ don't contain logic. They take inputs, call into core/, and return outputs. All the real work happens in core/ and db/.

DB auto-migration. store.py runs ALTER TABLE … ADD COLUMN for tokenizer, duration_ms, feedback, and feedback_comment on startup, so existing databases from earlier builds upgrade silently rather than crashing. feedback is a nullable INTEGER: NULL = no rating, 1 = 👍, -1 = 👎. feedback_comment holds the optional text note.
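
One way to make such a migration safe to run on every startup is to check which columns already exist before adding any. The sketch below uses only the standard `sqlite3` module. The column names come from the paragraph above, but the column types other than feedback's INTEGER, the `migrate` function, and the two-column "old" schema are assumptions made for illustration.

```python
import sqlite3
from typing import List

# Columns added after the first release. Types other than
# feedback's INTEGER are assumed, not taken from db/schema.sql.
NEW_COLUMNS = {
    "tokenizer": "TEXT",
    "duration_ms": "INTEGER",
    "feedback": "INTEGER",        # NULL = no rating, 1 = up, -1 = down
    "feedback_comment": "TEXT",
}


def migrate(conn: sqlite3.Connection, table: str = "compression_runs") -> List[str]:
    """Add any missing columns to `table`; return the names that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


# Simulate a database from an older build (hypothetical two-column schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compression_runs (id INTEGER PRIMARY KEY, original TEXT)")
first_run = migrate(conn)    # adds all four columns
second_run = migrate(conn)   # nothing left to add
```

Because columns added this way have no default, existing rows read back NULL for them, which matches the "no rating" meaning of a NULL feedback value.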

🏠 README.md