# Architecture

TinyPress is modular: each concern lives in its own place, and nothing bleeds into places it shouldn't.

## How a compression request flows
```
User Input (Gradio UI)
        │
        ▼
core/compressor.py       - builds the prompt, calls the model, trims if it overshoots
        │
        ▼
models/model_loader.py   - Qwen2.5-1.5B-Instruct, loaded once and reused
        │
        ▼
core/scorer.py           - checks how much meaning survived using all-MiniLM-L6-v2
        │
        ▼
db/store.py              - saves the run to SQLite
        │
        ▼
ui/compress_tab.py       - shows the result and metrics back to the user
```
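
The flow above can be sketched as one function that threads a request through each stage. This is only an illustration, not TinyPress's actual code: `run_pipeline`, `RunResult`, and the stage callables are hypothetical names, and the toy stages at the bottom stand in for the real model, scorer, and database.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunResult:
    original: str
    compressed: str
    similarity: float


def run_pipeline(
    text: str,
    compress: Callable[[str], str],       # stands in for core/compressor.py
    score: Callable[[str, str], float],   # stands in for core/scorer.py
    save: Callable[[RunResult], None],    # stands in for db/store.py
) -> RunResult:
    """Run one request through compress -> score -> save and hand it back to the UI."""
    compressed = compress(text)
    similarity = score(text, compressed)
    result = RunResult(text, compressed, similarity)
    save(result)
    return result


# Toy stages so the sketch runs without a model or a database.
saved_runs: List[RunResult] = []
result = run_pipeline(
    "the quick brown fox jumps",
    compress=lambda t: " ".join(t.split()[:3]),          # keep the first 3 words
    score=lambda a, b: len(b.split()) / len(a.split()),  # crude length ratio
    save=saved_runs.append,
)
```

Passing the stages in as callables mirrors the separation of concerns in the diagram: each stage can be swapped or tested without touching the others.
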
## What each module does

| Module | Responsibility |
|---|---|
| `app.py` | Starts everything: DB init, model load, Gradio launch |
| `config.py` | One place for all settings: model names, token limits, DB path, port |
| `ui/compress_tab.py` | The compression interface: input, slider, output, metrics |
| `ui/history_tab.py` | History view: past runs, averages, trends |
| `core/compressor.py` | Builds the compression prompt, runs generation, hard-trims if needed |
| `core/scorer.py` | Cosine similarity between original and compressed text |
| `core/tokenizer_utils.py` | Token counting and per-token string extraction using the LLM's own tokenizer |
| `core/diff.py` | Word-level SequenceMatcher diff; produces annotated HTML for the history side-by-side view |
| `models/model_loader.py` | Singleton model store: loads LLM + embedder on demand, supports hot-swapping both via `switch_llm` / `switch_embedder` |
| `db/store.py` | SQLite operations: init, save a run, fetch history, delete a run; auto-migrates older DBs |
| `db/schema.sql` | The `compression_runs` table definition |
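
To make the `core/diff.py` row concrete, here is a rough sketch of a word-level diff built on the standard library's `difflib.SequenceMatcher`. The function name `word_diff_html` and the choice of `<del>` / `<ins>` tags are assumptions for illustration; the real module's markup and API may differ.

```python
import difflib
import html


def word_diff_html(original: str, compressed: str) -> str:
    """Return HTML with removed words in <del> and added words in <ins>."""
    a, b = original.split(), compressed.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(html.escape(" ".join(a[i1:i2])))
        if tag in ("delete", "replace"):
            # Words present in the original but missing from the compressed text.
            parts.append("<del>" + html.escape(" ".join(a[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            # Words the compressed text introduced.
            parts.append("<ins>" + html.escape(" ".join(b[j1:j2])) + "</ins>")
    return " ".join(parts)


print(word_diff_html("the quick brown fox", "the fox"))
```

Splitting on whitespace before matching is what makes the diff word-level rather than character-level, and escaping each chunk keeps user text from being interpreted as markup.
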
## A few decisions worth knowing

**Models load once at startup.** This matters on a laptop: you don't want to reload a 1.5B-parameter model on every request. Both the LLM and the embedder stay in memory after the first load.
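
A minimal sketch of the load-once pattern, assuming a module-level cache keyed by model name. `get_model`, `_MODELS`, and `fake_loader` are hypothetical names, and the fake loader stands in for the expensive model load.

```python
from typing import Any, Callable, Dict, List

_MODELS: Dict[str, Any] = {}  # module-level cache: one entry per model name


def get_model(name: str, loader: Callable[[str], Any]) -> Any:
    """Load `name` on first use, then return the cached instance."""
    if name not in _MODELS:
        _MODELS[name] = loader(name)  # the slow step happens once per name
    return _MODELS[name]


load_calls: List[str] = []


def fake_loader(name: str) -> object:
    """Stand-in for a real model load; records each call."""
    load_calls.append(name)
    return object()


first = get_model("Qwen2.5-1.5B-Instruct", fake_loader)
second = get_model("Qwen2.5-1.5B-Instruct", fake_loader)
```

Both calls return the same object, and the loader runs only for the first one.
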

**Model hot-swapping without a restart.** The Model Settings accordion in the UI lets you pick a different compression model or scoring embedder mid-session. Both `switch_llm` and `switch_embedder` in `model_loader.py` unload the current model (delete the references, call `gc.collect`, and flush the CUDA cache if a GPU is present) before loading the new one, so you never end up with two large models in memory at once.
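
The unload-then-load order can be sketched as follows. `ModelSlot` and its methods are hypothetical names, not the signatures of `switch_llm` or `switch_embedder`; the sketch also assumes PyTorch is the GPU backend and treats it as optional so the code runs without it.

```python
import gc
from typing import Any, Callable, Optional

try:  # torch is optional here so the sketch also runs on machines without it
    import torch
except ImportError:
    torch = None


class ModelSlot:
    """Holds at most one loaded model; switching unloads the old one first."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self.name: Optional[str] = None
        self.model: Any = None

    def unload(self) -> None:
        self.model = None   # drop our reference to the old model
        self.name = None
        gc.collect()        # collect anything still held by reference cycles
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached GPU memory

    def switch(self, name: str) -> Any:
        self.unload()       # free the old model before loading the new one
        self.model = self._loader(name)
        self.name = name
        return self.model
```

Unloading before loading keeps peak memory at roughly one model's worth, at the cost of a short window in which no model is available.
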

**Hard token trim as a safety net.** If the model overshoots the target budget, the output gets trimmed at the tokenizer level. It's a fallback, not the primary path: the prompt already asks the model to stay within budget.
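
A rough sketch of what a tokenizer-level trim can look like. `hard_trim` and `WhitespaceTokenizer` are hypothetical; the toy tokenizer treats each whitespace-separated word as one token so the example runs without downloading a model, whereas the real code uses the LLM's own tokenizer.

```python
from typing import List, Protocol


class Tokenizer(Protocol):
    def encode(self, text: str) -> List[int]: ...
    def decode(self, ids: List[int]) -> str: ...


def hard_trim(text: str, tokenizer: Tokenizer, budget: int) -> str:
    """Return `text` unchanged if it fits in `budget` tokens, else cut it at the budget."""
    ids = tokenizer.encode(text)
    if len(ids) <= budget:
        return text                        # within budget: nothing to do
    return tokenizer.decode(ids[:budget])  # overshoot: keep the first `budget` tokens


class WhitespaceTokenizer:
    """Toy stand-in: one token per whitespace-separated word."""

    def __init__(self) -> None:
        self._vocab: List[str] = []

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self._vocab:
                self._vocab.append(word)
            ids.append(self._vocab.index(word))
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self._vocab[i] for i in ids)


tok = WhitespaceTokenizer()
trimmed = hard_trim("one two three four", tok, budget=2)
```

Trimming by token IDs rather than by characters guarantees the result fits the budget as the model counts it, though the cut may land mid-sentence.
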

**Thin UI layer.** The Gradio handlers in `ui/` don't contain logic. They take inputs, call into `core/`, and return outputs. All the real work happens in `core/` and `db/`.

**DB auto-migration.** `store.py` runs `ALTER TABLE … ADD COLUMN` for `tokenizer`, `duration_ms`, `feedback`, and `feedback_comment` on startup, so existing databases from earlier builds upgrade silently rather than crashing. `feedback` is nullable (`INTEGER`): `NULL` = no rating, `1` = 👍, `-1` = 👎. `feedback_comment` holds the optional text note.
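
One way such a migration could work is to inspect the table and add only the missing columns. This is a sketch under assumptions: the column names come from the text above, but the `migrate` function, the `PRAGMA table_info` check, the column types other than `INTEGER` for `feedback`, and the "old" schema are all illustrative rather than taken from `store.py`.

```python
import sqlite3

# Column names are from the text above; types other than `feedback` are assumed.
NEW_COLUMNS = {
    "tokenizer": "TEXT",
    "duration_ms": "INTEGER",
    "feedback": "INTEGER",        # NULL = no rating, 1 = up, -1 = down
    "feedback_comment": "TEXT",
}


def migrate(conn: sqlite3.Connection, table: str = "compression_runs") -> None:
    """Add any missing columns to `table`; safe to run on every startup."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for column, sql_type in NEW_COLUMNS.items():
        if column not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical schema from an earlier build, without the newer columns.
conn.execute("CREATE TABLE compression_runs (id INTEGER PRIMARY KEY, original TEXT)")
migrate(conn)
migrate(conn)  # second run finds nothing to add
```

Columns added this way default to `NULL` for existing rows, which lines up with `NULL` meaning "no rating" for `feedback`.
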

[README.md](../README.md)