tiny-press / docs /architecture.md

sriharsha-cr

Project files

ebc3bf5 1 day ago

	# Architecture

	TinyPress is modular: each concern lives in its own module, and nothing bleeds into code where it doesn't belong.

	## How a compression request flows

	```
	User Input (Gradio UI)
	│
	▼
	core/compressor.py ← builds the prompt, calls the model, trims if it overshoots
	│
	▼
	models/model_loader.py ← Qwen2.5-1.5B-Instruct, loaded once and reused
	│
	▼
	core/scorer.py ← checks how much meaning survived using all-MiniLM-L6-v2
	│
	▼
	db/store.py ← saves the run to SQLite
	│
	▼
	ui/compress_tab.py ← shows the result and metrics back to the user
	```
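
The flow above can be sketched as one function that wires the stages together. This is a minimal illustration, not the project's actual code: `run_pipeline`, `RunResult`, and the three injected callables are hypothetical stand-ins for `core/compressor.py`, the embedder behind `core/scorer.py`, and `db/store.py`.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RunResult:
    compressed: str
    similarity: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def run_pipeline(
    text: str,
    budget: int,
    compress: Callable[[str, int], str],      # stand-in for core/compressor.py
    embed: Callable[[str], Sequence[float]],  # stand-in for the scoring embedder
    save: Callable[[str, RunResult], None],   # stand-in for db/store.py
) -> RunResult:
    """Compress, score, persist, and return the result for the UI to display."""
    compressed = compress(text, budget)
    similarity = cosine(embed(text), embed(compressed))
    result = RunResult(compressed, similarity)
    save(text, result)
    return result
```

Passing the stages in as callables keeps the sketch runnable without a model; in the real app the UI handler would presumably call into `core/` directly.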

	## What each module does

	| Module | Responsibility |
	|---|---|
	| `app.py` | Starts everything: DB init, model load, Gradio launch |
	| `config.py` | One place for all settings: model names, token limits, DB path, port |
	| `ui/compress_tab.py` | The compression interface: input, slider, output, metrics |
	| `ui/history_tab.py` | History view: past runs, averages, trends |
	| `core/compressor.py` | Builds the compression prompt, runs generation, hard-trims if needed |
	| `core/scorer.py` | Cosine similarity between original and compressed text |
	| `core/tokenizer_utils.py` | Token counting and per-token string extraction using the LLM's own tokenizer |
	| `core/diff.py` | Word-level SequenceMatcher diff; produces annotated HTML for the history side-by-side view |
	| `models/model_loader.py` | Singleton model store; loads LLM + embedder on demand, supports hot-swapping both via `switch_llm` / `switch_embedder` |
	| `db/store.py` | SQLite operations: init, save a run, fetch history, delete a run; auto-migrates older DBs |
	| `db/schema.sql` | The `compression_runs` table definition |
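
As an illustration of the word-level diff that `core/diff.py` is described as producing, here is a minimal sketch built on the standard library's `difflib.SequenceMatcher`. The function name `word_diff_html` and the `<del>` / `<ins>` markup are assumptions, and it emits a single merged stream rather than the side-by-side layout the history view uses.

```python
import html
from difflib import SequenceMatcher


def word_diff_html(original: str, compressed: str) -> str:
    """Mark words dropped from `original` with <del> and words added with <ins>."""
    a, b = original.split(), compressed.split()
    parts = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(html.escape(" ".join(a[i1:i2])))
            continue
        # "replace" is rendered as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            parts.append("<del>" + html.escape(" ".join(a[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            parts.append("<ins>" + html.escape(" ".join(b[j1:j2])) + "</ins>")
    return " ".join(parts)


print(word_diff_html("the quick brown fox", "the fox"))
# the <del>quick brown</del> fox
```

Escaping each chunk with `html.escape` keeps user text from being interpreted as markup when the result is rendered.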

	## A few decisions worth knowing

	Models load once at startup. This matters on a laptop — you don't want to reload a 1.5B model on every request. Both the LLM and the embedder are held in memory after the first load.
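
A load-once store can be as small as a module-level dictionary. The sketch below is one possible shape, not the code in `model_loader.py`: `get_model`, `_MODELS`, and the injected `loader` callable are hypothetical names.

```python
from typing import Any, Callable, Dict

# Hypothetical module-level cache: model name -> loaded model object.
_MODELS: Dict[str, Any] = {}


def get_model(name: str, loader: Callable[[str], Any]) -> Any:
    """Return the cached model for `name`, calling `loader` only on first use."""
    if name not in _MODELS:
        _MODELS[name] = loader(name)  # the expensive step happens once
    return _MODELS[name]
```

Every later request gets the same object back, so the cost of loading is paid once per process rather than once per compression.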

	Model hot-swapping without a restart. The Model Settings accordion in the UI lets you pick a different compression model or scoring embedder mid-session. Both `switch_llm` and `switch_embedder` in `model_loader.py` unload the current model before loading the new one: they delete the references, call `gc.collect`, and flush the CUDA cache if a GPU is present. That way you never end up with two large models in memory at once.

	Hard token trim as a safety net. If the model overshoots the target budget, the output gets trimmed at the tokenizer level. It's a fallback, not the primary path — the prompt already asks the model to stay within budget.
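
A tokenizer-level trim amounts to encoding, slicing, and decoding. The sketch below assumes only that the tokenizer exposes `encode` and `decode`. `hard_trim`, `TokenizerLike`, and the toy `WhitespaceTokenizer` are hypothetical names used for illustration, not the project's API.

```python
from typing import List, Protocol


class TokenizerLike(Protocol):
    def encode(self, text: str) -> List[int]: ...
    def decode(self, ids: List[int]) -> str: ...


def hard_trim(text: str, budget: int, tokenizer: TokenizerLike) -> str:
    """Return `text` unchanged if it fits, else its first `budget` tokens decoded."""
    ids = tokenizer.encode(text)
    if len(ids) <= budget:
        return text
    return tokenizer.decode(ids[: max(budget, 0)])


class WhitespaceTokenizer:
    """Toy tokenizer for the demo: one token per whitespace-separated word."""

    def __init__(self) -> None:
        self.vocab: List[str] = []

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab.append(word)
            ids.append(self.vocab.index(word))
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)


print(hard_trim("a b c d e", 3, WhitespaceTokenizer()))  # a b c
```

With a real subword tokenizer the cut can land mid-sentence or mid-word, which is one reason to treat the trim as a fallback rather than the main way of hitting the budget.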

	Thin UI layer. The Gradio handlers in `ui/` don't contain logic. They take inputs, call into `core/`, and return outputs. All the real work happens in `core/` and `db/`.

	DB auto-migration. `store.py` runs `ALTER TABLE … ADD COLUMN` for `tokenizer`, `duration_ms`, `feedback`, and `feedback_comment` on startup — so existing databases from earlier builds upgrade silently rather than crashing. `feedback` is nullable (`INTEGER`): `NULL` = no rating, `1` = 👍, `-1` = 👎. `feedback_comment` holds the optional text note.
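
One way to make such a migration idempotent is to read the existing columns with `PRAGMA table_info` and add only the missing ones. This is a sketch, not the code in `store.py`: the `migrate` function and the `MIGRATIONS` mapping are hypothetical, and the column types other than `feedback`'s `INTEGER` are assumptions.

```python
import sqlite3

# Column types are assumptions, except feedback (INTEGER, per the text above).
MIGRATIONS = {
    "tokenizer": "TEXT",
    "duration_ms": "INTEGER",
    "feedback": "INTEGER",
    "feedback_comment": "TEXT",
}


def migrate(conn: sqlite3.Connection, table: str = "compression_runs") -> list:
    """Add any missing columns to `table` and return the names that were added."""
    # Row layout of PRAGMA table_info: (cid, name, type, notnull, dflt_value, pk)
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for column, sql_type in MIGRATIONS.items():
        if column not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
            added.append(column)
    conn.commit()
    return added
```

Columns added this way default to `NULL` for existing rows, which matches the "no rating" meaning of a `NULL` `feedback` value. Running `migrate` a second time adds nothing.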


	🏠 [README.md](../README.md)