# Architecture

TinyPress is modular: each concern lives in its own module, and nothing bleeds into code where it doesn't belong.

## How a compression request flows

```
User Input (Gradio UI)
        │
        ▼
  core/compressor.py       ← builds the prompt, calls the model, trims if it overshoots
        │
        ▼
  models/model_loader.py   ← Qwen2.5-1.5B-Instruct, loaded once and reused
        │
        ▼
  core/scorer.py           ← checks how much meaning survived using all-MiniLM-L6-v2
        │
        ▼
  db/store.py              ← saves the run to SQLite
        │
        ▼
  ui/compress_tab.py       ← shows the result and metrics back to the user
```

## What each module does

| Module | Responsibility |
|---|---|
| `app.py` | Starts everything: DB init, model load, Gradio launch |
| `config.py` | One place for all settings: model names, token limits, DB path, port |
| `ui/compress_tab.py` | The compression interface: input, slider, output, metrics |
| `ui/history_tab.py` | History view: past runs, averages, trends |
| `core/compressor.py` | Builds the compression prompt, runs generation, hard-trims if needed |
| `core/scorer.py` | Cosine similarity between embeddings of the original and compressed text |
| `core/tokenizer_utils.py` | Token counting and per-token string extraction using the LLM's own tokenizer |
| `core/diff.py` | Word-level SequenceMatcher diff that produces annotated HTML for the history side-by-side view |
| `models/model_loader.py` | Singleton model store: loads LLM + embedder on demand, supports hot-swapping both via `switch_llm` / `switch_embedder` |
| `db/store.py` | SQLite operations: init, save a run, fetch history, delete a run; auto-migrates older DBs |
| `db/schema.sql` | The `compression_runs` table definition |
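
The scorer row describes cosine similarity between the original and compressed text. Here is a minimal sketch of that calculation on plain Python lists, assuming the two texts have already been turned into embedding vectors. The function name `cosine_similarity` and the zero-vector convention are assumptions for illustration, not taken from `core/scorer.py`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector has no direction.
        return 0.0
    return dot / (norm_a * norm_b)
```

A score near 1 means the two vectors point in the same direction; a score near 0 means they are unrelated.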

## A few decisions worth knowing

**Models load once at startup.** This matters on a laptop: you don't want to reload a 1.5B model on every request. Both the LLM and the embedder are held in memory after the first load.
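
The load-once idea can be sketched as a module-level cache. This is an illustration only: `_MODELS`, `get_model`, and the `loader` callback are hypothetical names, and the real `model_loader.py` may be organised differently.

```python
from typing import Any, Callable, Dict

# Hypothetical module-level cache: one entry per model name.
_MODELS: Dict[str, Any] = {}


def get_model(name: str, loader: Callable[[str], Any]) -> Any:
    """Return the cached model for `name`, calling `loader` only on first use."""
    if name not in _MODELS:
        _MODELS[name] = loader(name)
    return _MODELS[name]


# Stand-in loader that records how often it is called.
load_calls = []


def fake_loader(name: str) -> dict:
    load_calls.append(name)
    return {"name": name}


first = get_model("Qwen2.5-1.5B-Instruct", fake_loader)
second = get_model("Qwen2.5-1.5B-Instruct", fake_loader)
```

Both calls return the same object, and the loader runs only once.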

**Model hot-swapping without a restart.** The Model Settings accordion in the UI lets you pick a different compression model or scoring embedder mid-session. Both `switch_llm` and `switch_embedder` in `model_loader.py` unload the current model (delete the references, call `gc.collect`, and flush the CUDA cache if a GPU is present) before loading the new one, so you don't end up with two large models in memory at once.
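
A rough sketch of that unload-then-load order follows. The `switch_llm` name comes from the text above, but the signature, the `loader` callback, and the `_current_llm` variable are assumptions for illustration. The CUDA flush is skipped when PyTorch is not installed or no GPU is present.

```python
import gc
from typing import Any, Callable, Optional

# Hypothetical holder for the single active LLM.
_current_llm: Optional[Any] = None


def _free_gpu_cache() -> None:
    """Release cached CUDA memory if PyTorch and a GPU are available."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def switch_llm(name: str, loader: Callable[[str], Any]) -> Any:
    """Unload the current model first, then load and return the new one."""
    global _current_llm
    _current_llm = None   # drop the reference to the old model
    gc.collect()          # collect anything kept alive only by reference cycles
    _free_gpu_cache()     # hand cached GPU memory back before the next load
    _current_llm = loader(name)
    return _current_llm
```

The order is the point: the old model is released before `loader` runs, so peak memory stays close to one model rather than two.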

**Hard token trim as a safety net.** If the model overshoots the target budget, the output gets trimmed at the tokenizer level. It's a fallback, not the primary path: the prompt already asks the model to stay within budget.
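
The trim can be sketched as "encode, cut, decode". The example below uses a toy whitespace tokenizer so it runs without any model; in the app the LLM's own tokenizer would take its place. `hard_trim`, `Tokenizer`, and `WhitespaceTokenizer` are hypothetical names for this sketch.

```python
from typing import Dict, List, Protocol


class Tokenizer(Protocol):
    def encode(self, text: str) -> List[int]: ...
    def decode(self, ids: List[int]) -> str: ...


class WhitespaceTokenizer:
    """Toy stand-in: one token per whitespace-separated word."""

    def __init__(self) -> None:
        self._vocab: List[str] = []
        self._ids: Dict[str, int] = {}

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self._ids:
                self._ids[word] = len(self._vocab)
                self._vocab.append(word)
            ids.append(self._ids[word])
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self._vocab[i] for i in ids)


def hard_trim(text: str, budget: int, tokenizer: Tokenizer) -> str:
    """Return `text` unchanged if it fits, else only its first `budget` tokens."""
    ids = tokenizer.encode(text)
    if len(ids) <= budget:
        return text
    return tokenizer.decode(ids[:budget])
```

Output that already fits the budget passes through untouched, which keeps the trim a fallback rather than part of the normal path.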

**Thin UI layer.** The Gradio handlers in `ui/` don't contain logic. They take inputs, call into `core/`, and return outputs. All the real work happens in `core/` and `db/`.

**DB auto-migration.** `store.py` runs `ALTER TABLE … ADD COLUMN` for `tokenizer`, `duration_ms`, `feedback`, and `feedback_comment` on startup, so existing databases from earlier builds upgrade silently rather than crashing. `feedback` is nullable (`INTEGER`): `NULL` = no rating, `1` = 👍, `-1` = 👎. `feedback_comment` holds the optional text note.
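
One way to make such a migration safe to run on every startup is to check which columns already exist before adding any. The sketch below does that with `PRAGMA table_info`. The column names and the `INTEGER` type of `feedback` come from the paragraph above; the other column types, the `migrate` function, and the starting table layout are assumptions, and `store.py` may do this differently.

```python
import sqlite3

# Types other than `feedback` are guesses for this sketch.
MIGRATION_COLUMNS = {
    "tokenizer": "TEXT",
    "duration_ms": "INTEGER",
    "feedback": "INTEGER",
    "feedback_comment": "TEXT",
}


def migrate(conn: sqlite3.Connection, table: str = "compression_runs") -> list:
    """Add any missing columns and return the names that were added."""
    # Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in MIGRATION_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


# Simulate a database from an earlier build (hypothetical minimal layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compression_runs (id INTEGER PRIMARY KEY, original TEXT)")
first_pass = migrate(conn)   # adds the four columns
second_pass = migrate(conn)  # nothing left to add
```

Because the second pass finds every column already present, running the migration repeatedly is harmless.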


🏠 [README.md](../README.md)