title: Tiny Press
emoji: π
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 6.18.0
python_version: '3.12'
app_file: app.py
pinned: false
license: mit
short_description: Compress any text to a token budget locally.
models:
- Qwen/Qwen2.5-1.5B-Instruct
tags:
- gradio
- build-small-hackathon
- thousand-token-wood
- text-compression
- prompt-optimization
- local-inference
TinyPress: Prompt Compression Engine
Hugging Face Build Small Hackathon · Track: Thousand Token Wood
The constraint is the feature. Give TinyPress a long piece of text, set a token budget, and get back a compressed version that still carries the meaning. Each result is scored, saved, and diffed so you can see exactly what was kept and what was shed.
No cloud. No API bill. Two small models running quietly on your machine.
Demo
Why this fits Thousand Token Wood
Working inside a tight token budget is not a limitation to work around; it is the problem worth solving. LLM context windows are finite, prompt costs are real, and bloated inputs can degrade output quality. TinyPress treats the token count as a hard constraint and makes compression the primary interaction: you set the budget, the model meets it, and a quality score tells you how much meaning survived.
Features
| Feature | Description |
|---|---|
| Token-budget compression | Set a target (100–1000 tokens) and compress the text to fit within that budget |
| Quality score | Cosine similarity between embeddings of the original and compressed text, on a 0 to 1 scale; higher is better |
| Live readiness banner | Green when the input is over budget and compression will run; red when it is already within budget |
| Token highlight panel | Every token rendered as a colour-coded chip so you can see where your budget is going |
| Model hot-swap | Switch the compression LLM mid-session without a restart (5 curated models, or any HF model ID) |
| Embedder hot-swap | Switch the scoring embedder, with per-model trade-off info (speed vs. quality vs. RAM) |
| Feedback capture | Rate every result and add an optional text note; feedback is saved instantly to SQLite |
| Run history | Every compression is persisted locally with full metrics and configurable column visibility |
| Side-by-side diff | Word-level colour diff: dropped (red), rewritten (amber), inserted (green), unchanged (plain) |
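A word-level diff with these four labels can be approximated with Python's standard-library difflib. The sketch below is an assumption about how such a diff could work, not TinyPress's own code. It splits both texts on whitespace and maps each SequenceMatcher opcode to a label: delete becomes dropped, replace becomes rewritten, insert becomes inserted, and equal becomes unchanged. The function name word_diff is hypothetical.

```python
import difflib


def word_diff(original: str, compressed: str) -> list[tuple[str, str, str]]:
    """Return (label, original_words, compressed_words) triples.

    Hypothetical sketch: labels follow the README's colour scheme
    (dropped / rewritten / inserted / unchanged).
    """
    a, b = original.split(), compressed.split()
    # autojunk=False stops frequent words (e.g. "the") being ignored on long inputs
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    labels = {
        "equal": "unchanged",
        "delete": "dropped",
        "insert": "inserted",
        "replace": "rewritten",
    }
    segments: list[tuple[str, str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        segments.append((labels[tag], " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return segments
```

Each triple carries both sides of the change, which is enough for a UI layer to colour the original and compressed panes independently.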
Models
| Role | Default | Alternatives |
|---|---|---|
| Compression LLM | Qwen/Qwen2.5-1.5B-Instruct | Qwen2.5-0.5B, SmolLM2-1.7B, Phi-3.5-mini, Llama-3.2-1B |
| Quality scorer | sentence-transformers/all-MiniLM-L6-v2 | mpnet-base, bge-small, bge-base, mxbai-large, gte-Qwen2-1.5B |
All models are open-weight and under 32B. Everything runs locally: no API calls, and no data leaves your machine.
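Hot-swapping between these models while loading each one only once suggests a lazily loaded, swappable model holder. The sketch below is hypothetical: ModelSlot and the loader callable are illustrative names, not TinyPress's models/model_loader.py. It keeps one model per slot, loads it on first use, reuses it afterwards, and drops the reference on swap so the old model becomes eligible for garbage collection.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

M = TypeVar("M")


class ModelSlot(Generic[M]):
    """Holds at most one loaded model; loads lazily and reuses it until swapped."""

    def __init__(self, loader: Callable[[str], M], model_id: str) -> None:
        self._loader = loader  # e.g. a function wrapping from_pretrained (assumption)
        self._model_id = model_id
        self._model: Optional[M] = None
        self._lock = threading.Lock()  # UI handlers may run on several threads

    @property
    def model_id(self) -> str:
        return self._model_id

    def get(self) -> M:
        with self._lock:
            if self._model is None:
                self._model = self._loader(self._model_id)  # load once, then reuse
            return self._model

    def swap(self, model_id: str) -> None:
        with self._lock:
            if model_id != self._model_id:
                self._model_id = model_id
                self._model = None  # release the old model; next get() loads the new one
```

Keeping a single slot per role (one for the LLM, one for the embedder) trades a reload on every swap for a bounded memory footprint, which matters on the 8 GB minimum configuration.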
Get started
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install -r requirements.txt
python app.py
Open http://localhost:7860. That's it.
Run it in Colab: open tinypress_colab.ipynb, which installs dependencies, loads the models, and launches a public Gradio share URL. A GPU runtime is recommended for faster inference.
Optional environment overrides:
| Variable | Default | Description |
|---|---|---|
| LLM_MODEL | Qwen/Qwen2.5-1.5B-Instruct | Compression model |
| EMBEDDER_MODEL | sentence-transformers/all-MiniLM-L6-v2 | Scoring embedder |
| DB_PATH | tinypress.db | SQLite database path |
| PORT | 7860 | Gradio server port |
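Reading these overrides amounts to an environment lookup that falls back to the defaults in the table. The sketch below is illustrative only. The Settings dataclass and load_settings function are hypothetical names, not part of TinyPress's documented code; only the variable names and defaults come from the table.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

# Defaults copied from the table above
DEFAULTS = {
    "LLM_MODEL": "Qwen/Qwen2.5-1.5B-Instruct",
    "EMBEDDER_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
    "DB_PATH": "tinypress.db",
    "PORT": "7860",
}


@dataclass(frozen=True)
class Settings:
    llm_model: str
    embedder_model: str
    db_path: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read overrides from the environment (or a given mapping), else use defaults."""
    source = os.environ if env is None else env

    def pick(name: str) -> str:
        return source.get(name, DEFAULTS[name])

    return Settings(
        llm_model=pick("LLM_MODEL"),
        embedder_model=pick("EMBEDDER_MODEL"),
        db_path=pick("DB_PATH"),
        port=int(pick("PORT")),  # env values are strings; the port must be an int
    )
```

For example, `PORT=7861 python app.py` on macOS/Linux would serve on port 7861, assuming app.py reads PORT in this way.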
Hardware
| | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB |
| VRAM | None (CPU-only works) | 4 GB GPU for faster inference |
| Disk | ~4 GB | ~4 GB |
Architecture
Input text + token budget
↓
core/compressor.py (builds prompt, calls LLM, hard-trims if it overshoots)
↓
models/model_loader.py (Qwen2.5-1.5B or swapped model, loaded once, reused)
↓
core/scorer.py (cosine similarity via sentence-transformer embedder)
↓
db/store.py (saves run to SQLite)
↓
ui/compress_tab.py (shows result, metrics, feedback UI)
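The compressor and scorer steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in core/compressor.py or core/scorer.py. The callables generate, encode, decode, and embed are hypothetical stand-ins for the loaded LLM, its tokenizer, and the sentence-transformer embedder, and the prompt wording is invented for illustration. The hard trim cuts the token list at the budget and decodes it back to text.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

Generate = Callable[[str], str]          # prompt -> LLM output (hypothetical hook)
Encode = Callable[[str], list]           # text -> token list (hypothetical hook)
Decode = Callable[[list], str]           # token list -> text
Embed = Callable[[str], Sequence[float]] # text -> embedding vector


@dataclass
class CompressionResult:
    text: str
    original_tokens: int
    compressed_tokens: int
    quality: float


def hard_trim(text: str, budget: int, encode: Encode, decode: Decode) -> str:
    """Cut the text to at most `budget` tokens if the LLM overshot."""
    ids = encode(text)
    if len(ids) <= budget:
        return text
    # With real BPE tokenizers, re-encoding the decoded text can land slightly
    # off `budget`, so a production version may want to re-check the count.
    return decode(ids[:budget])


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compress(text: str, budget: int, generate: Generate, encode: Encode,
             decode: Decode, embed: Embed) -> CompressionResult:
    original_tokens = len(encode(text))
    if original_tokens <= budget:
        compressed = text  # already within budget: skip the LLM call entirely
    else:
        prompt = (
            f"Rewrite the text below in at most {budget} tokens, "
            f"keeping every key fact.\n\n{text}"
        )
        compressed = hard_trim(generate(prompt), budget, encode, decode)
    quality = cosine(embed(text), embed(compressed))
    return CompressionResult(compressed, original_tokens, len(encode(compressed)), quality)
```

With real models, encode and decode could wrap a Hugging Face tokenizer's encode and decode methods, and embed could call SentenceTransformer.encode. Note that cosine similarity ranges from -1 to 1 in general, although related texts typically score well above 0.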
Thin UI layer: Gradio handlers pass inputs to core/ and return its outputs. All logic lives in core/ and db/.
Full docs: Architecture · Setup · Get Started · Folder Structure
About
Built by Sriharsha C R, AI Engineer and Cloud Native developer.
