Spaces:
Running on Zero
A newer version of the Gradio SDK is available: 6.18.0
title: Tiny Press
emoji: π
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 6.15.2
python_version: '3.12'
app_file: app.py
pinned: false
license: mit
short_description: Compress any text to a token budget locally.
models:
- Qwen/Qwen2.5-1.5B-Instruct
tags:
- gradio
- build-small-hackathon
- thousand-token-wood
- text-compression
- prompt-optimization
- local-inference
TinyPress β Prompt Compression Engine
HuggingFace Build Small Hackathon Β· Track: Thousand Token Wood
The constraint is the feature. Give TinyPress a long piece of text, set a token budget, and get back a compressed version that still carries the meaning β scored, saved, and diffed so you can see exactly what was kept and what was shed.
No cloud. No API bill. Two small models running quietly on your machine.
Demo
π» Try @ https://huggingface.co/spaces/build-small-hackathon/tiny-press
π©βπ»Notebook @ https://colab.research.google.com/github/SriharshaCR/tiny-press/blob/task/bootstrap/tinypress_colab.ipynb
Social Media Posts
- https://x.com/sriharsha_cr/status/2065662576684650879
- https://www.linkedin.com/posts/sriharsha-cr_tinypress-prompt-compression-engine-activity-7471426128331624448-aKfe
Why this fits Thousand Token Wood
Working inside a tight token budget is not a limitation to work around β it is the problem worth solving. LLM context windows are finite, prompt costs are real, and bloated inputs degrade output quality. TinyPress treats the token count as a hard constraint and makes compression the primary interaction: you set the budget, the model meets it, and a quality score tells you how much meaning survived.
Features
| ποΈ Token-budget compression | Set a target (100β1000 tokens) and compress to exactly that budget |
| π Quality score | Cosine similarity between original and compressed text β 0 to 1, higher is better |
| π’π΄ Live readiness banner | Green when input is over budget and compression will run; red when already within budget |
| π Token highlight panel | Every token rendered as a colour-coded chip so you can see where your budget is going |
| π Model hot-swap | Switch the compression LLM mid-session without a restart (5 curated models, or any HF model ID) |
| π― Embedder hot-swap | Switch the scoring embedder with per-model trade-off info (speed vs quality vs RAM) |
| ππ Feedback capture | Rate every result, add an optional text note β saved instantly to SQLite |
| π Run history | Every compression persisted locally with full metrics and configurable column visibility |
| π Side-by-side diff | Word-level colour diff β dropped (red), rewritten (amber), inserted (green), unchanged (plain) |
Models
| Role | Default | Alternatives |
|---|---|---|
| Compression LLM | Qwen/Qwen2.5-1.5B-Instruct |
Qwen2.5-0.5B, SmolLM2-1.7B, Phi-3.5-mini, Llama-3.2-1B |
| Quality scorer | sentence-transformers/all-MiniLM-L6-v2 |
mpnet-base, bge-small, bge-base, mxbai-large, gte-Qwen2-1.5B |
All models are open-weight and under 32B. Everything runs locally β no API calls, no data leaves your machine.
Get started
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install -r requirements.txt
python app.py
Open http://localhost:7860. That's it.
Run it in Colab: open tinypress_colab.ipynb β it installs dependencies, loads the models, and launches a public Gradio share URL. GPU runtime recommended for faster inference.
Optional environment overrides:
| Variable | Default | Description |
|---|---|---|
LLM_MODEL |
Qwen/Qwen2.5-1.5B-Instruct |
Compression model |
EMBEDDER_MODEL |
sentence-transformers/all-MiniLM-L6-v2 |
Scoring embedder |
DB_PATH |
tinypress.db |
SQLite database path |
PORT |
7860 |
Gradio server port |
Hardware
| Minimum | Recommended | |
|---|---|---|
| RAM | 8 GB | 16 GB |
| VRAM | CPU-only works | 4 GB GPU speeds up inference |
| Disk | ~4 GB | ~4 GB |
Architecture
Input text + token budget
β
core/compressor.py β builds prompt, calls LLM, hard-trims if it overshoots
β
models/model_loader.py β Qwen2.5-1.5B (or swapped model), loaded once, reused
β
core/scorer.py β cosine similarity via sentence-transformer embedder
β
db/store.py β saves run to SQLite
β
ui/compress_tab.py β shows result, metrics, feedback UI
Thin UI layer β Gradio handlers pass inputs to core/, return outputs. All logic lives in core/ and db/.
Full docs: Architecture Β· Setup Β· Get Started Β· Folder Structure
About
Built by Sriharsha C R β AI Engineer and Cloud Native developer.
