title: Tiny Press
emoji: 📊
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 6.15.2
python_version: '3.12'
app_file: app.py
pinned: false
license: mit
short_description: Compress any text to a token budget locally.
models:
  - Qwen/Qwen2.5-1.5B-Instruct
tags:
  - gradio
  - build-small-hackathon
  - thousand-token-wood
  - text-compression
  - prompt-optimization
  - local-inference

TinyPress: Prompt Compression Engine

HuggingFace Build Small Hackathon · Track: Thousand Token Wood

The constraint is the feature. Give TinyPress a long piece of text, set a token budget, and get back a compressed version that still carries the meaning: scored, saved, and diffed so you can see exactly what was kept and what was shed.

No cloud. No API bill. Two small models running quietly on your machine.


Demo

💻 Try @ https://huggingface.co/spaces/build-small-hackathon/tiny-press

👩‍💻 Notebook @ https://colab.research.google.com/github/SriharshaCR/tiny-press/blob/task/bootstrap/tinypress_colab.ipynb

Social Media Posts


Why this fits Thousand Token Wood

Working inside a tight token budget is not a limitation to work around; it is the problem worth solving. LLM context windows are finite, prompt costs are real, and bloated inputs degrade output quality. TinyPress treats the token count as a hard constraint and makes compression the primary interaction: you set the budget, the model meets it, and a quality score tells you how much meaning survived.


Features

| Feature | Description |
| --- | --- |
| 🗜️ Token-budget compression | Set a target (100–1000 tokens) and compress to fit within that budget |
| 📊 Quality score | Cosine similarity between original and compressed text, from 0 to 1 (higher is better) |
| 🟢🔴 Live readiness banner | Green when input is over budget and compression will run; red when already within budget |
| 🔍 Token highlight panel | Every token rendered as a colour-coded chip so you can see where your budget is going |
| 🔀 Model hot-swap | Switch the compression LLM mid-session without a restart (5 curated models, or any HF model ID) |
| 🎯 Embedder hot-swap | Switch the scoring embedder with per-model trade-off info (speed vs quality vs RAM) |
| 👍👎 Feedback capture | Rate every result and add an optional text note; saved instantly to SQLite |
| 📜 Run history | Every compression persisted locally with full metrics and configurable column visibility |
| 🔎 Side-by-side diff | Word-level colour diff: dropped (red), rewritten (amber), inserted (green), unchanged (plain) |
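
The four diff categories line up with the opcodes of Python's standard `difflib.SequenceMatcher`. The following is a minimal sketch of a word-level diff under that assumption; it splits on whitespace, and the names `word_diff` and `LABELS` are hypothetical rather than taken from the TinyPress code.

```python
from difflib import SequenceMatcher

# Map difflib opcodes onto the four categories shown in the diff view.
LABELS = {
    "equal": "unchanged",
    "delete": "dropped",
    "replace": "rewritten",
    "insert": "inserted",
}


def word_diff(original: str, compressed: str) -> list[tuple[str, str, str]]:
    """Return (label, original_words, compressed_words) spans.

    Assumption: words are whitespace-separated; the real app may diff
    on a different unit.
    """
    a, b = original.split(), compressed.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        spans.append((LABELS[tag], " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans


if __name__ == "__main__":
    for label, old, new in word_diff("the quick brown fox", "the fast fox"):
        print(f"{label:10} {old!r} -> {new!r}")
```

A renderer could then colour each span by its label: red for dropped, amber for rewritten, green for inserted.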

Models

| Role | Default | Alternatives |
| --- | --- | --- |
| Compression LLM | Qwen/Qwen2.5-1.5B-Instruct | Qwen2.5-0.5B, SmolLM2-1.7B, Phi-3.5-mini, Llama-3.2-1B |
| Quality scorer | sentence-transformers/all-MiniLM-L6-v2 | mpnet-base, bge-small, bge-base, mxbai-large, gte-Qwen2-1.5B |

All models are open-weight and under 32B parameters. Everything runs locally: no API calls, no data leaves your machine.
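
The quality scorer compares the embedding of the original text with the embedding of the compressed text. Below is a minimal sketch of that comparison on plain Python lists; in the app the vectors would come from the embedder, and the function names and the clamp to the 0 to 1 range are assumptions made for illustration.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_u * norm_v)


def quality_score(original_vec: list[float], compressed_vec: list[float]) -> float:
    """Similarity reported on a 0-to-1 scale.

    Assumption: raw cosine similarity can be negative, so this sketch
    clamps it to [0, 1]; the real scorer may handle this differently.
    """
    return max(0.0, min(1.0, cosine_similarity(original_vec, compressed_vec)))


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for sentence embeddings.
    print(round(quality_score([0.2, 0.7, 0.1], [0.25, 0.6, 0.2]), 3))
```

A score near 1 means the compressed text points in almost the same direction as the original in embedding space; it does not guarantee that every fact survived.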


Get started

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate

pip install -r requirements.txt
python app.py

Open http://localhost:7860. That's it.

Run it in Colab: open tinypress_colab.ipynb. It installs dependencies, loads the models, and launches a public Gradio share URL. A GPU runtime is recommended for faster inference.

Optional environment overrides:

| Variable | Default | Description |
| --- | --- | --- |
| LLM_MODEL | Qwen/Qwen2.5-1.5B-Instruct | Compression model |
| EMBEDDER_MODEL | sentence-transformers/all-MiniLM-L6-v2 | Scoring embedder |
| DB_PATH | tinypress.db | SQLite database path |
| PORT | 7860 | Gradio server port |

Hardware

| | Minimum | Recommended |
| --- | --- | --- |
| RAM | 8 GB | 16 GB |
| VRAM | CPU-only works | 4 GB GPU speeds up inference |
| Disk | ~4 GB | ~4 GB |

Architecture

Input text + token budget
        │
  core/compressor.py     - builds prompt, calls LLM, hard-trims if it overshoots
        │
  models/model_loader.py - Qwen2.5-1.5B (or swapped model), loaded once, reused
        │
  core/scorer.py         - cosine similarity via sentence-transformer embedder
        │
  db/store.py            - saves run to SQLite
        │
  ui/compress_tab.py     - shows result, metrics, feedback UI
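
The budget check and the hard-trim step can be pictured with the sketch below. It assumes whitespace-separated words as a stand-in for model tokens (the real compressor would count with the LLM's tokenizer), and all three function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def needs_compression(text: str, budget: int) -> bool:
    """True when the input is over budget, i.e. compression should run."""
    return count_tokens(text) > budget


def hard_trim(text: str, budget: int) -> str:
    """Cut the text to at most `budget` tokens if it overshoots."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    tokens = text.split()
    if len(tokens) <= budget:
        return text  # already within budget, leave untouched
    return " ".join(tokens[:budget])


if __name__ == "__main__":
    draft = "one two three four five six"
    if needs_compression(draft, 4):
        print(hard_trim(draft, 4))
```

Trimming is a last resort in this sketch: it guarantees the budget but cuts mid-thought, which is why it only applies when the model's own output overshoots.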

Thin UI layer: Gradio handlers pass inputs to core/ and return outputs. All logic lives in core/ and db/.
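
A rough sketch of that separation follows, using Python's built-in `sqlite3` module. The table schema, the column names, and the functions `save_run` and `handle_compress` are hypothetical; the `compress` and `score` callables stand in for whatever core/compressor.py and core/scorer.py actually expose.

```python
import sqlite3
from typing import Callable

# Hypothetical schema; the real tinypress.db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    original TEXT NOT NULL,
    compressed TEXT NOT NULL,
    budget INTEGER NOT NULL,
    score REAL NOT NULL
)
"""


def save_run(conn: sqlite3.Connection, original: str, compressed: str,
             budget: int, score: float) -> int:
    """Persist one compression run and return its row id."""
    conn.execute(SCHEMA)
    cur = conn.execute(
        "INSERT INTO runs (original, compressed, budget, score) VALUES (?, ?, ?, ?)",
        (original, compressed, budget, score),
    )
    conn.commit()
    return cur.lastrowid


def handle_compress(text: str, budget: int,
                    compress: Callable[[str, int], str],
                    score: Callable[[str, str], float],
                    conn: sqlite3.Connection) -> tuple[str, float, int]:
    """Thin handler: delegate to the core callables, store, return outputs."""
    compressed = compress(text, budget)
    quality = score(text, compressed)
    run_id = save_run(conn, text, compressed, budget, quality)
    return compressed, quality, run_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Stand-in callables so the sketch runs without loading any model.
    trim = lambda t, b: " ".join(t.split()[:b])
    print(handle_compress("a b c d", 2, trim, lambda o, c: 0.5, conn))
```

Keeping the handler this small means the compression and scoring logic can be tested without starting the Gradio UI.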

Full docs: Architecture · Setup · Get Started · Folder Structure


About

Built by Sriharsha C R, AI Engineer and Cloud Native developer.

LinkedIn · X / Twitter · HuggingFace · GitHub