Spaces:

build-small-hackathon
/

tiny-press

Running on Zero

App Files Files Community

tiny-press / README.md

sriharsha-cr

pr/2 (#2)

6ea3105 3 days ago

5.35 kB

title: Tiny Press
emoji: 📊
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 6.18.0
python_version: '3.12'
app_file: app.py
pinned: false
license: mit
short_description: Compress any text to a token budget locally.
models:
  - Qwen/Qwen2.5-1.5B-Instruct
tags:
  - gradio
  - build-small-hackathon
  - thousand-token-wood
  - text-compression
  - prompt-optimization
  - local-inference

TinyPress — Prompt Compression Engine

HuggingFace Build Small Hackathon · Track: Thousand Token Wood

The constraint is the feature. Give TinyPress a long piece of text, set a token budget, and get back a compressed version that still carries the meaning — scored, saved, and diffed so you can see exactly what was kept and what was shed.

No cloud. No API bill. Two small models running quietly on your machine.

Demo

Why this fits Thousand Token Wood

Working inside a tight token budget is not a limitation to work around — it is the problem worth solving. LLM context windows are finite, prompt costs are real, and bloated inputs degrade output quality. TinyPress treats the token count as a hard constraint and makes compression the primary interaction: you set the budget, the model meets it, and a quality score tells you how much meaning survived.

Features


🗜️ Token-budget compression	Set a target (100–1000 tokens) and compress to exactly that budget
📊 Quality score	Cosine similarity between original and compressed text — 0 to 1, higher is better
🟢🔴 Live readiness banner	Green when input is over budget and compression will run; red when already within budget
🔍 Token highlight panel	Every token rendered as a colour-coded chip so you can see where your budget is going
🔀 Model hot-swap	Switch the compression LLM mid-session without a restart (5 curated models, or any HF model ID)
🎯 Embedder hot-swap	Switch the scoring embedder with per-model trade-off info (speed vs quality vs RAM)
👍👎 Feedback capture	Rate every result, add an optional text note — saved instantly to SQLite
📜 Run history	Every compression persisted locally with full metrics and configurable column visibility
🔎 Side-by-side diff	Word-level colour diff — dropped (red), rewritten (amber), inserted (green), unchanged (plain)

Models

Role	Default	Alternatives
Compression LLM	`Qwen/Qwen2.5-1.5B-Instruct`	Qwen2.5-0.5B, SmolLM2-1.7B, Phi-3.5-mini, Llama-3.2-1B
Quality scorer	`sentence-transformers/all-MiniLM-L6-v2`	mpnet-base, bge-small, bge-base, mxbai-large, gte-Qwen2-1.5B

All models are open-weight and under 32B. Everything runs locally — no API calls, no data leaves your machine.

Get started

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate

pip install -r requirements.txt
python app.py

Open http://localhost:7860. That's it.

Run it in Colab: open tinypress_colab.ipynb — it installs dependencies, loads the models, and launches a public Gradio share URL. GPU runtime recommended for faster inference.

Optional environment overrides:

Variable	Default	Description
`LLM_MODEL`	`Qwen/Qwen2.5-1.5B-Instruct`	Compression model
`EMBEDDER_MODEL`	`sentence-transformers/all-MiniLM-L6-v2`	Scoring embedder
`DB_PATH`	`tinypress.db`	SQLite database path
`PORT`	`7860`	Gradio server port

Hardware

	Minimum	Recommended
RAM	8 GB	16 GB
VRAM	CPU-only works	4 GB GPU speeds up inference
Disk	~4 GB	~4 GB

Architecture

Input text + token budget
        │
  core/compressor.py     — builds prompt, calls LLM, hard-trims if it overshoots
        │
  models/model_loader.py — Qwen2.5-1.5B (or swapped model), loaded once, reused
        │
  core/scorer.py         — cosine similarity via sentence-transformer embedder
        │
  db/store.py            — saves run to SQLite
        │
  ui/compress_tab.py     — shows result, metrics, feedback UI

Thin UI layer — Gradio handlers pass inputs to core/, return outputs. All logic lives in core/ and db/.

Full docs: Architecture · Setup · Get Started · Folder Structure

About

Built by Sriharsha C R — AI Engineer and Cloud Native developer.