{ "cells": [ { "cell_type": "markdown", "id": "cell-title", "metadata": { "id": "cell-title" }, "source": [ "# TinyPress: Prompt Compression Engine\n", "\n", "**HuggingFace Build Small Hackathon · Track: Thousand Token Wood**\n", "\n", "| Layer | Detail |\n", "|-------|--------|\n", "| Compression | `Qwen/Qwen2.5-1.5B-Instruct` (default, switchable) |\n", "| Scoring | `sentence-transformers/all-MiniLM-L6-v2` (default, switchable) |\n", "| UI | Gradio 5, public share URL |\n", "| Storage | SQLite at `/content/tinypress.db` |\n", "\n", "**Features**\n", "- Compress text to a user-defined token budget\n", "- Live 🔴 / 🟢 compression readiness banner\n", "- Per-token colour highlight panel (toggle on/off)\n", "- Dynamic compression model switching (5 curated <32B models)\n", "- Dynamic scoring embedder switching (6 models, with per-model impact info)\n", "- 👍 / 👎 feedback on every compression result, with optional text comment\n", "- Compression run history persisted to SQLite\n", "- Column picker in History tab: compact default view, expandable to all fields\n", "- Per-row delete in history\n", "- Side-by-side word-level diff viewer with feedback badge and token detail\n", "\n", "> **Recommended runtime:** GPU (Runtime → Change runtime type → T4 GPU)\n", "\n", "---\n", "\n", "### About the author\n", "\n", "Built by **Sriharsha C R**, AI Engineer, Cloud Native developer, and knowledge sharer.\n", "If this was useful, feel free to connect; always happy to chat about AI, LLMs, or anything in between.\n", "\n", "[LinkedIn](https://www.linkedin.com/in/sriharsha-cr)\n", "[X](https://x.com/sriharsha_cr)\n", "[Hugging Face](https://huggingface.co/sriharsha-cr)\n", "[GitHub](https://github.com/SriharshaCR)" ] }, { "cell_type": "markdown", "id": "cell-s1-hdr", "metadata": { "id": "cell-s1-hdr" }, "source": [ "## Step 1: Install dependencies" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-install", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "collapsed": 
true, "id": "cell-install", "outputId": "cb76788b-9d5a-4b8a-b107-3d21e06c1bfc" }, "outputs": [], "source": [ "!pip install -q \\\n", "    \"gradio==5.0\" \\\n", "    \"transformers>=4.40.0\" \\\n", "    \"sentence-transformers>=3.0.0\" \\\n", "    \"torch>=2.2.0\" \\\n", "    \"numpy>=1.26.0\" \\\n", "    \"pandas>=2.0.0\" \\\n", "    \"accelerate>=0.30.0\" \\\n", "    \"huggingface_hub==0.25.2\"" ] }, { "cell_type": "markdown", "id": "cell-s2-hdr", "metadata": { "id": "cell-s2-hdr" }, "source": [ "## Step 2: Runtime check" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-runtime", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cell-runtime", "outputId": "0d5544c5-650d-4aa5-8640-fdd0170f1d6e" }, "outputs": [], "source": [ "import torch\n", "\n", "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n", "dtype = torch.float16 if device == 'cuda' else torch.float32\n", "\n", "print(f'Device : {device}')\n", "if device == 'cuda':\n", "    print(f'GPU : {torch.cuda.get_device_name(0)}')\n", "    print(f'VRAM : {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')\n", "print(f'dtype : {dtype}')" ] }, { "cell_type": "markdown", "id": "cell-s3-hdr", "metadata": { "id": "cell-s3-hdr" }, "source": [ "## Step 3: Configuration" ] }, { "cell_type": "code", "execution_count": 3, "id": "T1fBTJIxdWHP", "metadata": { "id": "T1fBTJIxdWHP" }, "outputs": [], "source": [ "# Curated <32B open-weight causal LMs for local / Colab inference.\n", "AVAILABLE_MODELS = [\n", "    'Qwen/Qwen2.5-1.5B-Instruct',\n", "    'Qwen/Qwen2.5-0.5B-Instruct',\n", "    'HuggingFaceTB/SmolLM2-1.7B-Instruct',\n", "    'microsoft/Phi-3.5-mini-instruct',\n", "    'meta-llama/Llama-3.2-1B-Instruct',\n", "]\n", "\n", "# Curated sentence-transformer embedding models for quality scoring.\n", "AVAILABLE_EMBEDDER_MODELS = [\n", "    'sentence-transformers/all-MiniLM-L6-v2',\n", "    'sentence-transformers/all-mpnet-base-v2',\n", "    'BAAI/bge-small-en-v1.5',\n", "    'BAAI/bge-base-en-v1.5',\n", "    
'mixedbread-ai/mxbai-embed-large-v1',\n", "    'Alibaba-NLP/gte-Qwen2-1.5B-instruct',\n", "]\n", "\n", "# Markdown blurbs shown in the UI when an embedder is selected.\n", "EMBEDDER_INFO = {\n", "    'sentence-transformers/all-MiniLM-L6-v2': (\n", "        '⚡ **Fast · 22M params · Default**  \\n'\n", "        'Great baseline. Scores are reliable for typical compression ratios. '\n", "        'Runs comfortably on CPU with minimal overhead.'\n", "    ),\n", "    'sentence-transformers/all-mpnet-base-v2': (\n", "        '⚖️ **Balanced · 110M params**  \\n'\n", "        'Noticeably sharper quality scores than MiniLM, especially on longer texts. '\n", "        'Small speed trade-off; fine on CPU.'\n", "    ),\n", "    'BAAI/bge-small-en-v1.5': (\n", "        '⚡ **Fast · 33M params**  \\n'\n", "        'Strong quality-to-size ratio: often matches MiniLM on accuracy while being '\n", "        'slightly more sensitive to meaning shifts. Good CPU option.'\n", "    ),\n", "    'BAAI/bge-base-en-v1.5': (\n", "        '⚖️ **Balanced · 109M params**  \\n'\n", "        'Consistently strong on semantic similarity benchmarks. '\n", "        'Scores will be more discriminating: small differences in compression quality show up more clearly.'\n", "    ),\n", "    'mixedbread-ai/mxbai-embed-large-v1': (\n", "        '**High quality · 335M params**  \\n'\n", "        'Top-tier similarity scores. Quality readings will be the most accurate here, '\n", "        'but slower to load and run. GPU recommended.'\n", "    ),\n", "    'Alibaba-NLP/gte-Qwen2-1.5B-instruct': (\n", "        '🔬 **Best quality · 1.5B params**  \\n'\n", "        'Strongest semantic understanding in this list. Scores will reflect subtle meaning loss '\n", "        'that smaller models miss. 
Requires significant RAM/VRAM; GPU strongly recommended.'\n", "    ),\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-config", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cell-config", "outputId": "f98bce01-b9ec-49dc-8f5a-e58d82e1de69" }, "outputs": [], "source": [ "import os\n", "\n", "# Default to the first entry of each curated list, matching the overview table.\n", "LLM_MODEL = os.getenv('LLM_MODEL', AVAILABLE_MODELS[0])\n", "EMBEDDER_MODEL = os.getenv('EMBEDDER_MODEL', AVAILABLE_EMBEDDER_MODELS[0])\n", "DB_PATH = os.getenv('DB_PATH', '/content/tinypress.db')\n", "SERVER_PORT = int(os.getenv('PORT', 7860))\n", "\n", "DEFAULT_TARGET_TOKENS = 500\n", "MAX_NEW_TOKENS = 1024\n", "APP_TITLE = 'TinyPress'\n", "\n", "PUBLIC_UI = True\n", "\n", "print(f'LLM : {LLM_MODEL}')\n", "print(f'Embedder : {EMBEDDER_MODEL}')\n", "print(f'DB : {DB_PATH}')" ] }, { "cell_type": "markdown", "id": "cell-s4-hdr", "metadata": { "id": "cell-s4-hdr" }, "source": [ "## Step 4: Model loader" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-model-loader", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cell-model-loader", "outputId": "e6c0f401-e6c9-4eee-d577-8831907a7a36" }, "outputs": [], "source": [ "from transformers import AutoTokenizer, AutoModelForCausalLM\n", "from sentence_transformers import SentenceTransformer\n", "import gc\n", "\n", "# Lazily loaded singletons; populated on first use or on an explicit switch.\n", "_llm = None\n", "_tokenizer = None\n", "_embedder = None\n", "_current_model_id = None\n", "_current_embedder_id = None\n", "\n", "\n", "def get_current_model_id():\n", "    return _current_model_id\n", "\n", "\n", "def get_current_tokenizer_id():\n", "    # Tokenizer is always loaded from the same HF repo as the model.\n", "    return _current_model_id\n", "\n", "\n", "def get_current_embedder_id():\n", "    return _current_embedder_id\n", "\n", "\n", "def get_llm():\n", "    global _llm, _tokenizer\n", "    if _llm is None:\n", "        _load_llm(LLM_MODEL)\n", "    return _llm, _tokenizer\n", "\n", "\n", "def switch_llm(model_id: str) -> str:\n", "    global 
_current_model_id\n", "    if _current_model_id == model_id:\n", "        return f'Already using {model_id}'\n", "    _unload_llm()\n", "    _load_llm(model_id)\n", "    return f'Loaded: {model_id}'\n", "\n", "\n", "def _load_llm(model_id: str):\n", "    \"\"\"Load model + its paired tokenizer. Both come from the same model_id.\"\"\"\n", "    global _llm, _tokenizer, _current_model_id\n", "    print(f'Loading LLM: {model_id} ...')\n", "    _tokenizer = AutoTokenizer.from_pretrained(model_id)\n", "    _llm = AutoModelForCausalLM.from_pretrained(\n", "        model_id,\n", "        torch_dtype=dtype,\n", "        device_map='auto',\n", "    )\n", "    _llm.eval()\n", "    _current_model_id = model_id\n", "    print(f'LLM ready: {model_id}')\n", "\n", "\n", "def _unload_llm():\n", "    \"\"\"Free GPU/CPU memory before loading a different model.\"\"\"\n", "    global _llm, _tokenizer, _current_model_id\n", "    del _llm, _tokenizer\n", "    _llm = None\n", "    _tokenizer = None\n", "    _current_model_id = None\n", "    gc.collect()\n", "    if torch.cuda.is_available():\n", "        torch.cuda.empty_cache()\n", "\n", "\n", "def get_embedder():\n", "    global _embedder, _current_embedder_id\n", "    if _embedder is None:\n", "        _load_embedder(EMBEDDER_MODEL)\n", "    return _embedder\n", "\n", "\n", "def switch_embedder(model_id: str) -> str:\n", "    global _current_embedder_id\n", "    if _current_embedder_id == model_id:\n", "        return f'Already using {model_id}'\n", "    _unload_embedder()\n", "    _load_embedder(model_id)\n", "    return f'Loaded: {model_id}'\n", "\n", "\n", "def _load_embedder(model_id: str):\n", "    global _embedder, _current_embedder_id\n", "    print(f'Loading embedder: {model_id} ...')\n", "    # Explicitly set device to 'cpu' to avoid ZeroGPU conflicts\n", "    _embedder = SentenceTransformer(model_id, device='cpu')\n", "    _current_embedder_id = model_id\n", "    print(f'Embedder ready: {model_id}')\n", "\n", "\n", "def _unload_embedder():\n", "    global _embedder, _current_embedder_id\n", "    del _embedder\n", "    _embedder = None\n", "    _current_embedder_id = None\n", "    
gc.collect()\n", "    if torch.cuda.is_available():\n", "        torch.cuda.empty_cache()\n", "\n", "\n", "print('Model loader defined.')" ] }, { "cell_type": "markdown", "id": "cell-s5-hdr", "metadata": { "id": "cell-s5-hdr" }, "source": [ "## Step 5: Core pipeline" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-core", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cell-core", "outputId": "2092a376-eebe-4b70-e600-4d1f684222b4" }, "outputs": [], "source": [ "import numpy as np\n", "\n", "# ── tokenizer utils ──────────────────────────────────────────────────────────\n", "\n", "def count_tokens(text: str) -> int:\n", "    _, tokenizer = get_llm()\n", "    return len(tokenizer.encode(text, add_special_tokens=False))\n", "\n", "\n", "def get_token_strings(text: str) -> list:\n", "    \"\"\"Return the decoded surface string for every token in text.\"\"\"\n", "    _, tokenizer = get_llm()\n", "    ids = tokenizer.encode(text, add_special_tokens=False)\n", "    return [tokenizer.decode([i]) for i in ids]\n", "\n", "\n", "# ── compressor ───────────────────────────────────────────────────────────────\n", "\n", "_PROMPT_TEMPLATE = (\n", "    'You are a lossless compression assistant. '\n", "    'Compress the following text to at most {target} tokens.\\n'\n", "    'Preserve all key facts, decisions, and intent. '\n", "    'Do not add commentary. 
Output only the compressed text.\\n\\n'\n", "    'TEXT:\\n{text}\\n\\nCOMPRESSED:'\n", ")\n", "\n", "\n", "def _generate(prompt: str) -> str:\n", "    model, tokenizer = get_llm()\n", "    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)\n", "    with torch.no_grad():\n", "        output_ids = model.generate(\n", "            **inputs,\n", "            max_new_tokens=MAX_NEW_TOKENS,\n", "            do_sample=False,\n", "            pad_token_id=tokenizer.eos_token_id,\n", "        )\n", "    # Keep only the newly generated tokens, dropping the echoed prompt.\n", "    new_tokens = output_ids[0][inputs['input_ids'].shape[1]:]\n", "    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()\n", "\n", "\n", "def compress(text: str, target_tokens: int) -> tuple:\n", "    \"\"\"Returns (compressed_text, input_token_count, output_token_count).\"\"\"\n", "    input_tokens = count_tokens(text)\n", "    if input_tokens <= target_tokens:\n", "        return text, input_tokens, input_tokens\n", "\n", "    prompt = _PROMPT_TEMPLATE.format(target=target_tokens, text=text)\n", "    compressed = _generate(prompt)\n", "\n", "    # Hard-trim if model overshoots.\n", "    _, tokenizer = get_llm()\n", "    ids = tokenizer.encode(compressed, add_special_tokens=False)\n", "    if len(ids) > target_tokens:\n", "        compressed = tokenizer.decode(ids[:target_tokens], skip_special_tokens=True)\n", "\n", "    output_tokens = count_tokens(compressed)\n", "    return compressed, input_tokens, output_tokens\n", "\n", "\n", "# ── scorer ───────────────────────────────────────────────────────────────────\n", "\n", "def semantic_score(original: str, compressed: str) -> float:\n", "    embedder = get_embedder()\n", "    vecs = embedder.encode([original, compressed], convert_to_numpy=True)\n", "    # Cosine similarity, clamped to [0, 1] and rounded to 4 decimals.\n", "    cos = float(\n", "        np.dot(vecs[0], vecs[1]) / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1]))\n", "    )\n", "    return round(max(0.0, min(1.0, cos)), 4)\n", "\n", "\n", "print('Core pipeline defined.')" ] }, { "cell_type": "markdown", "id": "cell-s6-hdr", "metadata": { "id": "cell-s6-hdr" }, "source": [ "## Step 6: Diff renderer" ] }, { "cell_type": "code", 
"execution_count": null, "id": "cell-diff", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cell-diff", "outputId": "34230370-85e9-4d36-fd28-1e8542ffa448" }, "outputs": [], "source": [ "import difflib\n", "import html as _h\n", "\n", "\n", "def _word_diff(original: str, compressed: str) -> tuple:\n", " \"\"\"\n", " Word-level SequenceMatcher diff.\n", " Returns (annotated_original_html, annotated_compressed_html).\n", " Colour key:\n", " original โ red strikethrough = dropped\n", " compressed โ amber = rewritten\n", " compressed โ green = inserted\n", " plain = unchanged\n", " \"\"\"\n", " orig_words = original.split()\n", " comp_words = compressed.split()\n", " matcher = difflib.SequenceMatcher(None, orig_words, comp_words, autojunk=False)\n", "\n", " orig_parts, comp_parts = [], []\n", "\n", " for tag, i1, i2, j1, j2 in matcher.get_opcodes():\n", " ow = _h.escape(' '.join(orig_words[i1:i2]))\n", " cw = _h.escape(' '.join(comp_words[j1:j2]))\n", "\n", " if tag == 'equal':\n", " orig_parts.append(ow)\n", " comp_parts.append(cw)\n", "\n", " elif tag == 'delete':\n", " orig_parts.append(\n", " f'{ow}'\n", " )\n", "\n", " elif tag == 'insert':\n", " comp_parts.append(\n", " f'{cw}'\n", " )\n", "\n", " elif tag == 'replace':\n", " orig_parts.append(\n", " f'{ow}'\n", " )\n", " comp_parts.append(\n", " f'{cw}'\n", " )\n", "\n", " return ' '.join(orig_parts), ' '.join(comp_parts)\n", "\n", "\n", "def render_diff_html(record: dict) -> str:\n", " \"\"\"Build a self-contained side-by-side diff HTML block for a compression run.\"\"\"\n", " original = record.get('input_text', '')\n", " compressed = record.get('output_text', '')\n", " if not original or not compressed:\n", " return ''\n", "\n", " orig_html, comp_html = _word_diff(original, compressed)\n", "\n", " model = _h.escape(record.get('model', 'โ'))\n", " tokenizer = _h.escape(record.get('tokenizer', 'โ'))\n", " ts = _h.escape(record.get('timestamp', 'โ'))\n", " in_tok = 
record.get('input_tokens', '-')\n", "    out_tok = record.get('output_tokens', '-')\n", "    target_tok = record.get('target_tokens', '-')\n", "    ratio = record.get('compression_ratio', 0)\n", "    quality = record.get('quality_score', 0)\n", "    duration = record.get('duration_ms', '-')\n", "    run_id = record.get('id', '-')\n", "\n", "    feedback_val = record.get('feedback')\n", "    feedback_note = _h.escape(record.get('feedback_comment') or '')\n", "\n", "    # Build optional feedback block\n", "    if feedback_val is not None:\n", "        badge_bg = '#f0fdf4' if feedback_val == 1 else '#fef2f2'\n", "        badge_color = '#15803d' if feedback_val == 1 else '#b91c1c'\n", "        badge_text = '👍 Helpful' if feedback_val == 1 else '👎 Not helpful'\n", "        feedback_block = (\n", "            f'