# Tiny Scribe - Project Context

## Project Overview

**Tiny Scribe** is a lightweight, local LLM-powered transcript summarization tool. It is designed to run efficiently on standard hardware (including the free CPU tier on HuggingFace Spaces) using GGUF quantized models. The project features a web interface (Gradio) and a CLI tool, supporting over 24 models ranging from 100M to 30B parameters. It includes specialized features such as live streaming, a reasoning ("thinking") mode for supported models, and dual-language output (English/Traditional Chinese).

## Tech Stack

* **Language:** Python 3.10+
* **UI Framework:** Gradio (web), `argparse` (CLI)
* **Inference Engine:** `llama-cpp-python` (Python bindings for `llama.cpp`)
* **Model Format:** GGUF (quantized)
* **Containerization:** Docker (optimized for HuggingFace Spaces)
* **Utilities:** `opencc` (Chinese conversion), `huggingface_hub`

## Key Files & Directories

* `app.py`: Main entry point for the Gradio web application. Contains the UI layout, model-loading logic, and generation pipeline.
* `summarize_transcript.py`: Command-line interface for batch processing or local summarization without the web UI.
* `Dockerfile`: Defines the build environment. **Crucial:** it installs a specific pre-compiled wheel for `llama-cpp-python` to ensure compatibility and performance on HF Spaces (free CPU tier).
* `deploy.sh`: Helper script to stage, commit, and push changes to the HuggingFace Space. Enforces non-generic commit messages.
* `requirements.txt`: Python dependencies (excluding `llama-cpp-python`, which is handled specially in Docker).
* `transcripts/`: Directory for input transcript files.
* `AGENTS.md` / `CLAUDE.md`: Existing context files for other AI assistants.

## Build & Run Instructions

### 1. Installation

The project relies on `llama-cpp-python`. For local development, install it separately; it is deliberately left out of `requirements.txt` to avoid build errors on systems without compilers.
```bash
# Install general dependencies
pip install -r requirements.txt

# Install llama-cpp-python (with CUDA support if available, otherwise CPU)
# See: https://github.com/abetlen/llama-cpp-python#installation
pip install llama-cpp-python
```

### 2. Running the Web UI

```bash
python app.py
# Access at http://localhost:7860
```

### 3. Running the CLI

```bash
# Basic English summary
python summarize_transcript.py -i transcripts/your_file.txt

# Traditional Chinese output
python summarize_transcript.py -i transcripts/your_file.txt -l zh-TW

# Use a specific model
python summarize_transcript.py -i transcripts/your_file.txt -m "unsloth/Qwen3-1.7B-GGUF"
```

### 4. Deployment (HuggingFace Spaces)

Always use the provided script to ensure clean commits and deployment:

```bash
./deploy.sh "Your descriptive commit message"
```

## Model Architecture & Categories

The project categorizes models to help users balance speed against quality:

* **Tiny (0.1-0.6B):** Extremely fast; good for simple formatting (e.g., Qwen3-0.6B).
* **Compact (1.5-2.6B):** Good balance for the free tier (e.g., Granite-3.1-1B, Qwen3-1.7B).
* **Standard (3-7B):** Higher quality, slower on CPU (e.g., Llama-3-8B variants).
* **Medium (21-30B):** High performance; requires significant RAM (e.g., Command R, Qwen-30B).

## Development Conventions

* **Dependency Management:** `llama-cpp-python` is pinned in the `Dockerfile` via a custom wheel URL. Do not add it to `requirements.txt` unless you are changing the build strategy.
* **Code Style:** The project uses `ruff` for linting.
* **Git:** Use `deploy.sh` to push. Avoid generic commit messages like "update" or "fix".
* **Environment:** The app is optimized for Linux/Docker environments. Local Windows development may require extra setup to compile `llama-cpp-python`.
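The CLI flags shown in the usage examples above (`-i`, `-l`, `-m`) can be sketched with `argparse`. This is an illustrative skeleton only — the actual option names, defaults, and help text in `summarize_transcript.py` may differ:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of the CLI surface shown in the usage examples.

    The real summarize_transcript.py may define different defaults or
    additional flags; only -i, -l, and -m appear in the examples above.
    """
    parser = argparse.ArgumentParser(
        description="Summarize a transcript with a local GGUF model."
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Path to the input transcript file")
    parser.add_argument("-l", "--language", default="en",
                        choices=["en", "zh-TW"],
                        help="Output language (English or Traditional Chinese)")
    parser.add_argument("-m", "--model", default="unsloth/Qwen3-1.7B-GGUF",
                        help="HuggingFace repo id of the GGUF model to use")
    return parser


if __name__ == "__main__":
    # Example: parse the flags from the zh-TW usage example above.
    args = build_parser().parse_args(
        ["-i", "transcripts/your_file.txt", "-l", "zh-TW"]
    )
    print(args.input, args.language, args.model)
```

Keeping the flag surface this small means the same arguments map cleanly onto both the CLI and the Gradio UI controls.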
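The size categories above can be expressed as a small lookup helper. This function is hypothetical (it is not part of the Tiny Scribe codebase); the band edges follow the category table, and sizes falling between bands are assigned to the next band up as an assumption — which also keeps the doc's own Llama-3-8B example in "Standard":

```python
def size_category(params_b: float) -> str:
    """Map a model's parameter count (in billions) to a size category.

    Hypothetical helper, not part of the repo. Band edges follow the
    Model Architecture & Categories table; everything below the Medium
    band's 21B floor but above Compact is treated as Standard.
    """
    if params_b <= 0.6:
        return "Tiny"      # 0.1-0.6B: extremely fast
    if params_b <= 2.6:
        return "Compact"   # 1.5-2.6B: free-tier sweet spot
    if params_b < 21:
        return "Standard"  # 3-7B (and e.g. 8B variants): slower on CPU
    return "Medium"        # 21-30B: needs significant RAM


print(size_category(0.6), size_category(1.7), size_category(8), size_category(30))
# -> Tiny Compact Standard Medium
```

A helper like this is one way the UI could group its 24+ models without hard-coding a category next to every entry.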