# Tiny Scribe - Project Context

## Project Overview
Tiny Scribe is a lightweight, local LLM-powered transcript summarization tool. It is designed to run efficiently on standard hardware (including free CPU tiers on HuggingFace Spaces) using GGUF quantized models.
The project features a web interface (Gradio) and a CLI tool, supporting over 24 models ranging from 100M to 30B parameters. It includes specialized features like live streaming, reasoning mode (thinking) for supported models, and dual-language output (English/Traditional Chinese).
## Tech Stack

- Language: Python 3.10+
- UI Framework: Gradio (web), `argparse` (CLI)
- Inference Engine: `llama-cpp-python` (Python bindings for `llama.cpp`)
- Model Format: GGUF (quantized)
- Containerization: Docker (optimized for HuggingFace Spaces)
- Utilities: `opencc` (Chinese conversion), `huggingface_hub`
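The dual-language output mentioned above can be illustrated with a small prompt-building helper. This is a hedged sketch of my own, not the project's actual code; the function name and the prompt wording are assumptions.

```python
def build_prompt(transcript: str, language: str = "en") -> str:
    """Build a summarization prompt for the given output language.

    Language codes follow the CLI convention in this document:
    "en" for English, "zh-TW" for Traditional Chinese.
    """
    instructions = {
        "en": "Summarize the following transcript in concise English.",
        "zh-TW": "請以繁體中文總結以下逐字稿。",
    }
    if language not in instructions:
        raise ValueError(f"Unsupported language: {language}")
    return f"{instructions[language]}\n\n{transcript}"
```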
## Key Files & Directories

- `app.py`: Main entry point for the Gradio web application. Contains the UI layout, model loading logic, and generation pipeline.
- `summarize_transcript.py`: Command-line interface for batch processing or local summarization without the web UI.
- `Dockerfile`: Defines the build environment. Crucial: it installs a specific pre-compiled wheel for `llama-cpp-python` to ensure compatibility and performance on HF Spaces (free CPU tier).
- `deploy.sh`: Helper script to stage, commit, and push changes to the HuggingFace Space. Enforces non-generic commit messages.
- `requirements.txt`: Python dependencies (excluding `llama-cpp-python`, which is handled specially in Docker).
- `transcripts/`: Directory for storing input transcript files.
- `AGENTS.md` / `CLAUDE.md`: Existing context files for other AI assistants.
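Long transcripts typically exceed a small model's context window, so a generation pipeline like the one described for `app.py` usually splits the input first. A minimal chunking sketch (the function name, chunk size, and overlap are assumptions, not the project's actual logic):

```python
def chunk_transcript(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split a transcript into overlapping character chunks.

    Overlap preserves a little context across chunk boundaries so
    sentences cut at an edge still appear (partially) in both chunks.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks
```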
## Build & Run Instructions

### 1. Installation

The project relies on `llama-cpp-python`. For local development, you must install it separately; it is kept out of `requirements.txt` to avoid build errors on systems without compilers.

```bash
# Install general dependencies
pip install -r requirements.txt

# Install llama-cpp-python (with CUDA support if available, otherwise CPU)
# See: https://github.com/abetlen/llama-cpp-python#installation
pip install llama-cpp-python
```
### 2. Running the Web UI

```bash
python app.py
# Access at http://localhost:7860
```
### 3. Running the CLI

```bash
# Basic English summary
python summarize_transcript.py -i transcripts/your_file.txt

# Traditional Chinese output
python summarize_transcript.py -i transcripts/your_file.txt -l zh-TW

# Use a specific model
python summarize_transcript.py -i transcripts/your_file.txt -m "unsloth/Qwen3-1.7B-GGUF"
```
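The flags above suggest an `argparse` setup along these lines. A sketch only: the short flags `-i`, `-l`, `-m` come from the examples, while the long option names and defaults are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser matching the documented flags (-i, -l, -m)."""
    parser = argparse.ArgumentParser(
        description="Summarize a transcript with a local GGUF model."
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Path to the input transcript file")
    parser.add_argument("-l", "--language", default="en", choices=["en", "zh-TW"],
                        help="Output language")
    parser.add_argument("-m", "--model", default="unsloth/Qwen3-1.7B-GGUF",
                        help="HuggingFace GGUF model repo to use")
    return parser
```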
### 4. Deployment (HuggingFace Spaces)

Always use the provided script to ensure clean commits and deployment:

```bash
./deploy.sh "Your descriptive commit message"
```
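The non-generic commit-message rule enforced by `deploy.sh` can be sketched as a validation function. Hypothetical logic in Python for illustration; the actual script is shell and its exact rules may differ.

```python
# Messages rejected outright (an assumed blocklist, not the script's actual list).
GENERIC_MESSAGES = {"update", "fix", "wip", "changes", "misc"}

def is_acceptable_commit_message(message: str, min_words: int = 3) -> bool:
    """Reject empty, generic, or too-short commit messages."""
    words = message.strip().split()
    if not words:
        return False
    if message.strip().lower() in GENERIC_MESSAGES:
        return False
    return len(words) >= min_words
```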
## Model Architecture & Categories

The project categorizes models to help users balance speed vs. quality:

- Tiny (0.1-0.6B): Extremely fast; good for simple formatting (e.g., Qwen3-0.6B).
- Compact (1.5-2.6B): Good balance for the free tier (e.g., Granite-3.1-1B, Qwen3-1.7B).
- Standard (3-8B): Higher quality, slower on CPU (e.g., Llama-3-8B variants).
- Medium (21-30B): Highest quality; requires significant RAM (e.g., Command R, Qwen-30B).
## Development Conventions

- Dependency Management: `llama-cpp-python` is pinned in the `Dockerfile` via a custom wheel URL. Do not add it to `requirements.txt` unless you are changing the build strategy.
- Code Style: The project uses `ruff` for linting.
- Git: Use `deploy.sh` to push. Avoid generic commit messages like "update" or "fix".
- Environment: The app is optimized for Linux/Docker environments. Local Windows development may require extra setup for `llama-cpp-python` compilation.