# Tiny Scribe - Project Context
## Project Overview
**Tiny Scribe** is a lightweight, local LLM-powered transcript summarization tool. It is designed to run efficiently on standard hardware (including free CPU tiers on HuggingFace Spaces) using GGUF quantized models.
The project features a web interface (Gradio) and a CLI tool, supporting over 24 models ranging from 100M to 30B parameters. It includes specialized features like live streaming, reasoning mode (thinking) for supported models, and dual-language output (English/Traditional Chinese).
## Tech Stack
* **Language:** Python 3.10+
* **UI Framework:** Gradio (Web), `argparse` (CLI)
* **Inference Engine:** `llama-cpp-python` (Python bindings for `llama.cpp`)
* **Model Format:** GGUF (Quantized)
* **Containerization:** Docker (optimized for HuggingFace Spaces)
* **Utilities:** `opencc` (Chinese conversion), `huggingface_hub`
## Key Files & Directories
* `app.py`: The main entry point for the Gradio web application. Contains the UI layout, model loading logic, and generation pipeline.
* `summarize_transcript.py`: Command-line interface for batch processing or local summarization without the web UI.
* `Dockerfile`: Defines the build environment. **Crucial:** It installs a specific pre-compiled wheel for `llama-cpp-python` to ensure compatibility and performance on HF Spaces (Free CPU tier).
* `deploy.sh`: Helper script to stage, commit, and push changes to the HuggingFace Space. Enforces non-generic commit messages.
* `requirements.txt`: Python dependencies (excluding `llama-cpp-python` which is handled specially in Docker).
* `transcripts/`: Directory for storing input transcript files.
* `AGENTS.md` / `CLAUDE.md`: Existing context files for other AI assistants.
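Model loading in `app.py` follows the usual llama-cpp-python pattern: resolve a GGUF file from a HuggingFace repo, then hand it to `Llama`. A minimal sketch of that flow (the quant suffix, helper names, and context size are illustrative assumptions, not the project's actual code):

```python
def gguf_filename(repo_id: str, quant: str = "Q4_K_M") -> str:
    """Guess a GGUF filename from a repo id like 'unsloth/Qwen3-1.7B-GGUF'.

    Hypothetical helper: assumes files are named '<Model>-<quant>.gguf';
    the real app may enumerate repo files instead of guessing.
    """
    model = repo_id.split("/")[-1].removesuffix("-GGUF")
    return f"{model}-{quant}.gguf"


def load_model(repo_id: str, n_ctx: int = 4096):
    """Download the quantized model and build a Llama instance."""
    # Imports are deferred so the sketch stays importable without the libraries.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    path = hf_hub_download(repo_id=repo_id, filename=gguf_filename(repo_id))
    return Llama(model_path=path, n_ctx=n_ctx)
```

`n_ctx` bounds the context window; larger values raise RAM usage, which matters on the free CPU tier.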
## Build & Run Instructions
### 1. Installation
The project relies on `llama-cpp-python`. For local development, you must install it separately, as it's not in `requirements.txt` to avoid build errors on systems without compilers.
```bash
# Install general dependencies
pip install -r requirements.txt

# Install llama-cpp-python (CPU build)
pip install llama-cpp-python

# Or, with CUDA support (see the installation docs for other backends):
# CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
# https://github.com/abetlen/llama-cpp-python#installation
```
### 2. Running the Web UI
```bash
python app.py
# Access at http://localhost:7860
```
### 3. Running the CLI
```bash
# Basic English summary
python summarize_transcript.py -i transcripts/your_file.txt
# Traditional Chinese output
python summarize_transcript.py -i transcripts/your_file.txt -l zh-TW
# Use a specific model
python summarize_transcript.py -i transcripts/your_file.txt -m "unsloth/Qwen3-1.7B-GGUF"
```
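The CLI flags above map onto a small `argparse` setup. A sketch of the expected interface (the long flag names and defaults are assumptions; check `summarize_transcript.py` for the actual ones):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Summarize a transcript with a local GGUF model."
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Path to the transcript file")
    parser.add_argument("-l", "--language", default="en",
                        choices=["en", "zh-TW"],
                        help="Output language (English or Traditional Chinese)")
    parser.add_argument("-m", "--model", default="unsloth/Qwen3-1.7B-GGUF",
                        help="HuggingFace repo id of the GGUF model")
    return parser


# Example: parse the Traditional Chinese invocation shown above.
args = build_parser().parse_args(
    ["-i", "transcripts/your_file.txt", "-l", "zh-TW"]
)
```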
### 4. Deployment (HuggingFace Spaces)
Always use the provided script to ensure clean commits and deployment:
```bash
./deploy.sh "Your descriptive commit message"
```
## Model Architecture & Categories
The project categorizes models to help users balance speed vs. quality:
* **Tiny (0.1-0.6B):** Extremely fast, good for simple formatting (e.g., Qwen3-0.6B).
* **Compact (1.5-2.6B):** Good balance for free tier (e.g., Granite-3.1-1B, Qwen3-1.7B).
* **Standard (3-8B):** Higher quality, slower on CPU (e.g., Llama-3-8B variants).
* **Medium (21-30B):** High performance, requires significant RAM (e.g., Command R, Qwen-30B).
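The categories above can be expressed as a simple lookup by parameter count. A hypothetical helper (the thresholds mirror the list; the real app may hard-code categories per model instead):

```python
# Upper bound (billions of parameters) -> category name, in ascending order.
CATEGORIES = [
    (0.6, "Tiny"),       # 0.1-0.6B
    (2.6, "Compact"),    # 1.5-2.6B
    (8.0, "Standard"),   # 3-8B
    (30.0, "Medium"),    # 21-30B
]


def category_for(params_b: float) -> str:
    """Return the first category whose ceiling covers the parameter count."""
    for ceiling, name in CATEGORIES:
        if params_b <= ceiling:
            return name
    return "Medium"  # anything larger falls in the top bucket
```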
## Development Conventions
* **Dependency Management:** `llama-cpp-python` is pinned in the `Dockerfile` via a custom wheel URL. Do not add it to `requirements.txt` unless you are changing the build strategy.
* **Code Style:** The project uses `ruff` for linting.
* **Git:** Use `deploy.sh` to push. Avoid generic commit messages like "update" or "fix".
* **Environment:** The app is optimized for Linux/Docker environments. Local Windows development may require extra setup for `llama-cpp-python` compilation.