# Tiny Scribe - Project Context

## Project Overview

**Tiny Scribe** is a lightweight, local LLM-powered transcript summarization tool. It is designed to run efficiently on standard hardware (including the free CPU tier on HuggingFace Spaces) using GGUF quantized models. The project features a web interface (Gradio) and a CLI tool, supporting over 24 models ranging from 100M to 30B parameters. It includes specialized features such as live streaming, a reasoning ("thinking") mode for supported models, and dual-language output (English/Traditional Chinese).

## Tech Stack

* **Language:** Python 3.10+
* **UI Framework:** Gradio (web), `argparse` (CLI)
* **Inference Engine:** `llama-cpp-python` (Python bindings for `llama.cpp`)
* **Model Format:** GGUF (quantized)
* **Containerization:** Docker (optimized for HuggingFace Spaces)
* **Utilities:** `opencc` (Chinese conversion), `huggingface_hub`

## Key Files & Directories

* `app.py`: Main entry point for the Gradio web application. Contains the UI layout, model-loading logic, and generation pipeline.
* `summarize_transcript.py`: Command-line interface for batch processing or local summarization without the web UI.
* `Dockerfile`: Defines the build environment. **Crucial:** it installs a specific pre-compiled wheel for `llama-cpp-python` to ensure compatibility and performance on HF Spaces (free CPU tier).
* `deploy.sh`: Helper script to stage, commit, and push changes to the HuggingFace Space. Enforces non-generic commit messages.
* `requirements.txt`: Python dependencies (excluding `llama-cpp-python`, which is handled specially in Docker).
* `transcripts/`: Directory for input transcript files.
* `AGENTS.md` / `CLAUDE.md`: Existing context files for other AI assistants.

## Build & Run Instructions

### 1. Installation

The project relies on `llama-cpp-python`. For local development, install it separately; it is deliberately left out of `requirements.txt` to avoid build errors on systems without compilers.
```bash
# Install general dependencies
pip install -r requirements.txt

# Install llama-cpp-python (with CUDA support if available, otherwise CPU)
# See: https://github.com/abetlen/llama-cpp-python#installation
pip install llama-cpp-python
```

### 2. Running the Web UI

```bash
python app.py
# Access at http://localhost:7860
```

### 3. Running the CLI

```bash
# Basic English summary
python summarize_transcript.py -i transcripts/your_file.txt

# Traditional Chinese output
python summarize_transcript.py -i transcripts/your_file.txt -l zh-TW

# Use a specific model
python summarize_transcript.py -i transcripts/your_file.txt -m "unsloth/Qwen3-1.7B-GGUF"
```

### 4. Deployment (HuggingFace Spaces)

Always use the provided script to ensure clean commits and deployment:

```bash
./deploy.sh "Your descriptive commit message"
```

## Model Architecture & Categories

The project categorizes models to help users balance speed against quality:

* **Tiny (0.1-0.6B):** Extremely fast; good for simple formatting (e.g., Qwen3-0.6B).
* **Compact (1.5-2.6B):** Good balance for the free tier (e.g., Granite-3.1-1B, Qwen3-1.7B).
* **Standard (3-7B):** Higher quality, slower on CPU (e.g., Llama-3-8B variants).
* **Medium (21-30B):** High performance; requires significant RAM (e.g., Command R, Qwen-30B).

## Development Conventions

* **Dependency Management:** `llama-cpp-python` is pinned in the `Dockerfile` via a custom wheel URL. Do not add it to `requirements.txt` unless you are changing the build strategy.
* **Code Style:** The project uses `ruff` for linting.
* **Git:** Use `deploy.sh` to push. Avoid generic commit messages like "update" or "fix".
* **Environment:** The app is optimized for Linux/Docker environments. Local Windows development may require extra setup to compile `llama-cpp-python`.
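The CLI flags shown in the usage examples above (`-i`, `-l`, `-m`) can be sketched with `argparse`. This is an illustrative skeleton only — the actual option names, defaults, and help text in `summarize_transcript.py` may differ:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of the CLI surface shown in the usage examples.

    The real summarize_transcript.py may define different defaults or
    additional flags; only -i, -l, and -m appear in the examples above.
    """
    parser = argparse.ArgumentParser(
        description="Summarize a transcript with a local GGUF model."
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Path to the input transcript file")
    parser.add_argument("-l", "--language", default="en",
                        choices=["en", "zh-TW"],
                        help="Output language (English or Traditional Chinese)")
    parser.add_argument("-m", "--model", default="unsloth/Qwen3-1.7B-GGUF",
                        help="HuggingFace repo id of the GGUF model to use")
    return parser


if __name__ == "__main__":
    # Example: parse the flags from the zh-TW usage example above.
    args = build_parser().parse_args(
        ["-i", "transcripts/your_file.txt", "-l", "zh-TW"]
    )
    print(args.input, args.language, args.model)
```

Keeping the flag surface this small means the same arguments map cleanly onto both the CLI and the Gradio UI controls.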
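The size categories above can be expressed as a small lookup helper. This function is hypothetical (it is not part of the Tiny Scribe codebase); the band edges follow the category table, and sizes falling between bands are assigned to the next band up as an assumption — which also keeps the doc's own Llama-3-8B example in "Standard":

```python
def size_category(params_b: float) -> str:
    """Map a model's parameter count (in billions) to a size category.

    Hypothetical helper, not part of the repo. Band edges follow the
    Model Architecture & Categories table; everything below the Medium
    band's 21B floor but above Compact is treated as Standard.
    """
    if params_b <= 0.6:
        return "Tiny"      # 0.1-0.6B: extremely fast
    if params_b <= 2.6:
        return "Compact"   # 1.5-2.6B: free-tier sweet spot
    if params_b < 21:
        return "Standard"  # 3-7B (and e.g. 8B variants): slower on CPU
    return "Medium"        # 21-30B: needs significant RAM


print(size_category(0.6), size_category(1.7), size_category(8), size_category(30))
# -> Tiny Compact Standard Medium
```

A helper like this is one way the UI could group its 24+ models without hard-coding a category next to every entry.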