
Tiny Scribe - Project Context

Project Overview

Tiny Scribe is a lightweight, local LLM-powered transcript summarization tool. It is designed to run efficiently on standard hardware (including the free CPU tier on HuggingFace Spaces) using GGUF quantized models.

The project provides both a web interface (Gradio) and a CLI tool, supporting more than 24 models ranging from 100M to 30B parameters. It includes specialized features such as live streaming, a reasoning ("thinking") mode for supported models, and dual-language output (English/Traditional Chinese).
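Live streaming output from llama-cpp-python is typically consumed as an iterator of completion chunks. The sketch below illustrates the consumption pattern with a stand-in generator; `fake_stream` is a placeholder for a real `Llama(...)` call with `stream=True`, not project code:

```python
from typing import Iterator

def fake_stream(text: str) -> Iterator[str]:
    # Stand-in for llama-cpp-python's streaming mode, which yields
    # completion chunks incrementally when stream=True is passed.
    for token in text.split():
        yield token + " "

def collect(stream: Iterator[str]) -> str:
    # A UI would append each chunk to the output box as it arrives;
    # here we simply accumulate them into one string.
    out = []
    for chunk in stream:
        out.append(chunk)
    return "".join(out).strip()

print(collect(fake_stream("Summary: the meeting covered budget and hiring.")))
```

In the real app, the same loop shape lets Gradio update the output box token by token instead of blocking until the full summary is ready.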

Tech Stack

  • Language: Python 3.10+
  • UI Framework: Gradio (Web), argparse (CLI)
  • Inference Engine: llama-cpp-python (Python bindings for llama.cpp)
  • Model Format: GGUF (Quantized)
  • Containerization: Docker (optimized for HuggingFace Spaces)
  • Utilities: opencc (Chinese conversion), huggingface_hub

Key Files & Directories

  • app.py: The main entry point for the Gradio web application. Contains the UI layout, model loading logic, and generation pipeline.
  • summarize_transcript.py: Command-line interface for batch processing or local summarization without the web UI.
  • Dockerfile: Defines the build environment. Crucial: It installs a specific pre-compiled wheel for llama-cpp-python to ensure compatibility and performance on HF Spaces (Free CPU tier).
  • deploy.sh: Helper script to stage, commit, and push changes to the HuggingFace Space. Enforces non-generic commit messages.
  • requirements.txt: Python dependencies (excluding llama-cpp-python which is handled specially in Docker).
  • transcripts/: Directory for storing input transcript files.
  • AGENTS.md / CLAUDE.md: Existing context files for other AI assistants.

Build & Run Instructions

1. Installation

The project relies on llama-cpp-python. For local development, install it separately: it is deliberately left out of requirements.txt so that installation does not fail on systems without a C/C++ compiler.

# Install general dependencies
pip install -r requirements.txt

# Install llama-cpp-python (with CUDA support if available, otherwise CPU)
# See: https://github.com/abetlen/llama-cpp-python#installation
pip install llama-cpp-python

2. Running the Web UI

python app.py
# Access at http://localhost:7860

3. Running the CLI

# Basic English summary
python summarize_transcript.py -i transcripts/your_file.txt

# Traditional Chinese output
python summarize_transcript.py -i transcripts/your_file.txt -l zh-TW

# Use a specific model
python summarize_transcript.py -i transcripts/your_file.txt -m "unsloth/Qwen3-1.7B-GGUF"
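The flags above map naturally onto argparse. The sketch below reconstructs that surface for illustration only; the exact option names, defaults, and choices in summarize_transcript.py may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI shown above.
    # Defaults and choices here are assumptions, not read from the project.
    parser = argparse.ArgumentParser(
        description="Summarize a transcript with a local GGUF model"
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Path to the input transcript file")
    parser.add_argument("-l", "--language", default="en",
                        choices=["en", "zh-TW"],
                        help="Output language")
    parser.add_argument("-m", "--model", default="unsloth/Qwen3-1.7B-GGUF",
                        help="HuggingFace model repo ID")
    return parser

args = build_parser().parse_args(["-i", "transcripts/demo.txt", "-l", "zh-TW"])
print(args.input, args.language)
```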

4. Deployment (HuggingFace Spaces)

Always use the provided script to ensure clean commits and deployment:

./deploy.sh "Your descriptive commit message"
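The commit-message guard in deploy.sh might look roughly like the sketch below; this is an illustrative reconstruction of the "non-generic message" rule, not the script's actual contents (the real script also stages, commits, and pushes to the Space):

```shell
#!/usr/bin/env bash
# Illustrative sketch only: reject generic commit messages before pushing.
check_commit_msg() {
  local msg="$1"
  case "$(echo "$msg" | tr '[:upper:]' '[:lower:]')" in
    update|fix|wip|"")
      echo "Refusing generic commit message: '$msg'" >&2
      return 1
      ;;
  esac
  return 0
}

check_commit_msg "Improve chunking for long transcripts" && echo "ok"
```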

Model Architecture & Categories

The project categorizes models to help users balance speed vs. quality:

  • Tiny (0.1-0.6B): Extremely fast, good for simple formatting (e.g., Qwen3-0.6B).
  • Compact (1.5-2.6B): Good balance for free tier (e.g., Granite-3.1-1B, Qwen3-1.7B).
  • Standard (3-7B): Higher quality, slower on CPU (e.g., Llama-3-8B variants).
  • Medium (21-30B): High performance, requires significant RAM (e.g., Command R, Qwen-30B).
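The tiers above can be encoded as a small lookup table. The parameter ranges and names below come from this list; the picker helper itself is hypothetical and not part of app.py:

```python
# Parameter-count tiers from the list above, in billions of parameters.
MODEL_TIERS = [
    ("Tiny", 0.1, 0.6),
    ("Compact", 1.5, 2.6),
    ("Standard", 3.0, 7.0),
    ("Medium", 21.0, 30.0),
]

def tier_for(params_b: float) -> str:
    """Return the tier whose range covers the given parameter count."""
    for name, lo, hi in MODEL_TIERS:
        if lo <= params_b <= hi:
            return name
    return "Uncategorized"

print(tier_for(1.7))  # e.g., Qwen3-1.7B
```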

Development Conventions

  • Dependency Management: llama-cpp-python is pinned in the Dockerfile via a custom wheel URL. Do not add it to requirements.txt unless you are changing the build strategy.
  • Code Style: The project uses ruff for linting.
  • Git: Use deploy.sh to push. Avoid generic commit messages like "update" or "fix".
  • Environment: The app is optimized for Linux/Docker environments. Local Windows development may require extra setup for llama-cpp-python compilation.
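The wheel-pinning convention for llama-cpp-python might look roughly like this inside the Dockerfile; the URL and version below are placeholders, not the project's actual pin:

```dockerfile
# Illustrative only: install a pre-compiled wheel instead of building from source,
# so the image builds quickly and runs well on the HF Spaces free CPU tier.
RUN pip install --no-cache-dir -r requirements.txt && \
    pip install --no-cache-dir \
      https://example.com/path/to/llama_cpp_python-<version>-cp310-linux_x86_64.whl
```

Pinning a known-good wheel avoids both long source builds and ABI surprises when the base image changes; the trade-off is that upgrading llama-cpp-python means updating the wheel URL by hand.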