# Tiny Scribe - Project Context
## Project Overview
**Tiny Scribe** is a lightweight, local LLM-powered transcript summarization tool. It is designed to run efficiently on standard hardware (including free CPU tiers on HuggingFace Spaces) using GGUF quantized models.
The project features a web interface (Gradio) and a CLI tool, supporting over 24 models ranging from 100M to 30B parameters. It includes specialized features like live streaming, reasoning mode (thinking) for supported models, and dual-language output (English/Traditional Chinese).
## Tech Stack
* **Language:** Python 3.10+
* **UI Framework:** Gradio (Web), `argparse` (CLI)
* **Inference Engine:** `llama-cpp-python` (Python bindings for `llama.cpp`)
* **Model Format:** GGUF (Quantized)
* **Containerization:** Docker (optimized for HuggingFace Spaces)
* **Utilities:** `opencc` (Chinese conversion), `huggingface_hub`
## Key Files & Directories
* `app.py`: The main entry point for the Gradio web application. Contains the UI layout, model loading logic, and generation pipeline.
* `summarize_transcript.py`: Command-line interface for batch processing or local summarization without the web UI.
* `Dockerfile`: Defines the build environment. **Crucial:** It installs a specific pre-compiled wheel for `llama-cpp-python` to ensure compatibility and performance on HF Spaces (Free CPU tier).
* `deploy.sh`: Helper script to stage, commit, and push changes to the HuggingFace Space. Enforces non-generic commit messages.
* `requirements.txt`: Python dependencies (excluding `llama-cpp-python` which is handled specially in Docker).
* `transcripts/`: Directory for storing input transcript files.
* `AGENTS.md` / `CLAUDE.md`: Existing context files for other AI assistants.
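Model loading in `app.py` follows the usual llama-cpp-python pattern: resolve a GGUF file from a HuggingFace repo, then hand it to `Llama`. A minimal sketch of that flow (the quant suffix, helper names, and context size are illustrative assumptions, not the project's actual code):

```python
def gguf_filename(repo_id: str, quant: str = "Q4_K_M") -> str:
    """Guess a GGUF filename from a repo id like 'unsloth/Qwen3-1.7B-GGUF'.

    Hypothetical helper: assumes files are named '<Model>-<quant>.gguf';
    the real app may enumerate repo files instead of guessing.
    """
    model = repo_id.split("/")[-1].removesuffix("-GGUF")
    return f"{model}-{quant}.gguf"


def load_model(repo_id: str, n_ctx: int = 4096):
    """Download the quantized model and build a Llama instance."""
    # Imports are deferred so the sketch stays importable without the libraries.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    path = hf_hub_download(repo_id=repo_id, filename=gguf_filename(repo_id))
    return Llama(model_path=path, n_ctx=n_ctx)
```

`n_ctx` bounds the context window; larger values raise RAM usage, which matters on the free CPU tier.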
## Build & Run Instructions
### 1. Installation
The project relies on `llama-cpp-python`. For local development, you must install it separately, as it's not in `requirements.txt` to avoid build errors on systems without compilers.
```bash
# Install general dependencies
pip install -r requirements.txt

# Install llama-cpp-python (CPU build)
pip install llama-cpp-python

# Or, with CUDA support (see the installation docs for other backends):
# CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
# https://github.com/abetlen/llama-cpp-python#installation
```
### 2. Running the Web UI
```bash
python app.py
# Access at http://localhost:7860
```
### 3. Running the CLI
```bash
# Basic English summary
python summarize_transcript.py -i transcripts/your_file.txt
# Traditional Chinese output
python summarize_transcript.py -i transcripts/your_file.txt -l zh-TW
# Use a specific model
python summarize_transcript.py -i transcripts/your_file.txt -m "unsloth/Qwen3-1.7B-GGUF"
```
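The CLI flags above map onto a small `argparse` setup. A sketch of the expected interface (the long flag names and defaults are assumptions; check `summarize_transcript.py` for the actual ones):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Summarize a transcript with a local GGUF model."
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Path to the transcript file")
    parser.add_argument("-l", "--language", default="en",
                        choices=["en", "zh-TW"],
                        help="Output language (English or Traditional Chinese)")
    parser.add_argument("-m", "--model", default="unsloth/Qwen3-1.7B-GGUF",
                        help="HuggingFace repo id of the GGUF model")
    return parser


# Example: parse the Traditional Chinese invocation shown above.
args = build_parser().parse_args(
    ["-i", "transcripts/your_file.txt", "-l", "zh-TW"]
)
```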
### 4. Deployment (HuggingFace Spaces)
Always use the provided script to ensure clean commits and deployment:
```bash
./deploy.sh "Your descriptive commit message"
```
## Model Architecture & Categories
The project categorizes models to help users balance speed vs. quality:
* **Tiny (0.1-0.6B):** Extremely fast, good for simple formatting (e.g., Qwen3-0.6B).
* **Compact (1.5-2.6B):** Good balance for free tier (e.g., Granite-3.1-1B, Qwen3-1.7B).
* **Standard (3-8B):** Higher quality, slower on CPU (e.g., Llama-3-8B variants).
* **Medium (21-30B):** High performance, requires significant RAM (e.g., Command R, Qwen-30B).
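The categories above can be expressed as a simple lookup by parameter count. A hypothetical helper (the thresholds mirror the list; the real app may hard-code categories per model instead):

```python
# Upper bound (billions of parameters) -> category name, in ascending order.
CATEGORIES = [
    (0.6, "Tiny"),       # 0.1-0.6B
    (2.6, "Compact"),    # 1.5-2.6B
    (8.0, "Standard"),   # 3-8B
    (30.0, "Medium"),    # 21-30B
]


def category_for(params_b: float) -> str:
    """Return the first category whose ceiling covers the parameter count."""
    for ceiling, name in CATEGORIES:
        if params_b <= ceiling:
            return name
    return "Medium"  # anything larger falls in the top bucket
```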
## Development Conventions
* **Dependency Management:** `llama-cpp-python` is pinned in the `Dockerfile` via a custom wheel URL. Do not add it to `requirements.txt` unless you are changing the build strategy.
* **Code Style:** The project uses `ruff` for linting.
* **Git:** Use `deploy.sh` to push. Avoid generic commit messages like "update" or "fix".
* **Environment:** The app is optimized for Linux/Docker environments. Local Windows development may require extra setup for `llama-cpp-python` compilation.