# AGENTS.md - Tiny Scribe Project Guidelines

## Project Overview

Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and bilingual summaries (English or Traditional Chinese, zh-TW) via OpenCC.

## Build / Lint / Test Commands

**Run the CLI script:**

```bash
python summarize_transcript.py -i ./transcripts/short.txt          # Default English output
python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW # Traditional Chinese output
python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
python summarize_transcript.py -c                                  # CPU only
```

**Run the Gradio web app:**

```bash
python app.py  # Starts on port 7860
```

**Linting (if ruff is installed):**

```bash
ruff check .
ruff format .  # Auto-format code
```

**Type checking (if mypy is installed):**

```bash
mypy summarize_transcript.py
mypy app.py
```

**Running tests (root project tests):**

```bash
# Run all root tests
python test_e2e.py
python test_advanced_mode.py
python test_lfm2_extract.py

# Run tests with pytest
pytest test_e2e.py -v                        # Run all tests in a file
pytest test_e2e.py::test_e2e -v              # Run a specific function
pytest test_advanced_mode.py -k "test_name"  # Run by name pattern
```

**llama-cpp-python submodule tests:**

```bash
cd llama-cpp-python && pip install ".[test]" && pytest tests/test_llama.py -v

# Run a specific test
cd llama-cpp-python && pytest tests/test_llama.py::test_function_name -v
```

## Code Style Guidelines

**Formatting:**

- 4-space indentation, 100-character max line length, double quotes for docstrings
- Two blank lines before functions, one blank line after docstrings

**Imports (ordered):**

```python
# Standard library
import os
from typing import Tuple, Optional, Generator

# Third-party packages
from llama_cpp import Llama
import gradio as gr

# Local modules
from meeting_summarizer.trace import Tracer
```

**Type Hints:**

- Use type hints for parameters and return values
- `Optional[]` for nullable types, `Generator[str, None, None]` for generators
- Example: `def load_model(repo_id: str, filename: str) -> Llama:`
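As an illustration of the generator convention above, here is a minimal sketch; the helper name `stream_chunks` and its parameters are hypothetical, not part of the codebase:

```python
from typing import Generator, Optional


def stream_chunks(text: str, chunk_size: int = 4,
                  prefix: Optional[str] = None) -> Generator[str, None, None]:
    """Yield text in fixed-size pieces, mimicking streamed model output."""
    if prefix is not None:
        yield prefix  # Optional[] parameter: emitted only when provided
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]
```

Consumers can iterate the generator directly or join the pieces into a full string.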
**Naming Conventions:**

- `snake_case` for functions/variables, `CamelCase` for classes, `UPPER_CASE` for constants
- Descriptive names: `stream_summarize_transcript`, not `summ`

**Error Handling:**

- Use explicit error messages with f-strings; check file existence before operations
- Use `try/except` for external API calls (Hugging Face, model loading)
- Log errors with context for debugging
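A minimal sketch combining these conventions; the helper `read_transcript` is illustrative, not the project's actual loader:

```python
import os


def read_transcript(path: str) -> str:
    """Load a transcript file, failing early with a descriptive message."""
    # Check file existence before operating on it, per the guideline above.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Transcript not found: {path!r}")
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except OSError as exc:
        # Re-raise with context so the caller sees *which* file failed.
        raise RuntimeError(f"Failed to read transcript {path!r}: {exc}") from exc
```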
## Dependencies

**Required:**

- `llama-cpp-python>=0.3.0` - Core inference engine (installed from the llama-cpp-python submodule)
- `gradio>=5.0.0` - Web UI framework
- `gradio_huggingfacehub_search>=0.0.12` - HuggingFace model search component
- `huggingface-hub>=0.23.0` - Model downloading
- `opencc-python-reimplemented>=0.1.7` - Chinese text conversion
- `numpy>=1.24.0` - Numerical operations for embeddings

**Development (optional):**

- `pytest>=7.4.0` - Testing framework
- `ruff` - Linting and formatting
- `mypy` - Type checking

## Project Structure

```
tiny-scribe/
├── summarize_transcript.py   # Main CLI script
├── app.py                    # Gradio web app
├── requirements.txt          # Python dependencies
├── transcripts/              # Input transcript files
├── test_e2e.py               # E2E test
├── test_advanced_mode.py     # Advanced mode test
├── test_lfm2_extract.py      # LFM2 extraction test
├── meeting_summarizer/       # Core summarization module
│   ├── __init__.py
│   ├── trace.py              # Tracing/logging utilities
│   └── extraction.py         # Extraction and deduplication logic
├── llama-cpp-python/         # Git submodule
└── README.md                 # Project documentation
```
## Usage Patterns

**Model Loading:**

```python
llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-0.6B-GGUF",
    filename="*Q4_0.gguf",
    n_gpu_layers=-1,  # -1 for all layers on GPU, 0 for CPU
    n_ctx=32768,      # Context window size
    verbose=False,    # Cleaner output
)
```

**Inference Settings:**

- Extraction models: low temperature (0.1-0.3) for deterministic JSON
- Synthesis models: higher temperature (0.7-0.9) for creative summaries
- Reasoning types: non-reasoning (hide checkbox), hybrid (toggleable), thinking-only (always on)
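One way to keep these per-role settings in one place is a small preset table; a sketch under assumptions (the names `SAMPLING_PRESETS` and `sampling_kwargs` are hypothetical, and the exact temperatures are just values picked from the ranges above):

```python
# Per-role sampling presets; temperatures taken from the guideline ranges above.
SAMPLING_PRESETS = {
    "extraction": {"temperature": 0.2},  # near-deterministic JSON output
    "synthesis": {"temperature": 0.8},   # freer prose for summaries
}


def sampling_kwargs(role: str) -> dict:
    """Return sampling kwargs (e.g., for llm.create_chat_completion) by model role."""
    if role not in SAMPLING_PRESETS:
        raise ValueError(f"Unknown model role: {role!r}")
    return dict(SAMPLING_PRESETS[role])  # copy, so callers can't mutate the preset
```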
**Environment & GPU:**

```bash
DEFAULT_N_THREADS=2          # CPU threads (1-32)
N_GPU_LAYERS=0               # 0 = CPU only, -1 = all layers on GPU
HF_HUB_DOWNLOAD_TIMEOUT=300  # Download timeout (seconds)
```

GPU offload detection: `from llama_cpp import llama_supports_gpu_offload`
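The detection helper can be used to resolve `n_gpu_layers` safely; the wrapper function below is a hypothetical sketch that also falls back to CPU when llama-cpp-python is not importable in the current environment:

```python
def pick_n_gpu_layers(requested: int = -1) -> int:
    """Resolve n_gpu_layers: fall back to CPU (0) when GPU offload is unavailable."""
    try:
        from llama_cpp import llama_supports_gpu_offload
        has_gpu = llama_supports_gpu_offload()
    except ImportError:
        has_gpu = False  # llama-cpp-python not installed in this environment
    return requested if has_gpu else 0
```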
## Notes for AI Agents

- Always call `llm.reset()` after a completion to ensure state isolation
- Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
- Default output language is English (zh-TW is available via `-l zh-TW` or the web UI)
- OpenCC conversion is applied only when output_language is "zh-TW"
- The HuggingFace cache lives at `~/.cache/huggingface/hub/` - clean it periodically
- HF Spaces runs on the CPU tier with 2 vCPUs and 16 GB RAM
- Keep model sizes under 4 GB for reasonable performance on the free tier
- Root-level tests: test_e2e.py, test_advanced_mode.py, test_lfm2_extract.py
- Submodule tests live in llama-cpp-python/tests/
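The zh-TW gating rule above can be sketched as a small helper; `localize` is a hypothetical name, and the `s2twp` OpenCC config (Simplified to Traditional with Taiwan phrasing) is an assumption about which conversion the project uses:

```python
def localize(summary: str, output_language: str) -> str:
    """Convert a summary to Traditional Chinese only when zh-TW is requested."""
    if output_language != "zh-TW":
        return summary  # English (and any other language) passes through untouched
    # Import lazily so English-only runs never need OpenCC at all.
    from opencc import OpenCC
    return OpenCC("s2twp").convert(summary)  # assumed config: s2twp
```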