# AGENTS.md - Tiny Scribe Project Guidelines

## Project Overview

Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and bilingual summaries (English or Traditional Chinese zh-TW) via OpenCC.

## Build / Lint / Test Commands
Run the CLI script:

```shell
python summarize_transcript.py -i ./transcripts/short.txt          # Default English output
python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW # Traditional Chinese output
python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
python summarize_transcript.py -c                                  # CPU only
```

Run the Gradio web app:

```shell
python app.py  # Starts on port 7860
```
Linting (if ruff installed):

```shell
ruff check .
ruff format .  # Auto-format code
```

Type checking (if mypy installed):

```shell
mypy summarize_transcript.py
mypy app.py
```
Running tests (root project tests):

```shell
# Run all root tests
python test_e2e.py
python test_advanced_mode.py
python test_lfm2_extract.py

# Run a single test with pytest
pytest test_e2e.py -v                        # Run all tests in file
pytest test_e2e.py::test_e2e -v              # Run a specific function
pytest test_advanced_mode.py -k "test_name"  # Run by name pattern
```

llama-cpp-python submodule tests:

```shell
cd llama-cpp-python && pip install ".[test]" && pytest tests/test_llama.py -v

# Run a specific test
cd llama-cpp-python && pytest tests/test_llama.py::test_function_name -v
```
## Code Style Guidelines

**Formatting:**
- 4-space indentation, 100-character max line length, double quotes for docstrings
- Two blank lines before functions, one after docstrings
**Imports (ordered):**

```python
# Standard library
import os
from typing import Tuple, Optional, Generator

# Third-party packages
from llama_cpp import Llama
import gradio as gr

# Local modules
from meeting_summarizer.trace import Tracer
```
**Type Hints:**
- Use type hints for parameters and return values
- `Optional[...]` for nullable types, `Generator[str, None, None]` for generators
- Example: `def load_model(repo_id: str, filename: str) -> Llama:`
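The generator convention above can be illustrated with a small streaming helper. This is an illustrative sketch, not a function from the codebase; the name `stream_chunks` and its chunking behavior are assumptions for the example:

```python
from typing import Generator, Optional


def stream_chunks(text: str, size: int = 4,
                  limit: Optional[int] = None) -> Generator[str, None, None]:
    """Yield fixed-size chunks of text, stopping early if limit is set."""
    emitted = 0
    for start in range(0, len(text), size):
        if limit is not None and emitted >= limit:
            return
        yield text[start:start + size]
        emitted += 1
```

`Optional[int]` marks the nullable `limit` parameter, and the `Generator[str, None, None]` return type matches a function that only yields strings.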
Naming Conventions:
snake_casefor functions/variables,CamelCasefor classes,UPPER_CASEfor constants- Descriptive names:
stream_summarize_transcript, notsumm
**Error Handling:**
- Use explicit error messages with f-strings; check file existence before operations
- Use `try/except` for external API calls (Hugging Face, model loading)
- Log errors with context for debugging
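A minimal sketch of the existence-check-plus-f-string pattern above; `read_transcript` is a hypothetical helper, not necessarily the project's actual function:

```python
from pathlib import Path


def read_transcript(path: str) -> str:
    """Read a transcript file, failing fast with a descriptive error."""
    p = Path(path)
    # Check existence before operating, per the guideline above
    if not p.is_file():
        raise FileNotFoundError(f"Transcript not found: {path}")
    return p.read_text(encoding="utf-8")
```

The f-string in the exception carries the offending path, which is the kind of context the logging guideline asks for.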
## Dependencies

Required:
- `llama-cpp-python>=0.3.0` - Core inference engine (installed from the llama-cpp-python submodule)
- `gradio>=5.0.0` - Web UI framework
- `gradio_huggingfacehub_search>=0.0.12` - HuggingFace model search component
- `huggingface-hub>=0.23.0` - Model downloading
- `opencc-python-reimplemented>=0.1.7` - Chinese text conversion
- `numpy>=1.24.0` - Numerical operations for embeddings

Development (optional):
- `pytest>=7.4.0` - Testing framework
- `ruff` - Linting and formatting
- `mypy` - Type checking
Project Structure
tiny-scribe/
βββ summarize_transcript.py # Main CLI script
βββ app.py # Gradio web app
βββ requirements.txt # Python dependencies
βββ transcripts/ # Input transcript files
βββ test_e2e.py # E2E test
βββ test_advanced_mode.py # Advanced mode test
βββ test_lfm2_extract.py # LFM2 extraction test
βββ meeting_summarizer/ # Core summarization module
β βββ __init__.py
β βββ trace.py # Tracing/logging utilities
β βββ extraction.py # Extraction and deduplication logic
βββ llama-cpp-python/ # Git submodule
βββ README.md # Project documentation
## Usage Patterns

**Model Loading:**

```python
llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-0.6B-GGUF",
    filename="*Q4_0.gguf",
    n_gpu_layers=-1,  # -1 for all GPU, 0 for CPU
    n_ctx=32768,      # Context window size
    verbose=False,    # Cleaner output
)
```
**Inference Settings:**
- Extraction models: low temperature (0.1-0.3) for deterministic JSON
- Synthesis models: higher temperature (0.7-0.9) for creative summaries
- Reasoning types: non-reasoning (hide checkbox), hybrid (toggleable), thinking-only (always on)
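One way to keep the per-task temperature guidance above in one place is a small preset table. The names `SAMPLING_PRESETS` and `sampling_params` are hypothetical, and the exact values are just mid-range picks from the bands listed above:

```python
# Hypothetical presets mirroring the temperature guidance above.
SAMPLING_PRESETS = {
    "extraction": {"temperature": 0.2, "top_p": 0.9},   # deterministic JSON
    "synthesis": {"temperature": 0.8, "top_p": 0.95},   # creative summaries
}


def sampling_params(task: str) -> dict:
    """Return sampling kwargs for a task, defaulting to extraction settings."""
    return SAMPLING_PRESETS.get(task, SAMPLING_PRESETS["extraction"])
```

The returned dict can be splatted into an inference call (e.g. `llm.create_chat_completion(..., **sampling_params("synthesis"))`).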
**Environment & GPU:**

```shell
DEFAULT_N_THREADS=2          # CPU threads (1-32)
N_GPU_LAYERS=0               # 0=CPU, -1=all GPU
HF_HUB_DOWNLOAD_TIMEOUT=300  # Download timeout (seconds)
```

GPU offload detection: `from llama_cpp import llama_supports_gpu_offload`
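The detection call above can feed the `n_gpu_layers` choice directly. A minimal sketch, assuming a hypothetical `choose_gpu_layers` helper (the `-1`/`0` convention comes from the model-loading example above):

```python
def choose_gpu_layers(force_cpu: bool = False) -> int:
    """Pick n_gpu_layers: -1 (offload all layers) if GPU offload works, else 0."""
    if force_cpu:
        return 0
    try:
        from llama_cpp import llama_supports_gpu_offload
    except ImportError:  # llama-cpp-python not installed
        return 0
    return -1 if llama_supports_gpu_offload() else 0
```

This keeps the CLI's `-c` (CPU-only) flag and automatic detection on one code path.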
## Notes for AI Agents

- Always call `llm.reset()` after completion to ensure state isolation
- Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
- Default output language is English (zh-TW available via `-l zh-TW` or the web UI)
- OpenCC conversion is only applied when output_language is "zh-TW"
- HuggingFace cache lives at `~/.cache/huggingface/hub/` - clean it periodically
- HF Spaces runs on the CPU tier with 2 vCPUs and 16 GB RAM
- Keep model sizes under 4 GB for reasonable performance on the free tier
- Tests live in the project root (test_e2e.py, test_advanced_mode.py, test_lfm2_extract.py)
- Submodule tests are in llama-cpp-python/tests/
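The `repo_id:quant` format noted above can be split with a small helper. This is a sketch; `parse_model_spec` is a hypothetical name, not necessarily what the CLI uses:

```python
from typing import Optional, Tuple


def parse_model_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'repo_id:quant' into (repo_id, quant); quant is None if absent."""
    repo_id, sep, quant = spec.partition(":")
    return repo_id, quant if sep else None
```

`str.partition` keeps the parse unambiguous: a bare `repo_id` with no colon yields `(repo_id, None)` rather than an empty quant string.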