# AGENTS.md - Tiny Scribe Project Guidelines
## Project Overview
Tiny Scribe is a Python CLI tool and Gradio web app that summarizes transcripts with GGUF models (e.g., ERNIE, Qwen, Granite) via llama-cpp-python. It supports live streaming output and summaries in English or Traditional Chinese (zh-TW), with OpenCC handling the Chinese conversion.
## Build / Lint / Test Commands
**Run the CLI script:**
```bash
python summarize_transcript.py -i ./transcripts/short.txt # Default English output
python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW # Traditional Chinese output
python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
python summarize_transcript.py -c # CPU only
```
**Run the Gradio web app:**
```bash
python app.py # Starts on port 7860
```
**Linting (if ruff installed):**
```bash
ruff check .
ruff format . # Auto-format code
```
**Type checking (if mypy installed):**
```bash
mypy summarize_transcript.py
mypy app.py
```
**Running tests (root project tests):**
```bash
# Run all root tests
python test_e2e.py
python test_advanced_mode.py
python test_lfm2_extract.py
# Run single test with pytest
pytest test_e2e.py -v # Run all tests in file
pytest test_e2e.py::test_e2e -v # Run specific function
pytest test_advanced_mode.py -k "test_name" # Run by name pattern
```
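As a quick illustration of the pytest pattern above, a root-level test is just a top-level `test_*` function that pytest discovers automatically; the `summarize` helper below is a hypothetical stand-in, not a function from this codebase:

```python
# Hypothetical sketch; summarize() is a toy stand-in for whatever
# function a root-level test file actually exercises.
def summarize(text: str) -> str:
    """Return the first sentence of the text (illustrative only)."""
    return text.split(".")[0].strip()


def test_summarize_returns_first_sentence():
    # pytest collects any top-level function named test_*
    result = summarize("Meeting opened. Budget approved.")
    assert result == "Meeting opened"
```

Running `pytest -k summarize` would then select this test by name pattern, matching the `-k` usage shown above.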
**llama-cpp-python submodule tests:**
```bash
cd llama-cpp-python && pip install ".[test]" && pytest tests/test_llama.py -v
# Run specific test
cd llama-cpp-python && pytest tests/test_llama.py::test_function_name -v
```
## Code Style Guidelines
**Formatting:**
- 4 spaces indentation, 100 char max line length, double quotes for docstrings
- Two blank lines before top-level functions, one blank line after a docstring
**Imports (ordered):**
```python
# Standard library
import os
from typing import Tuple, Optional, Generator

# Third-party packages
import gradio as gr
from llama_cpp import Llama

# Local modules
from meeting_summarizer.trace import Tracer
```
**Type Hints:**
- Use type hints for params/returns
- `Optional[]` for nullable types, `Generator[str, None, None]` for generators
- Example: `def load_model(repo_id: str, filename: str) -> Llama:`
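Applying these conventions, a streaming function might be annotated as below; the function name and body are hypothetical, chosen only to show the `Generator` and `Optional` annotations in use:

```python
from typing import Generator, Optional


def stream_tokens(prompt: str, max_tokens: Optional[int] = None) -> Generator[str, None, None]:
    """Yield output tokens one at a time (illustrative stub, not real inference)."""
    for token in prompt.split()[:max_tokens]:
        yield token
```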
**Naming Conventions:**
- `snake_case` for functions/variables, `CamelCase` for classes, `UPPER_CASE` for constants
- Descriptive names: `stream_summarize_transcript`, not `summ`
**Error Handling:**
- Use explicit error messages with f-strings, check file existence before operations
- Use `try/except` for external API calls (Hugging Face, model loading)
- Log errors with context for debugging
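A minimal sketch of these error-handling rules, assuming a hypothetical `read_transcript` helper (not a function from this codebase): existence is checked up front, and failures re-raise with an f-string message that names the offending path.

```python
import os


def read_transcript(path: str) -> str:
    """Check existence before reading; raise with contextual messages."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Transcript not found: {path}")
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError as e:
        raise RuntimeError(f"Failed to read transcript {path}: {e}") from e
```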
## Dependencies
**Required:**
- `llama-cpp-python>=0.3.0` - Core inference engine (installed from the llama-cpp-python submodule)
- `gradio>=5.0.0` - Web UI framework
- `gradio_huggingfacehub_search>=0.0.12` - HuggingFace model search component
- `huggingface-hub>=0.23.0` - Model downloading
- `opencc-python-reimplemented>=0.1.7` - Chinese text conversion
- `numpy>=1.24.0` - Numerical operations for embeddings
**Development (optional):**
- `pytest>=7.4.0` - Testing framework
- `ruff` - Linting and formatting
- `mypy` - Type checking
## Project Structure
```
tiny-scribe/
├── summarize_transcript.py   # Main CLI script
├── app.py                    # Gradio web app
├── requirements.txt          # Python dependencies
├── transcripts/              # Input transcript files
├── test_e2e.py               # E2E test
├── test_advanced_mode.py     # Advanced mode test
├── test_lfm2_extract.py      # LFM2 extraction test
├── meeting_summarizer/       # Core summarization module
│   ├── __init__.py
│   ├── trace.py              # Tracing/logging utilities
│   └── extraction.py         # Extraction and deduplication logic
├── llama-cpp-python/         # Git submodule
└── README.md                 # Project documentation
```
## Usage Patterns
**Model Loading:**
```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-0.6B-GGUF",
    filename="*Q4_0.gguf",
    n_gpu_layers=-1,  # -1 for all GPU, 0 for CPU
    n_ctx=32768,      # Context window size
    verbose=False,    # Cleaner output
)
```
**Inference Settings:**
- Extraction models: Low temp (0.1-0.3) for deterministic JSON
- Synthesis models: Higher temp (0.7-0.9) for creative summaries
- Reasoning types: non-reasoning models (thinking checkbox hidden), hybrid (thinking toggleable), thinking-only (thinking always on)
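The two temperature regimes can be sketched as shared kwargs dicts; `llm` is assumed to be a loaded `llama_cpp.Llama` instance, and the exact values and helper name are illustrative:

```python
# Hypothetical settings illustrating the two regimes described above.
EXTRACTION_KWARGS = {"temperature": 0.2, "max_tokens": 1024}  # deterministic JSON
SYNTHESIS_KWARGS = {"temperature": 0.8, "max_tokens": 2048}   # creative summaries


def run_extraction(llm, prompt: str) -> str:
    """Run a low-temperature completion; llm is a loaded Llama instance."""
    out = llm.create_completion(prompt, **EXTRACTION_KWARGS)
    return out["choices"][0]["text"]
```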
**Environment & GPU:**
```bash
DEFAULT_N_THREADS=2 # CPU threads (1-32)
N_GPU_LAYERS=0 # 0=CPU, -1=all GPU
HF_HUB_DOWNLOAD_TIMEOUT=300 # Download timeout (seconds)
```
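A minimal sketch of reading these overrides at startup, assuming the defaults listed above (the helper name is hypothetical):

```python
import os


def read_runtime_config() -> dict:
    """Read the environment overrides above, falling back to the defaults."""
    return {
        "n_threads": int(os.environ.get("DEFAULT_N_THREADS", "2")),
        "n_gpu_layers": int(os.environ.get("N_GPU_LAYERS", "0")),
        "hf_timeout": int(os.environ.get("HF_HUB_DOWNLOAD_TIMEOUT", "300")),
    }
```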
GPU offload detection: `from llama_cpp import llama_supports_gpu_offload`
## Notes for AI Agents
- Always call `llm.reset()` after completion to ensure state isolation
- Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
- Default language output is English (zh-TW available via `-l zh-TW` or web UI)
- OpenCC conversion only applied when output_language is "zh-TW"
- HuggingFace cache at `~/.cache/huggingface/hub/` - clean periodically
- HF Spaces runs on CPU tier with 2 vCPUs, 16GB RAM
- Keep model sizes under 4GB for reasonable performance on free tier
- Tests exist in root (test_e2e.py, test_advanced_mode.py, test_lfm2_extract.py)
- Submodule tests in llama-cpp-python/tests/
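The `repo_id:quant` model format above can be split with a small helper; this sketch (the function name and the `*{quant}.gguf` glob mapping are assumptions, mirroring the `filename="*Q4_0.gguf"` pattern in the loading example) shows one way to parse it:

```python
from typing import Tuple


def parse_model_spec(spec: str) -> Tuple[str, str]:
    """Split 'repo_id:quant' into (repo_id, filename glob); illustrative sketch."""
    repo_id, _, quant = spec.rpartition(":")
    if not repo_id:
        raise ValueError(f"Expected 'repo_id:quant', got: {spec!r}")
    return repo_id, f"*{quant}.gguf"
```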