# AGENTS.md - Tiny Scribe Project Guidelines
## Project Overview
Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and bilingual summaries (English or Traditional Chinese, zh-TW), with zh-TW conversion handled by OpenCC.
## Build / Lint / Test Commands
**Run the CLI script:**
```bash
python summarize_transcript.py -i ./transcripts/short.txt # Default English output
python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW # Traditional Chinese output
python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
python summarize_transcript.py -c # CPU only
```
**Run the Gradio web app:**
```bash
python app.py # Starts on port 7860
```
**Linting (if ruff installed):**
```bash
ruff check .
ruff format . # Auto-format code
```
**Type checking (if mypy installed):**
```bash
mypy summarize_transcript.py
mypy app.py
```
**Running tests (root project tests):**
```bash
# Run all root tests
python test_e2e.py
python test_advanced_mode.py
python test_lfm2_extract.py
# Run single test with pytest
pytest test_e2e.py -v # Run all tests in file
pytest test_e2e.py::test_e2e -v # Run specific function
pytest test_advanced_mode.py -k "test_name" # Run by name pattern
```
**llama-cpp-python submodule tests:**
```bash
cd llama-cpp-python && pip install ".[test]" && pytest tests/test_llama.py -v
# Run specific test
cd llama-cpp-python && pytest tests/test_llama.py::test_function_name -v
```
## Code Style Guidelines
**Formatting:**
- 4 spaces indentation, 100 char max line length, double quotes for docstrings
- Two blank lines before functions, one after docstrings
**Imports (ordered):**
```python
# Standard library
import os
from typing import Tuple, Optional, Generator
# Third-party packages
from llama_cpp import Llama
import gradio as gr
# Local modules
from meeting_summarizer.trace import Tracer
```
**Type Hints:**
- Use type hints for params/returns
- `Optional[]` for nullable types, `Generator[str, None, None]` for generators
- Example: `def load_model(repo_id: str, filename: str) -> Llama:`
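The hint conventions above can be illustrated with a minimal, hypothetical streaming helper (the function name and behavior are illustrative, not part of the project API):

```python
from typing import Generator, Optional


def stream_chunks(
    text: str, chunk_size: int = 4, prefix: Optional[str] = None
) -> Generator[str, None, None]:
    """Yield fixed-size chunks of text, optionally prefixed."""
    for i in range(0, len(text), chunk_size):
        chunk = text[i : i + chunk_size]
        yield f"{prefix}{chunk}" if prefix is not None else chunk
```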
**Naming Conventions:**
- `snake_case` for functions/variables, `CamelCase` for classes, `UPPER_CASE` for constants
- Descriptive names: `stream_summarize_transcript`, not `summ`
**Error Handling:**
- Use explicit error messages with f-strings, check file existence before operations
- Use `try/except` for external API calls (Hugging Face, model loading)
- Log errors with context for debugging
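A sketch of these conventions, using a hypothetical `read_transcript` helper (existence check first, f-string messages, exception chaining for context):

```python
import os


def read_transcript(path: str) -> str:
    """Load a transcript file, failing early with a contextual message."""
    # Check file existence before attempting the read
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Transcript not found: {path!r}")
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except OSError as exc:
        # Re-raise with context so the caller sees which file failed
        raise RuntimeError(f"Failed to read transcript {path!r}: {exc}") from exc
```

The same try/except-with-context pattern applies to Hugging Face downloads and model loading.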
## Dependencies
**Required:**
- `llama-cpp-python>=0.3.0` - Core inference engine (installed from llama-cpp-python submodule)
- `gradio>=5.0.0` - Web UI framework
- `gradio_huggingfacehub_search>=0.0.12` - HuggingFace model search component
- `huggingface-hub>=0.23.0` - Model downloading
- `opencc-python-reimplemented>=0.1.7` - Chinese text conversion
- `numpy>=1.24.0` - Numerical operations for embeddings
**Development (optional):**
- `pytest>=7.4.0` - Testing framework
- `ruff` - Linting and formatting
- `mypy` - Type checking
## Project Structure
```
tiny-scribe/
├── summarize_transcript.py   # Main CLI script
├── app.py                    # Gradio web app
├── requirements.txt          # Python dependencies
├── transcripts/              # Input transcript files
├── test_e2e.py               # E2E test
├── test_advanced_mode.py     # Advanced mode test
├── test_lfm2_extract.py      # LFM2 extraction test
├── meeting_summarizer/       # Core summarization module
│   ├── __init__.py
│   ├── trace.py              # Tracing/logging utilities
│   └── extraction.py         # Extraction and deduplication logic
├── llama-cpp-python/         # Git submodule
└── README.md                 # Project documentation
```
## Usage Patterns
**Model Loading:**
```python
llm = Llama.from_pretrained(
repo_id="unsloth/Qwen3-0.6B-GGUF",
filename="*Q4_0.gguf",
n_gpu_layers=-1, # -1 for all GPU, 0 for CPU
n_ctx=32768, # Context window size
verbose=False, # Cleaner output
)
```
**Inference Settings:**
- Extraction models: Low temp (0.1-0.3) for deterministic JSON
- Synthesis models: Higher temp (0.7-0.9) for creative summaries
- Reasoning types: Non-reasoning (hide checkbox), Hybrid (toggleable), Thinking-only (always on)
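One way to encode the temperature guidance above is a small role-to-settings table. This is a hedged sketch: `ROLE_SETTINGS` and `sampling_for` are hypothetical names, and the chosen values are just midpoints of the ranges listed above, not project defaults:

```python
# Assumed sampling presets derived from the guidance above (not project API).
ROLE_SETTINGS = {
    "extraction": {"temperature": 0.2, "top_p": 0.9},   # deterministic JSON
    "synthesis": {"temperature": 0.8, "top_p": 0.95},   # creative summaries
}


def sampling_for(role: str) -> dict:
    """Return sampling kwargs for a model role, failing loudly on typos."""
    if role not in ROLE_SETTINGS:
        raise ValueError(f"Unknown model role: {role!r}")
    return ROLE_SETTINGS[role]
```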
**Environment & GPU:**
```bash
DEFAULT_N_THREADS=2 # CPU threads (1-32)
N_GPU_LAYERS=0 # 0=CPU, -1=all GPU
HF_HUB_DOWNLOAD_TIMEOUT=300 # Download timeout (seconds)
```
GPU offload detection: `from llama_cpp import llama_supports_gpu_offload`
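Reading these variables safely can be sketched with a small helper (`env_int` is a hypothetical name, not part of the project):

```python
import os


def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}")


# Defaults mirror the values shown above
n_threads = env_int("DEFAULT_N_THREADS", 2)
n_gpu_layers = env_int("N_GPU_LAYERS", 0)
```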
## Notes for AI Agents
- Always call `llm.reset()` after completion to ensure state isolation
- Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
- Default language output is English (zh-TW available via `-l zh-TW` or web UI)
- OpenCC conversion only applied when output_language is "zh-TW"
- HuggingFace cache at `~/.cache/huggingface/hub/` - clean periodically
- HF Spaces runs on CPU tier with 2 vCPUs, 16GB RAM
- Keep model sizes under 4GB for reasonable performance on free tier
- Tests exist in root (test_e2e.py, test_advanced_mode.py, test_lfm2_extract.py)
- Submodule tests in llama-cpp-python/tests/
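The `repo_id:quant` model format above can be split with a simple parser. A minimal sketch, assuming the quant suffix is optional and the last colon is the separator (`parse_model_spec` is a hypothetical helper, not project API):

```python
from typing import Optional, Tuple


def parse_model_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'repo_id:quant' into (repo_id, quant); quant may be absent."""
    if ":" in spec:
        repo_id, quant = spec.rsplit(":", 1)
        return repo_id, quant
    return spec, None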