Luigi commited on
Commit
ddec8de
Β·
1 Parent(s): 80ca4af

Update AGENTS.md with Gradio app coverage and deployment info

Browse files

- Add Gradio web app commands and documentation
- Include Docker/HF Spaces deployment section
- Update dependencies with version specifications
- Add gradio import to code style guidelines
- Improve thinking block regex pattern to support both formats
- Add HF Spaces resource constraints (2 vCPUs, 16GB RAM)
- Update project structure with app.py and Dockerfile

Files changed (1) hide show
  1. AGENTS.md +33 -12
AGENTS.md CHANGED
@@ -2,17 +2,22 @@
2
 
3
  ## Project Overview
4
 
5
- Tiny Scribe is a Python CLI tool for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and Traditional Chinese (zh-TW) conversion via OpenCC.
6
 
7
  ## Build / Lint / Test Commands
8
 
9
- **Run the script:**
10
  ```bash
11
  python summarize_transcript.py -i ./transcripts/short.txt
12
  python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
13
  python summarize_transcript.py -c # CPU only
14
  ```
15
 
 
 
 
 
 
16
  **Linting (if ruff installed):**
17
  ```bash
18
  ruff check .
@@ -23,6 +28,7 @@ ruff format . # Auto-format code
23
  **Type checking (if mypy installed):**
24
  ```bash
25
  mypy summarize_transcript.py
 
26
  ```
27
 
28
  **Running tests:**
@@ -62,6 +68,7 @@ from typing import Tuple, Optional, Generator
62
  from llama_cpp import Llama
63
  from huggingface_hub import hf_hub_download
64
  from opencc import OpenCC
 
65
  ```
66
 
67
  **Type Hints:**
@@ -90,29 +97,29 @@ from opencc import OpenCC
90
  ## Dependencies
91
 
92
  **Required:**
93
- - `llama-cpp-python` - Core inference engine
94
- - `huggingface-hub` - Model downloading
95
- - `opencc` - Chinese text conversion
 
96
 
97
  **Development (optional):**
98
- - `pytest` - Testing framework
99
  - `ruff` - Linting and formatting
100
  - `mypy` - Type checking
101
- - `black` - Code formatting
102
 
103
  ## Project Structure
104
 
105
  ```
106
  tiny-scribe/
107
  β”œβ”€β”€ summarize_transcript.py # Main CLI script
 
 
 
108
  β”œβ”€β”€ transcripts/ # Input transcript files
109
  β”‚ β”œβ”€β”€ short.txt
110
  β”‚ └── full.txt
111
- β”œβ”€β”€ summary.txt # Generated output
112
  β”œβ”€β”€ llama-cpp-python/ # Git submodule
113
  β”‚ β”œβ”€β”€ tests/ # Test suite
114
- β”‚ β”‚ β”œβ”€β”€ test_llama.py
115
- β”‚ β”‚ └── test_llama_grammar.py
116
  β”‚ └── vendor/llama.cpp/ # Core C++ library
117
  └── README.md # Project documentation
118
  ```
@@ -143,7 +150,7 @@ stream = llm.create_chat_completion(
143
  **Thinking Block Parsing:**
144
  ```python
145
  # Extract thinking/reasoning blocks from model output
146
- THINKING_PATTERN = re.compile(r'<thinking>(.*?)</thinking>', re.DOTALL)
147
 
148
  for chunk in stream:
149
  delta = chunk["choices"][0]["delta"]
@@ -169,8 +176,9 @@ traditional_text = converter.convert(simplified_text)
169
  - Always call `llm.reset()` after completion to ensure state isolation
170
  - Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
171
  - Default language output is Traditional Chinese (zh-TW) via OpenCC conversion
172
- - Claude permissions configured in `.claude/settings.local.json` for tool access
173
  - HuggingFace cache at `~/.cache/huggingface/hub/` - clean periodically
 
 
174
 
175
  ## Git Submodule Management
176
 
@@ -181,3 +189,16 @@ git submodule update --init --recursive
181
  # Update llama-cpp-python to latest
182
  cd llama-cpp-python && git pull origin main && cd .. && git add llama-cpp-python
183
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
  ## Project Overview
4
 
5
+ Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and Traditional Chinese (zh-TW) conversion via OpenCC.
6
 
7
  ## Build / Lint / Test Commands
8
 
9
+ **Run the CLI script:**
10
  ```bash
11
  python summarize_transcript.py -i ./transcripts/short.txt
12
  python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
13
  python summarize_transcript.py -c # CPU only
14
  ```
15
 
16
+ **Run the Gradio web app:**
17
+ ```bash
18
+ python app.py # Starts on port 7860
19
+ ```
20
+
21
  **Linting (if ruff installed):**
22
  ```bash
23
  ruff check .
 
28
  **Type checking (if mypy installed):**
29
  ```bash
30
  mypy summarize_transcript.py
31
+ mypy app.py
32
  ```
33
 
34
  **Running tests:**
 
68
  from llama_cpp import Llama
69
  from huggingface_hub import hf_hub_download
70
  from opencc import OpenCC
71
+ import gradio as gr
72
  ```
73
 
74
  **Type Hints:**
 
97
  ## Dependencies
98
 
99
  **Required:**
100
+ - `llama-cpp-python>=0.3.0` - Core inference engine
101
+ - `gradio>=5.0.0` - Web UI framework
102
+ - `huggingface-hub>=0.23.0` - Model downloading
103
+ - `opencc-python-reimplemented>=0.1.7` - Chinese text conversion
104
 
105
  **Development (optional):**
106
+ - `pytest>=7.4.0` - Testing framework
107
  - `ruff` - Linting and formatting
108
  - `mypy` - Type checking
 
109
 
110
  ## Project Structure
111
 
112
  ```
113
  tiny-scribe/
114
  β”œβ”€β”€ summarize_transcript.py # Main CLI script
115
+ β”œβ”€β”€ app.py # Gradio web app (HuggingFace Spaces)
116
+ β”œβ”€β”€ requirements.txt # Python dependencies
117
+ β”œβ”€β”€ Dockerfile # HF Spaces deployment config
118
  β”œβ”€β”€ transcripts/ # Input transcript files
119
  β”‚ β”œβ”€β”€ short.txt
120
  β”‚ └── full.txt
 
121
  β”œβ”€β”€ llama-cpp-python/ # Git submodule
122
  β”‚ β”œβ”€β”€ tests/ # Test suite
 
 
123
  β”‚ └── vendor/llama.cpp/ # Core C++ library
124
  └── README.md # Project documentation
125
  ```
 
150
  **Thinking Block Parsing:**
151
  ```python
152
  # Extract thinking/reasoning blocks from model output
153
+ THINKING_PATTERN = re.compile(r'<think(?:ing)?>(.*?)</think(?:ing)?>', re.DOTALL)
154
 
155
  for chunk in stream:
156
  delta = chunk["choices"][0]["delta"]
 
176
  - Always call `llm.reset()` after completion to ensure state isolation
177
  - Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
178
  - Default language output is Traditional Chinese (zh-TW) via OpenCC conversion
 
179
  - HuggingFace cache at `~/.cache/huggingface/hub/` - clean periodically
180
+ - HF Spaces runs on CPU tier with 2 vCPUs, 16GB RAM
181
+ - Keep model sizes under 4GB for reasonable performance on free tier
182
 
183
  ## Git Submodule Management
184
 
 
189
  # Update llama-cpp-python to latest
190
  cd llama-cpp-python && git pull origin main && cd .. && git add llama-cpp-python
191
  ```
192
+
193
+ ## Docker/HuggingFace Spaces Deployment
194
+
195
+ ```bash
196
+ # Build locally
197
+ docker build -t tiny-scribe .
198
+
199
+ # Run locally
200
+ docker run -p 7860:7860 tiny-scribe
201
+
202
+ # Deploy script
203
+ ./deploy.sh # Commits, pushes, and triggers HF Spaces rebuild
204
+ ```