# HuggingFace Spaces Deployment Guide

## Quick Start

### 1. Create Space on HuggingFace

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Select:
   - **Space name**: `tiny-scribe` (or your preferred name)
   - **SDK**: Docker
   - **Space hardware**: CPU basic (free tier, 2 vCPUs)
4. Click "Create Space"

### 2. Upload Files

Upload these files to your Space:
- `app.py` - Main Gradio application
- `Dockerfile` - Container configuration
- `requirements.txt` - Python dependencies
- `README.md` - Space documentation
- `transcripts/` - Example files (optional)

Using Git:
```bash
git clone https://huggingface.co/spaces/your-username/tiny-scribe
cd tiny-scribe
# Copy files from this repo
git add .
git commit -m "Initial HF Spaces deployment"
git push
```

**IMPORTANT:** Always use `git push` - never edit files via the HuggingFace web UI. Web edits create generic commit messages like "Upload app.py with huggingface_hub".

### 3. Wait for Build

The Space will automatically:
1. Build the Docker container (~2-5 minutes)
2. Install dependencies (llama-cpp-python wheel is prebuilt)
3. Start the Gradio app

### 4. Access Your App

Once built, visit: `https://your-username-tiny-scribe.hf.space`

## Configuration

### Model Selection

The default model (`unsloth/Qwen3-0.6B-GGUF` Q4_K_M) is optimized for CPU:
- Small: 0.6B parameters
- Fast: ~2-5 seconds for short texts
- Efficient: Uses ~400MB RAM

To change models, edit `app.py`:
```python
DEFAULT_MODEL = "unsloth/Qwen3-1.7B-GGUF"  # Larger model
DEFAULT_FILENAME = "*Q2_K_L.gguf"  # Lower quantization for speed
```
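Note that `*Q4_K_M.gguf` is a glob pattern, not an exact filename; llama-cpp-python resolves it against the repo's file list when downloading. A minimal sketch of that resolution using the standard library (`pick_file` is an illustrative helper, not the library's API):

```python
from fnmatch import fnmatch

def pick_file(filenames, pattern):
    """Return the first repo file matching a glob like '*Q4_K_M.gguf'."""
    matches = [name for name in filenames if fnmatch(name, pattern)]
    if not matches:
        raise FileNotFoundError(f"no file matches {pattern!r}")
    return matches[0]

# Example file listing in the style of a GGUF repo:
files = ["Qwen3-0.6B-Q2_K_L.gguf", "Qwen3-0.6B-Q4_K_M.gguf"]
print(pick_file(files, "*Q4_K_M.gguf"))  # Qwen3-0.6B-Q4_K_M.gguf
```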

### Performance Tuning

For Free Tier (2 vCPUs):
- Keep `n_ctx=4096` (context window)
- Use `max_tokens=512` (output length)
- Set `temperature=0.6` (balance creativity/coherence)

### Environment Variables

Optional settings in Space Settings:
```
MODEL_REPO=unsloth/Qwen3-0.6B-GGUF
MODEL_FILENAME=*Q4_K_M.gguf
MAX_TOKENS=512
TEMPERATURE=0.6
```
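In `app.py`, these variables can be read with the standard library, falling back to the built-in defaults when a variable is unset. A sketch (the `setting` helper name is illustrative):

```python
import os

def setting(name, default, cast=str):
    """Read an optional Space setting, falling back to the built-in default."""
    raw = os.getenv(name)
    return default if raw is None else cast(raw)

MODEL_REPO = setting("MODEL_REPO", "unsloth/Qwen3-0.6B-GGUF")
MODEL_FILENAME = setting("MODEL_FILENAME", "*Q4_K_M.gguf")
MAX_TOKENS = setting("MAX_TOKENS", 512, int)
TEMPERATURE = setting("TEMPERATURE", 0.6, float)
```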

## Features

1. **File Upload**: Drag & drop .txt files
2. **Live Streaming**: Real-time token output
3. **Traditional Chinese**: Auto-conversion to zh-TW
4. **Progressive Loading**: Model downloads on first use (~30-60s)
5. **Responsive UI**: Works on mobile and desktop
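The live-streaming behavior follows Gradio's generator convention: the event handler yields the accumulated text after each token, and the UI re-renders on every yield. A minimal sketch, with a plain iterable standing in for the model's token stream:

```python
def stream_text(tokens):
    """Yield the growing output after each token, as a Gradio
    generator-based handler would, so the UI updates incrementally."""
    text = ""
    for token in tokens:
        text += token
        yield text

# Stand-in for streamed model output:
for partial in stream_text(["Hel", "lo", "!"]):
    print(partial)  # Hel / Hello / Hello!
```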

## Troubleshooting

### Build Fails
- Check Docker Hub status
- Verify requirements.txt syntax
- Ensure no large files in repo

### Out of Memory
- Reduce `n_ctx` (context window)
- Use smaller model (Q2_K quantization)
- Limit input file size
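Limiting input file size can happen before the text ever reaches the model. A sketch with an assumed character cap (the 8,000 figure is illustrative, chosen to sit well under the 4096-token context):

```python
MAX_INPUT_CHARS = 8_000  # assumption: rough budget under the 4096-token context

def clamp_input(text, limit=MAX_INPUT_CHARS):
    """Truncate oversized uploads so they cannot exhaust the context window."""
    return text if len(text) <= limit else text[:limit]
```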

### Slow Inference
- Normal for CPU-only Free Tier
- First request downloads model (~400MB)
- Subsequent requests are faster

## Architecture

```
User Upload β†’ Gradio Interface β†’ app.py β†’ llama-cpp-python β†’ Qwen Model
                                    ↓
                              OpenCC (s2twp)
                                    ↓
                         Streaming Output β†’ User
```
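The diagram reads naturally as a generator pipeline: model output streams through OpenCC conversion on its way to the user. A sketch with stand-in callables (in the real app, `generate` would wrap llama-cpp-python and `convert` would be `OpenCC('s2twp').convert`):

```python
def transcribe(text, generate, convert):
    """Stream model output through the converter, mirroring the diagram."""
    for partial in generate(text):
        yield convert(partial)

# Stand-ins for the model and the OpenCC converter:
fake_generate = lambda text: iter([text.upper()])
fake_convert = lambda s: s + " (zh-TW)"
print(list(transcribe("hi", fake_generate, fake_convert)))  # ['HI (zh-TW)']
```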

## Deployment Workflow

### Recommended: Use the Deployment Script

The `deploy.sh` script ensures meaningful commit messages:

```bash
# Make your changes
vim app.py

# Test locally
python app.py

# Deploy with meaningful message
./deploy.sh "Fix: Improve thinking block extraction"
```

The script will:
1. Check for uncommitted changes
2. Prompt for commit message if not provided
3. Warn about generic/short messages
4. Show commits to be pushed
5. Confirm before pushing
6. Verify commit message was preserved on remote
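
The generic-message check at the heart of the script can be sketched as a small predicate (the blocklist and minimum length here are assumptions; `deploy.sh` itself is a shell script):

```python
GENERIC_MESSAGES = {"fix", "update", "wip", "changes"}  # assumed blocklist

def message_is_meaningful(msg, min_length=10):
    """Reject empty, very short, generic, or web-UI-style commit messages."""
    text = msg.strip()
    if len(text) < min_length:
        return False
    if text.lower() in GENERIC_MESSAGES:
        return False
    if text.startswith("Upload "):  # HF web-UI default, e.g. "Upload app.py ..."
        return False
    return True
```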

### Manual Deployment

If deploying manually:

```bash
# 1. Make changes
vim app.py

# 2. Test locally
python app.py

# 3. Commit with detailed message
git add app.py
git commit -m "Fix: Improve streaming output formatting

- Extract thinking blocks more reliably
- Show full response in thinking field
- Update regex pattern for better parsing"

# 4. Push to HuggingFace Spaces
git push origin main

# 5. Verify deployment
# Visit: https://huggingface.co/spaces/Luigi/tiny-scribe
```

### Avoiding Generic Commit Messages

**❌ DON'T:**
- Edit files directly on huggingface.co
- Use the "Upload files" button in HF web UI
- Use single-word commit messages ("fix", "update")

**βœ… DO:**
- Always use `git push` from command line
- Write descriptive commit messages
- Test locally before pushing

### Git Hook

A pre-push hook is installed in `.git/hooks/pre-push` that:
- Validates commit messages before pushing
- Warns about very short messages
- Ensures you're not accidentally pushing generic commits

## Local Testing

Before deploying to HF Spaces:

```bash
pip install -r requirements.txt
python app.py
```

Then open: http://localhost:7860

## License

MIT - See LICENSE file for details.