# HuggingFace Spaces Deployment Guide
## Quick Start
### 1. Create Space on HuggingFace
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Select:
- **Space name**: `tiny-scribe` (or your preferred name)
- **SDK**: Docker
- **Space hardware**: CPU (Free Tier - 2 vCPUs)
4. Click "Create Space"
### 2. Upload Files
Upload these files to your Space:
- `app.py` - Main Gradio application
- `Dockerfile` - Container configuration
- `requirements.txt` - Python dependencies
- `README.md` - Space documentation
- `transcripts/` - Example files (optional)
Using Git:
```bash
git clone https://huggingface.co/spaces/your-username/tiny-scribe
cd tiny-scribe
# Copy files from this repo
git add .
git commit -m "Initial HF Spaces deployment"
git push
```
**IMPORTANT:** Always use `git push` - never edit files via the HuggingFace web UI. Web edits create generic commit messages like "Upload app.py with huggingface_hub".
### 3. Wait for Build
The Space will automatically:
1. Build the Docker container (~2-5 minutes)
2. Install dependencies (llama-cpp-python wheel is prebuilt)
3. Start the Gradio app
### 4. Access Your App
Once built, visit: `https://your-username-tiny-scribe.hf.space`
## Configuration
### Model Selection
The default model (`unsloth/Qwen3-0.6B-GGUF` Q4_K_M) is optimized for CPU:
- Small: 0.6B parameters
- Fast: ~2-5 seconds for short texts
- Efficient: Uses ~400MB RAM
To change models, edit `app.py`:
```python
DEFAULT_MODEL = "unsloth/Qwen3-1.7B-GGUF" # Larger model
DEFAULT_FILENAME = "*Q2_K_L.gguf" # Lower quantization for speed
```
### Performance Tuning
For Free Tier (2 vCPUs):
- Keep `n_ctx=4096` (context window)
- Use `max_tokens=512` (output length)
- Set `temperature=0.6` (balance creativity/coherence)
### Environment Variables
Optional variables, set in the Space's Settings tab:
```
MODEL_REPO=unsloth/Qwen3-0.6B-GGUF
MODEL_FILENAME=*Q4_K_M.gguf
MAX_TOKENS=512
TEMPERATURE=0.6
```
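Whether `app.py` as shipped reads all of these variables is something to verify in the code itself, but a minimal sketch of reading them with the documented defaults looks like this (the variable names above are the only assumption):

```python
import os

# Read optional Space settings, falling back to the documented defaults.
# Check app.py for the exact variable names it actually supports.
MODEL_REPO = os.getenv("MODEL_REPO", "unsloth/Qwen3-0.6B-GGUF")
MODEL_FILENAME = os.getenv("MODEL_FILENAME", "*Q4_K_M.gguf")
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "512"))
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.6"))
```

Casting to `int`/`float` matters because Space variables always arrive as strings.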
## Features
1. **File Upload**: Drag & drop .txt files
2. **Live Streaming**: Real-time token output
3. **Traditional Chinese**: Auto-conversion to zh-TW
4. **Progressive Loading**: Model downloads on first use (~30-60s)
5. **Responsive UI**: Works on mobile and desktop
## Troubleshooting
### Build Fails
- Check Docker Hub status
- Verify requirements.txt syntax
- Ensure no large files (e.g. model weights) are committed to the repo; the model is downloaded at runtime
### Out of Memory
- Reduce `n_ctx` (context window)
- Use smaller model (Q2_K quantization)
- Limit input file size
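One way to limit input file size is to truncate the uploaded text before building the prompt. This is a hypothetical helper, not code from `app.py`, and the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer count:

```python
def truncate_input(text: str, n_ctx: int = 4096, reserved_tokens: int = 512) -> str:
    """Cap input so prompt + output roughly fit the context window.

    Reserves `reserved_tokens` for the generated output and assumes
    ~4 characters per token as a crude upper bound (heuristic only).
    """
    max_chars = (n_ctx - reserved_tokens) * 4
    return text[:max_chars]
```

A real implementation could tokenize with the loaded model and trim by token count instead.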
### Slow Inference
- Normal for CPU-only Free Tier
- First request downloads model (~400MB)
- Subsequent requests are faster
## Architecture
```
User Upload → Gradio Interface → app.py → llama-cpp-python → Qwen Model
                                                 ↓
                                          OpenCC (s2twp)
                                                 ↓
                                    Streaming Output → User
```
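The conversion-while-streaming stage of this pipeline can be sketched as a generator that converts each token and yields the accumulated text. The real app would pass `opencc.OpenCC("s2twp").convert` as the converter; `str.upper` below is just a stand-in so the sketch runs without non-stdlib dependencies:

```python
from typing import Callable, Iterable, Iterator

def stream_converted(tokens: Iterable[str],
                     convert: Callable[[str], str]) -> Iterator[str]:
    """Convert each streamed token and yield the text accumulated so far,
    which is the shape Gradio expects for a live-updating output."""
    buffer = ""
    for tok in tokens:
        buffer += convert(tok)
        yield buffer

chunks = list(stream_converted(["he", "llo"], str.upper))
```

Note that converting token-by-token can split multi-character phrases OpenCC would otherwise rewrite as a unit; a production version might buffer a few tokens before converting.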
## Deployment Workflow
### Recommended: Use the Deployment Script
The `deploy.sh` script ensures meaningful commit messages:
```bash
# Make your changes
vim app.py
# Test locally
python app.py
# Deploy with meaningful message
./deploy.sh "Fix: Improve thinking block extraction"
```
The script will:
1. Check for uncommitted changes
2. Prompt for commit message if not provided
3. Warn about generic/short messages
4. Show commits to be pushed
5. Confirm before pushing
6. Verify commit message was preserved on remote
### Manual Deployment
If deploying manually:
```bash
# 1. Make changes
vim app.py
# 2. Test locally
python app.py
# 3. Commit with detailed message
git add app.py
git commit -m "Fix: Improve streaming output formatting
- Extract thinking blocks more reliably
- Show full response in thinking field
- Update regex pattern for better parsing"
# 4. Push to HuggingFace Spaces
git push origin main
# 5. Verify deployment
# Visit: https://huggingface.co/spaces/Luigi/tiny-scribe
```
### Avoiding Generic Commit Messages
**❌ DON'T:**
- Edit files directly on huggingface.co
- Use the "Upload files" button in HF web UI
- Use single-word commit messages ("fix", "update")
**✅ DO:**
- Always use `git push` from command line
- Write descriptive commit messages
- Test locally before pushing
### Git Hook
A pre-push hook is installed in `.git/hooks/pre-push` that:
- Validates commit messages before pushing
- Warns about very short messages
- Ensures you're not accidentally pushing generic commits
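The kind of check such a hook performs can be sketched as follows. This is a hypothetical illustration, not the contents of the actual `.git/hooks/pre-push` script, and the length threshold and generic-word list are assumptions:

```python
import re

# Assumed list of single-word messages to reject (hypothetical).
GENERIC = {"fix", "update", "wip", "changes"}

def is_acceptable(message: str, min_length: int = 15) -> bool:
    """Return False for messages the hook should warn about."""
    stripped = message.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    if len(first_line) < min_length:
        return False  # too short to be descriptive
    if first_line.lower() in GENERIC:
        return False  # single generic word
    if re.match(r"^Upload .* with huggingface_hub", first_line):
        return False  # auto-generated HF web-UI message
    return True
```

Running this against the examples in this guide, `"fix"` and the web-UI's `"Upload app.py with huggingface_hub"` are rejected, while `"Fix: Improve thinking block extraction"` passes.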
## Local Testing
Before deploying to HF Spaces:
```bash
pip install -r requirements.txt
python app.py
```
Then open: http://localhost:7860
## License
MIT - See LICENSE file for details.