# HuggingFace Spaces Deployment Guide
## Quick Start
### 1. Create Space on HuggingFace
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Select:
- **Space name**: `tiny-scribe` (or your preferred name)
- **SDK**: Docker
- **Space hardware**: CPU (Free Tier - 2 vCPUs)
4. Click "Create Space"
### 2. Upload Files
Upload these files to your Space:
- `app.py` - Main Gradio application
- `Dockerfile` - Container configuration
- `requirements.txt` - Python dependencies
- `README.md` - Space documentation
- `transcripts/` - Example files (optional)
Using Git:
```bash
git clone https://huggingface.co/spaces/your-username/tiny-scribe
cd tiny-scribe
# Copy files from this repo
git add .
git commit -m "Initial HF Spaces deployment"
git push
```
**IMPORTANT:** Always use `git push` - never edit files via the HuggingFace web UI. Web edits create generic commit messages like "Upload app.py with huggingface_hub".
### 3. Wait for Build
The Space will automatically:
1. Build the Docker container (~2-5 minutes)
2. Install dependencies (llama-cpp-python wheel is prebuilt)
3. Start the Gradio app
### 4. Access Your App
Once built, visit: `https://your-username-tiny-scribe.hf.space`
## Configuration
### Model Selection
The default model (`unsloth/Qwen3-0.6B-GGUF` Q4_K_M) is optimized for CPU:
- Small: 0.6B parameters
- Fast: ~2-5 seconds for short texts
- Efficient: Uses ~400MB RAM
To change models, edit `app.py`:
```python
DEFAULT_MODEL = "unsloth/Qwen3-1.7B-GGUF" # Larger model
DEFAULT_FILENAME = "*Q2_K_L.gguf" # Lower quantization for speed
```
### Performance Tuning
For Free Tier (2 vCPUs):
- Keep `n_ctx=4096` (context window)
- Use `max_tokens=512` (output length)
- Set `temperature=0.6` (balance creativity/coherence)
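These three knobs map onto llama-cpp-python parameters. A sketch of how they might be grouped in `app.py` (the `FREE_TIER` name is illustrative, not from the actual code):

```python
# Illustrative free-tier settings; the keys mirror llama-cpp-python's
# Llama() constructor and create_chat_completion() parameters.
FREE_TIER = {
    "n_ctx": 4096,       # context window shared by prompt and output
    "max_tokens": 512,   # cap on generated tokens per request
    "temperature": 0.6,  # lower = more deterministic output
}

# Roughly how they would be passed (model loading shown for context only):
# llm = Llama(model_path=..., n_ctx=FREE_TIER["n_ctx"], n_threads=2)
# llm.create_chat_completion(messages=...,
#                            max_tokens=FREE_TIER["max_tokens"],
#                            temperature=FREE_TIER["temperature"],
#                            stream=True)
```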
### Environment Variables
Optional settings in Space Settings:
```
MODEL_REPO=unsloth/Qwen3-0.6B-GGUF
MODEL_FILENAME=*Q4_K_M.gguf
MAX_TOKENS=512
TEMPERATURE=0.6
```
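`app.py` could pick these up with the standard library's `os.environ`, falling back to the defaults above. A minimal sketch (the `read_settings` helper is hypothetical, not the actual implementation):

```python
import os

def read_settings():
    """Read optional Space settings, falling back to the documented defaults."""
    return {
        "model_repo": os.environ.get("MODEL_REPO", "unsloth/Qwen3-0.6B-GGUF"),
        "model_filename": os.environ.get("MODEL_FILENAME", "*Q4_K_M.gguf"),
        "max_tokens": int(os.environ.get("MAX_TOKENS", "512")),
        "temperature": float(os.environ.get("TEMPERATURE", "0.6")),
    }
```

Values set in Space Settings arrive as strings, hence the explicit `int`/`float` conversions.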
## Features
1. **File Upload**: Drag & drop .txt files
2. **Live Streaming**: Real-time token output
3. **Traditional Chinese**: Auto-conversion to zh-TW
4. **Progressive Loading**: Model downloads on first use (~30-60s)
5. **Responsive UI**: Works on mobile and desktop
## Troubleshooting
### Build Fails
- Check Docker Hub status
- Verify requirements.txt syntax
- Ensure no large files in repo
### Out of Memory
- Reduce `n_ctx` (context window)
- Use smaller model (Q2_K quantization)
- Limit input file size
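For the last point, a simple guard could truncate oversized uploads before they reach the model. A sketch (the limit and helper name are arbitrary examples, not from `app.py`):

```python
MAX_INPUT_CHARS = 8000  # arbitrary example limit; tune to fit your n_ctx

def clamp_input(text: str, limit: int = MAX_INPUT_CHARS) -> str:
    """Truncate oversized uploads so the prompt fits in the context window."""
    if len(text) <= limit:
        return text
    return text[:limit]
```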
### Slow Inference
- Normal for CPU-only Free Tier
- First request downloads model (~400MB)
- Subsequent requests are faster
## Architecture
```
User Upload → Gradio Interface → app.py → llama-cpp-python → Qwen Model
                                                 ↓
                                          OpenCC (s2twp)
                                                 ↓
                                   Streaming Output → User
```
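The flow above can be sketched as a chain of generators. Here `fake_llm_stream` stands in for the llama-cpp-python token stream and `convert_tw` for the OpenCC `s2twp` conversion (both are hypothetical stand-ins, not the real functions):

```python
def fake_llm_stream(prompt):
    # Stand-in for llama-cpp-python's streaming completion.
    for token in ["Hello", " ", "world"]:
        yield token

def convert_tw(text):
    # Stand-in for OpenCC('s2twp').convert(); identity here.
    return text

def transcribe(prompt):
    """Stream tokens through the converter, yielding the growing output."""
    out = ""
    for token in fake_llm_stream(prompt):
        out += convert_tw(token)
        yield out  # Gradio renders each partial string as it arrives
```

Yielding the accumulated string (rather than individual tokens) is what lets a Gradio output component repaint progressively.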
## Deployment Workflow
### Recommended: Use the Deployment Script
The `deploy.sh` script ensures meaningful commit messages:
```bash
# Make your changes
vim app.py
# Test locally
python app.py
# Deploy with meaningful message
./deploy.sh "Fix: Improve thinking block extraction"
```
The script will:
1. Check for uncommitted changes
2. Prompt for commit message if not provided
3. Warn about generic/short messages
4. Show commits to be pushed
5. Confirm before pushing
6. Verify commit message was preserved on remote
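The actual script ships in this repo as `deploy.sh`; the message check at its core might look roughly like this (an illustrative sketch, not the real script):

```shell
#!/bin/sh
# Sketch of the commit-message guard in a deploy script (illustrative only).
validate_message() {
    msg="$1"
    # Reject empty or very short messages ("fix", "update", ...).
    if [ "${#msg}" -lt 10 ]; then
        echo "Commit message too short: '$msg'" >&2
        return 1
    fi
    return 0
}
```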
### Manual Deployment
If deploying manually:
```bash
# 1. Make changes
vim app.py
# 2. Test locally
python app.py
# 3. Commit with detailed message
git add app.py
git commit -m "Fix: Improve streaming output formatting
- Extract thinking blocks more reliably
- Show full response in thinking field
- Update regex pattern for better parsing"
# 4. Push to HuggingFace Spaces
git push origin main
# 5. Verify deployment
# Visit: https://huggingface.co/spaces/Luigi/tiny-scribe
```
### Avoiding Generic Commit Messages
**❌ DON'T:**
- Edit files directly on huggingface.co
- Use the "Upload files" button in HF web UI
- Use single-word commit messages ("fix", "update")
**✅ DO:**
- Always use `git push` from command line
- Write descriptive commit messages
- Test locally before pushing
### Git Hook
A pre-push hook is installed in `.git/hooks/pre-push` that:
- Validates commit messages before pushing
- Warns about very short messages
- Ensures you're not accidentally pushing generic commits
## Local Testing
Before deploying to HF Spaces:
```bash
pip install -r requirements.txt
python app.py
```
Then open: http://localhost:7860
## License
MIT - See LICENSE file for details.