# HuggingFace Spaces Deployment Guide

## Quick Start

### 1. Create Space on HuggingFace

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Select:
   - **Space name**: `tiny-scribe` (or your preferred name)
   - **SDK**: Docker
   - **Space hardware**: CPU (Free Tier - 2 vCPUs)
4. Click "Create Space"

### 2. Upload Files

Upload these files to your Space:

- `app.py` - Main Gradio application
- `Dockerfile` - Container configuration
- `requirements.txt` - Python dependencies
- `README.md` - Space documentation
- `transcripts/` - Example files (optional)

Using Git:

```bash
git clone https://huggingface.co/spaces/your-username/tiny-scribe
cd tiny-scribe
# Copy files from this repo
git add .
git commit -m "Initial HF Spaces deployment"
git push
```

**IMPORTANT:** Always use `git push` - never edit files via the HuggingFace web UI. Web edits create generic commit messages like "Upload app.py with huggingface_hub".

### 3. Wait for Build

The Space will automatically:

1. Build the Docker container (~2-5 minutes)
2. Install dependencies (the llama-cpp-python wheel is prebuilt)
3. Start the Gradio app

### 4. Access Your App

Once built, visit: `https://your-username-tiny-scribe.hf.space`

## Configuration

### Model Selection

The default model (`unsloth/Qwen3-0.6B-GGUF`, Q4_K_M) is optimized for CPU:

- Small: 0.6B parameters
- Fast: ~2-5 seconds for short texts
- Efficient: uses ~400MB RAM

To change models, edit `app.py`:

```python
DEFAULT_MODEL = "unsloth/Qwen3-1.7B-GGUF"  # Larger model
DEFAULT_FILENAME = "*Q2_K_L.gguf"          # Lower quantization for speed
```

### Performance Tuning

For the Free Tier (2 vCPUs):

- Keep `n_ctx=4096` (context window)
- Use `max_tokens=512` (output length)
- Set `temperature=0.6` (balances creativity and coherence)

### Environment Variables

Optional settings in Space Settings:

```
MODEL_REPO=unsloth/Qwen3-0.6B-GGUF
MODEL_FILENAME=*Q4_K_M.gguf
MAX_TOKENS=512
TEMPERATURE=0.6
```

## Features

1. **File Upload**: Drag & drop .txt files
2. **Live Streaming**: Real-time token output
3. **Traditional Chinese**: Auto-conversion to zh-TW
4. **Progressive Loading**: Model downloads on first use (~30-60s)
5. **Responsive UI**: Works on mobile and desktop

## Troubleshooting

### Build Fails

- Check Docker Hub status
- Verify `requirements.txt` syntax
- Ensure there are no large files in the repo

### Out of Memory

- Reduce `n_ctx` (context window)
- Use a smaller model (Q2_K quantization)
- Limit input file size

### Slow Inference

- Normal for the CPU-only Free Tier
- The first request downloads the model (~400MB)
- Subsequent requests are faster

## Architecture

```
User Upload → Gradio Interface → app.py → llama-cpp-python → Qwen Model
                                                                 ↓
                                                          OpenCC (s2twp)
                                                                 ↓
                                                 Streaming Output → User
```

## Deployment Workflow

### Recommended: Use the Deployment Script

The `deploy.sh` script ensures meaningful commit messages:

```bash
# Make your changes
vim app.py

# Test locally
python app.py

# Deploy with a meaningful message
./deploy.sh "Fix: Improve thinking block extraction"
```

The script will:

1. Check for uncommitted changes
2. Prompt for a commit message if none is provided
3. Warn about generic/short messages
4. Show the commits to be pushed
5. Confirm before pushing
6. Verify the commit message was preserved on the remote

### Manual Deployment

If deploying manually:

```bash
# 1. Make changes
vim app.py

# 2. Test locally
python app.py

# 3. Commit with a detailed message
git add app.py
git commit -m "Fix: Improve streaming output formatting

- Extract thinking blocks more reliably
- Show full response in thinking field
- Update regex pattern for better parsing"

# 4. Push to HuggingFace Spaces
git push origin main

# 5. Verify deployment
# Visit: https://huggingface.co/spaces/Luigi/tiny-scribe
```

### Avoiding Generic Commit Messages

**❌ DON'T:**

- Edit files directly on huggingface.co
- Use the "Upload files" button in the HF web UI
- Use single-word commit messages ("fix", "update")

**✅ DO:**

- Always use `git push` from the command line
- Write descriptive commit messages
- Test locally before pushing

### Git Hook

A pre-push hook is installed in `.git/hooks/pre-push` that:

- Validates commit messages before pushing
- Warns about very short messages
- Ensures you're not accidentally pushing generic commits

## Local Testing

Before deploying to HF Spaces:

```bash
pip install -r requirements.txt
python app.py
```

Then open: http://localhost:7860

## License

MIT - See LICENSE file for details.
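## Appendix: Commit-Message Check Sketch

The pre-push validation described under "Git Hook" could look roughly like the sketch below. This is a hypothetical illustration of the idea, not the actual contents of `.git/hooks/pre-push`; the `check_message` function name, the list of generic words, and the 10-character threshold are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a commit-message check, as a pre-push hook might do.
# Prints a verdict and returns nonzero for messages that should be rejected.
check_message() {
    msg="$1"
    # Reject empty messages and known generic one-worders (assumed list).
    case "$msg" in
        ""|fix|update|wip) echo "generic"; return 1 ;;
    esac
    # Warn about very short messages (threshold is an assumption).
    if [ "${#msg}" -lt 10 ]; then
        echo "too-short"; return 1
    fi
    echo "ok"
    return 0
}
```

A real hook would run such a check over each commit being pushed (e.g. via `git log` on the outgoing range) and abort the push on failure.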