# HuggingFace Spaces Deployment Guide

## Quick Start

### 1. Create Space on HuggingFace

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Select:
   - **Space name**: `tiny-scribe` (or your preferred name)
   - **SDK**: Docker
   - **Space hardware**: CPU (Free Tier - 2 vCPUs)
4. Click "Create Space"

### 2. Upload Files

Upload these files to your Space:

- `app.py` - Main Gradio application
- `Dockerfile` - Container configuration
- `requirements.txt` - Python dependencies
- `README.md` - Space documentation
- `transcripts/` - Example files (optional)

Using Git:

```bash
git clone https://huggingface.co/spaces/your-username/tiny-scribe
cd tiny-scribe
# Copy files from this repo
git add .
git commit -m "Initial HF Spaces deployment"
git push
```

**IMPORTANT:** Always use `git push` - never edit files via the HuggingFace web UI. Web edits create generic commit messages like "Upload app.py with huggingface_hub".

### 3. Wait for Build

The Space will automatically:

1. Build the Docker container (~2-5 minutes)
2. Install dependencies (the llama-cpp-python wheel is prebuilt)
3. Start the Gradio app

### 4. Access Your App

Once built, visit: `https://your-username-tiny-scribe.hf.space`
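The public URL follows the `<owner>-<space>.hf.space` pattern shown above. A small helper (hypothetical, not part of `app.py`) can build it from the two names:

```python
def space_url(owner: str, space: str) -> str:
    """Build the public URL for a HuggingFace Space.

    HF serves Spaces at <owner>-<space>.hf.space, lowercased.
    """
    return f"https://{owner.lower()}-{space.lower()}.hf.space"

print(space_url("your-username", "tiny-scribe"))
# https://your-username-tiny-scribe.hf.space
```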
## Configuration

### Model Selection

The default model (`unsloth/Qwen3-0.6B-GGUF`, Q4_K_M quantization) is optimized for CPU:

- Small: 0.6B parameters
- Fast: ~2-5 seconds for short texts
- Efficient: uses ~400MB RAM

To change models, edit `app.py`:

```python
DEFAULT_MODEL = "unsloth/Qwen3-1.7B-GGUF"  # Larger model
DEFAULT_FILENAME = "*Q2_K_L.gguf"          # Lower-bit quantization for speed
```

### Performance Tuning

For the Free Tier (2 vCPUs):

- Keep `n_ctx=4096` (context window)
- Use `max_tokens=512` (output length)
- Set `temperature=0.6` (balances creativity and coherence)

### Environment Variables

Optional settings in Space Settings:

```
MODEL_REPO=unsloth/Qwen3-0.6B-GGUF
MODEL_FILENAME=*Q4_K_M.gguf
MAX_TOKENS=512
TEMPERATURE=0.6
```
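Code reading these variables would typically fall back to the documented defaults when they are unset. A minimal sketch of that pattern (the variable names match the list above, but the exact code in `app.py` may differ):

```python
import os

# Fall back to the documented defaults when a variable is unset.
MODEL_REPO = os.environ.get("MODEL_REPO", "unsloth/Qwen3-0.6B-GGUF")
MODEL_FILENAME = os.environ.get("MODEL_FILENAME", "*Q4_K_M.gguf")
MAX_TOKENS = int(os.environ.get("MAX_TOKENS", "512"))
TEMPERATURE = float(os.environ.get("TEMPERATURE", "0.6"))
```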
## Features

1. **File Upload**: Drag & drop `.txt` files
2. **Live Streaming**: Real-time token output
3. **Traditional Chinese**: Auto-conversion to zh-TW
4. **Progressive Loading**: Model downloads on first use (~30-60s)
5. **Responsive UI**: Works on mobile and desktop

## Troubleshooting

### Build Fails

- Check Docker Hub status
- Verify `requirements.txt` syntax
- Ensure no large files are committed to the repo

### Out of Memory

- Reduce `n_ctx` (context window)
- Use a smaller model (Q2_K quantization)
- Limit input file size
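One way to limit input size is to truncate uploaded text to a character budget before it reaches the model. A minimal sketch (the limit and function name are illustrative, not taken from `app.py`; a character cap is only a rough guard, not an exact token count):

```python
MAX_INPUT_CHARS = 8000  # illustrative budget, not an exact token count

def clamp_input(text: str, limit: int = MAX_INPUT_CHARS) -> str:
    """Truncate oversized uploads so the prompt stays near the context window."""
    if len(text) <= limit:
        return text
    return text[:limit]
```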
### Slow Inference

- Normal for the CPU-only Free Tier
- The first request downloads the model (~400MB)
- Subsequent requests are faster

## Architecture

```
User Upload → Gradio Interface → app.py → llama-cpp-python → Qwen Model
                                                                 ↓
                                                          OpenCC (s2twp)
                                                                 ↓
                                                    Streaming Output → User
```
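Qwen3 models emit their reasoning wrapped in `<think>...</think>` tags before the visible answer, which is why the pipeline separates thinking blocks from the streamed output. A sketch of that extraction (the regex and function name are illustrative; `app.py`'s actual pattern may differ):

```python
import re

# Qwen3 wraps its reasoning in <think>...</think> before the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(raw: str) -> tuple[str, str]:
    """Separate <think> blocks from the visible answer text."""
    thinking = "\n".join(m.strip() for m in THINK_RE.findall(raw))
    answer = THINK_RE.sub("", raw).strip()
    return thinking, answer

thinking, answer = split_thinking("<think>plan the summary</think>Summary: hello")
# thinking == "plan the summary", answer == "Summary: hello"
```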
## Deployment Workflow

### Recommended: Use the Deployment Script

The `deploy.sh` script ensures meaningful commit messages:

```bash
# Make your changes
vim app.py

# Test locally
python app.py

# Deploy with a meaningful message
./deploy.sh "Fix: Improve thinking block extraction"
```

The script will:

1. Check for uncommitted changes
2. Prompt for a commit message if one is not provided
3. Warn about generic/short messages
4. Show the commits to be pushed
5. Confirm before pushing
6. Verify the commit message was preserved on the remote
### Manual Deployment

If deploying manually:

```bash
# 1. Make changes
vim app.py

# 2. Test locally
python app.py

# 3. Commit with a detailed message
git add app.py
git commit -m "Fix: Improve streaming output formatting

- Extract thinking blocks more reliably
- Show full response in thinking field
- Update regex pattern for better parsing"

# 4. Push to HuggingFace Spaces
git push origin main

# 5. Verify deployment
# Visit: https://huggingface.co/spaces/Luigi/tiny-scribe
```
### Avoiding Generic Commit Messages

**❌ DON'T:**

- Edit files directly on huggingface.co
- Use the "Upload files" button in the HF web UI
- Use single-word commit messages ("fix", "update")

**✅ DO:**

- Always use `git push` from the command line
- Write descriptive commit messages
- Test locally before pushing

### Git Hook

A pre-push hook installed at `.git/hooks/pre-push`:

- Validates commit messages before pushing
- Warns about very short messages
- Ensures you're not accidentally pushing generic commits
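The hook's message check can be sketched as a small predicate (hypothetical logic; the installed hook's exact rules and threshold may differ):

```python
# Messages the hook should reject outright (hypothetical deny-list).
GENERIC_MESSAGES = {"fix", "update", "wip", "changes",
                    "upload app.py with huggingface_hub"}

def is_acceptable(message: str, min_length: int = 10) -> bool:
    """Reject very short or generic commit messages, as the pre-push hook does."""
    subject = message.strip().lower()
    return len(subject) >= min_length and subject not in GENERIC_MESSAGES
```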
## Local Testing

Before deploying to HF Spaces:

```bash
pip install -r requirements.txt
python app.py
```

Then open: http://localhost:7860

## License

MIT - See LICENSE file for details.