# HuggingFace Spaces Deployment Guide

## Quick Start
### 1. Create Space on HuggingFace

- Go to huggingface.co/spaces
- Click "Create new Space"
- Select:
  - Space name: `tiny-scribe` (or your preferred name)
  - SDK: Docker
  - Space hardware: CPU (Free Tier - 2 vCPUs)
- Click "Create Space"
### 2. Upload Files

Upload these files to your Space:

- `app.py` - Main Gradio application
- `Dockerfile` - Container configuration
- `requirements.txt` - Python dependencies
- `README.md` - Space documentation
- `transcripts/` - Example files (optional)
Using Git:

```bash
git clone https://huggingface.co/spaces/your-username/tiny-scribe
cd tiny-scribe
# Copy files from this repo
git add .
git commit -m "Initial HF Spaces deployment"
git push
```
**IMPORTANT**: Always use `git push` - never edit files via the HuggingFace web UI. Web edits create generic commit messages like "Upload app.py with huggingface_hub".
### 3. Wait for Build
The Space will automatically:
- Build the Docker container (~2-5 minutes)
- Install dependencies (llama-cpp-python wheel is prebuilt)
- Start the Gradio app
### 4. Access Your App
Once built, visit: https://your-username-tiny-scribe.hf.space
## Configuration

### Model Selection
The default model (`unsloth/Qwen3-0.6B-GGUF`, Q4_K_M) is optimized for CPU:
- Small: 0.6B parameters
- Fast: ~2-5 seconds for short texts
- Efficient: Uses ~400MB RAM
To change models, edit `app.py`:

```python
DEFAULT_MODEL = "unsloth/Qwen3-1.7B-GGUF"  # Larger model
DEFAULT_FILENAME = "*Q2_K_L.gguf"          # Lower quantization for speed
```
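The `*Q2_K_L.gguf` value is a glob pattern, not a literal filename: llama-cpp-python matches it against the files published in the GGUF repo and downloads the first hit. A minimal sketch of that matching logic, using hypothetical filenames (the actual names vary per repo):

```python
from fnmatch import fnmatch

# Hypothetical file listing from a GGUF repo; real names differ per repo.
repo_files = [
    "Qwen3-0.6B-Q2_K_L.gguf",
    "Qwen3-0.6B-Q4_K_M.gguf",
    "Qwen3-0.6B-Q8_0.gguf",
]

def pick_gguf(files, pattern):
    """Return the first file matching the wildcard pattern, roughly how
    a glob-style `filename` argument selects one quantization file."""
    matches = [f for f in files if fnmatch(f, pattern)]
    if not matches:
        raise FileNotFoundError(f"no file matches {pattern!r}")
    return matches[0]

print(pick_gguf(repo_files, "*Q4_K_M.gguf"))  # Qwen3-0.6B-Q4_K_M.gguf
```

This is why changing quantization only requires editing the pattern - the repo ID stays the same.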
### Performance Tuning

For Free Tier (2 vCPUs):

- Keep `n_ctx=4096` (context window)
- Use `max_tokens=512` (output length)
- Set `temperature=0.6` (balance creativity/coherence)
### Environment Variables

Optional settings in Space Settings:

```bash
MODEL_REPO=unsloth/Qwen3-0.6B-GGUF
MODEL_FILENAME=*Q4_K_M.gguf
MAX_TOKENS=512
TEMPERATURE=0.6
```
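A sketch of how `app.py` might pick these up, with the documented values as fallbacks so the app still runs when no variables are set (the variable names come from this guide; the exact code in `app.py` may differ):

```python
import os

# Read Space settings, falling back to the defaults documented above.
MODEL_REPO = os.getenv("MODEL_REPO", "unsloth/Qwen3-0.6B-GGUF")
MODEL_FILENAME = os.getenv("MODEL_FILENAME", "*Q4_K_M.gguf")
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "512"))
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.6"))
```

Because HF Spaces injects Settings values as plain environment variables, numeric settings arrive as strings and need the explicit `int()`/`float()` casts.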
## Features
- File Upload: Drag & drop .txt files
- Live Streaming: Real-time token output
- Traditional Chinese: Auto-conversion to zh-TW
- Progressive Loading: Model downloads on first use (~30-60s)
- Responsive UI: Works on mobile and desktop
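The live-streaming feature follows the standard Gradio pattern: a generator that yields the accumulated text after each token, so the UI repaints progressively instead of waiting for the full reply. A minimal sketch with simulated token chunks (real ones come from the model's stream):

```python
def stream_reply(token_iter):
    """Yield the accumulated text after each token - the generator
    pattern a Gradio streaming output consumes."""
    text = ""
    for tok in token_iter:
        text += tok
        yield text

# Simulated token chunks; llama-cpp-python's stream provides the real ones.
partials = list(stream_reply(["Hel", "lo", ", ", "world"]))
print(partials[-1])  # Hello, world
```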
## Troubleshooting

### Build Fails
- Check Docker Hub status
- Verify requirements.txt syntax
- Ensure no large files in repo
### Out of Memory

- Reduce `n_ctx` (context window)
- Use a smaller model (Q2_K quantization)
- Limit input file size
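Limiting input size is the cheapest of the three fixes: reject oversized uploads before they reach the model, so a huge transcript cannot exhaust the context window or memory. A hedged sketch (the byte cap is an assumed value, not something `app.py` is known to use):

```python
MAX_INPUT_BYTES = 200_000  # assumed cap; tune to your Space's RAM

def clamp_input(data: bytes, limit: int = MAX_INPUT_BYTES) -> str:
    """Reject oversized uploads before tokenization."""
    if len(data) > limit:
        raise ValueError(f"input exceeds {limit} bytes; upload a shorter file")
    return data.decode("utf-8", errors="replace")
```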
### Slow Inference
- Normal for CPU-only Free Tier
- First request downloads model (~400MB)
- Subsequent requests are faster
## Architecture

```
User Upload → Gradio Interface → app.py → llama-cpp-python → Qwen Model
                                                  ↓
                                           OpenCC (s2twp)
                                                  ↓
                                      Streaming Output → User
```
## Deployment Workflow

### Recommended: Use the Deployment Script

The `deploy.sh` script ensures meaningful commit messages:
```bash
# Make your changes
vim app.py

# Test locally
python app.py

# Deploy with meaningful message
./deploy.sh "Fix: Improve thinking block extraction"
```
The script will:
- Check for uncommitted changes
- Prompt for commit message if not provided
- Warn about generic/short messages
- Show commits to be pushed
- Confirm before pushing
- Verify commit message was preserved on remote
### Manual Deployment

If deploying manually:
```bash
# 1. Make changes
vim app.py

# 2. Test locally
python app.py

# 3. Commit with detailed message
git add app.py
git commit -m "Fix: Improve streaming output formatting

- Extract thinking blocks more reliably
- Show full response in thinking field
- Update regex pattern for better parsing"

# 4. Push to HuggingFace Spaces
git push origin main

# 5. Verify deployment
# Visit: https://huggingface.co/spaces/Luigi/tiny-scribe
```
### Avoiding Generic Commit Messages

❌ DON'T:

- Edit files directly on huggingface.co
- Use the "Upload files" button in the HF web UI
- Use single-word commit messages ("fix", "update")

✅ DO:

- Always use `git push` from the command line
- Write descriptive commit messages
- Test locally before pushing
### Git Hook

A pre-push hook is installed in `.git/hooks/pre-push` that:
- Validates commit messages before pushing
- Warns about very short messages
- Ensures you're not accidentally pushing generic commits
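The core of such a check can be sketched as a small shell function; this is an illustration of the idea, not the actual contents of the installed hook, and the length threshold and generic-message list are assumptions:

```shell
#!/bin/sh
# Sketch of a commit-message check like the one a pre-push hook might run.
check_msg() {
  msg="$1"
  case "$msg" in
    "fix"|"update"|"Upload app.py with huggingface_hub")
      echo "generic"; return 1 ;;
  esac
  if [ "${#msg}" -lt 10 ]; then
    echo "too short"; return 1
  fi
  echo "ok"
}

check_msg "Fix: Improve thinking block extraction"  # ok
```

A real pre-push hook would read the commits being pushed from stdin (as git supplies them) and run a check like this on each message, aborting the push on failure.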
## Local Testing

Before deploying to HF Spaces:

```bash
pip install -r requirements.txt
python app.py
```

Then open: http://localhost:7860
## License

MIT - See LICENSE file for details.