# HuggingFace Spaces Deployment Guide
## Quick Start
### 1. Create Space on HuggingFace
1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Select:
- **Space name**: `tiny-scribe` (or your preferred name)
- **SDK**: Docker
- **Space hardware**: CPU (Free Tier - 2 vCPUs)
4. Click "Create Space"
### 2. Upload Files
Upload these files to your Space:
- `app.py` - Main Gradio application
- `Dockerfile` - Container configuration
- `requirements.txt` - Python dependencies
- `README.md` - Space documentation
- `transcripts/` - Example files (optional)
Using Git:
```bash
git clone https://huggingface.co/spaces/your-username/tiny-scribe
cd tiny-scribe
# Copy files from this repo
git add .
git commit -m "Initial HF Spaces deployment"
git push
```
**IMPORTANT:** Always use `git push` - never edit files via the HuggingFace web UI. Web edits create generic commit messages like "Upload app.py with huggingface_hub".
### 3. Wait for Build
The Space will automatically:
1. Build the Docker container (~2-5 minutes)
2. Install dependencies (llama-cpp-python wheel is prebuilt)
3. Start the Gradio app
### 4. Access Your App
Once built, visit: `https://your-username-tiny-scribe.hf.space`
## Configuration
### Model Selection
The default model (`unsloth/Qwen3-0.6B-GGUF` Q4_K_M) is optimized for CPU:
- Small: 0.6B parameters
- Fast: ~2-5 seconds for short texts
- Efficient: Uses ~400MB RAM
To change models, edit `app.py`:
```python
DEFAULT_MODEL = "unsloth/Qwen3-1.7B-GGUF" # Larger model
DEFAULT_FILENAME = "*Q2_K_L.gguf" # Lower quantization for speed
```
### Performance Tuning
For Free Tier (2 vCPUs):
- Keep `n_ctx=4096` (context window)
- Use `max_tokens=512` (output length)
- Set `temperature=0.6` (balance creativity/coherence)
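These three knobs map onto llama-cpp-python parameters. A sketch of how they might be grouped in `app.py` (the `FREE_TIER` name is illustrative, not from the actual code):

```python
# Illustrative free-tier settings; the keys mirror llama-cpp-python's
# Llama() constructor and create_chat_completion() parameters.
FREE_TIER = {
    "n_ctx": 4096,       # context window shared by prompt and output
    "max_tokens": 512,   # cap on generated tokens per request
    "temperature": 0.6,  # lower = more deterministic output
}

# Roughly how they would be passed (model loading shown for context only):
# llm = Llama(model_path=..., n_ctx=FREE_TIER["n_ctx"], n_threads=2)
# llm.create_chat_completion(messages=...,
#                            max_tokens=FREE_TIER["max_tokens"],
#                            temperature=FREE_TIER["temperature"],
#                            stream=True)
```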
### Environment Variables
Optional settings in Space Settings:
```
MODEL_REPO=unsloth/Qwen3-0.6B-GGUF
MODEL_FILENAME=*Q4_K_M.gguf
MAX_TOKENS=512
TEMPERATURE=0.6
```
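`app.py` could pick these up with the standard library's `os.environ`, falling back to the defaults above. A minimal sketch (the `read_settings` helper is hypothetical, not the actual implementation):

```python
import os

def read_settings():
    """Read optional Space settings, falling back to the documented defaults."""
    return {
        "model_repo": os.environ.get("MODEL_REPO", "unsloth/Qwen3-0.6B-GGUF"),
        "model_filename": os.environ.get("MODEL_FILENAME", "*Q4_K_M.gguf"),
        "max_tokens": int(os.environ.get("MAX_TOKENS", "512")),
        "temperature": float(os.environ.get("TEMPERATURE", "0.6")),
    }
```

Values set in Space Settings arrive as strings, hence the explicit `int`/`float` conversions.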
## Features
1. **File Upload**: Drag & drop .txt files
2. **Live Streaming**: Real-time token output
3. **Traditional Chinese**: Auto-conversion to zh-TW
4. **Progressive Loading**: Model downloads on first use (~30-60s)
5. **Responsive UI**: Works on mobile and desktop
## Troubleshooting
### Build Fails
- Check Docker Hub status
- Verify requirements.txt syntax
- Ensure no large files in repo
### Out of Memory
- Reduce `n_ctx` (context window)
- Use smaller model (Q2_K quantization)
- Limit input file size
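For the last point, a simple guard could truncate oversized uploads before they reach the model. A sketch (the limit and helper name are arbitrary examples, not from `app.py`):

```python
MAX_INPUT_CHARS = 8000  # arbitrary example limit; tune to fit your n_ctx

def clamp_input(text: str, limit: int = MAX_INPUT_CHARS) -> str:
    """Truncate oversized uploads so the prompt fits in the context window."""
    if len(text) <= limit:
        return text
    return text[:limit]
```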
### Slow Inference
- Normal for CPU-only Free Tier
- First request downloads model (~400MB)
- Subsequent requests are faster
## Architecture
```
User Upload → Gradio Interface → app.py → llama-cpp-python → Qwen Model
                                                 ↓
                                          OpenCC (s2twp)
                                                 ↓
                                   Streaming Output → User
```
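The flow above can be sketched as a chain of generators. Here `fake_llm_stream` stands in for the llama-cpp-python token stream and `convert_tw` for the OpenCC `s2twp` conversion (both are hypothetical stand-ins, not the real functions):

```python
def fake_llm_stream(prompt):
    # Stand-in for llama-cpp-python's streaming completion.
    for token in ["Hello", " ", "world"]:
        yield token

def convert_tw(text):
    # Stand-in for OpenCC('s2twp').convert(); identity here.
    return text

def transcribe(prompt):
    """Stream tokens through the converter, yielding the growing output."""
    out = ""
    for token in fake_llm_stream(prompt):
        out += convert_tw(token)
        yield out  # Gradio renders each partial string as it arrives
```

Yielding the accumulated string (rather than individual tokens) is what lets a Gradio output component repaint progressively.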
## Deployment Workflow
### Recommended: Use the Deployment Script
The `deploy.sh` script ensures meaningful commit messages:
```bash
# Make your changes
vim app.py
# Test locally
python app.py
# Deploy with meaningful message
./deploy.sh "Fix: Improve thinking block extraction"
```
The script will:
1. Check for uncommitted changes
2. Prompt for commit message if not provided
3. Warn about generic/short messages
4. Show commits to be pushed
5. Confirm before pushing
6. Verify commit message was preserved on remote
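The actual script ships in this repo as `deploy.sh`; the message check at its core might look roughly like this (an illustrative sketch, not the real script):

```shell
#!/bin/sh
# Sketch of the commit-message guard in a deploy script (illustrative only).
validate_message() {
    msg="$1"
    # Reject empty or very short messages ("fix", "update", ...).
    if [ "${#msg}" -lt 10 ]; then
        echo "Commit message too short: '$msg'" >&2
        return 1
    fi
    return 0
}
```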
### Manual Deployment
If deploying manually:
```bash
# 1. Make changes
vim app.py
# 2. Test locally
python app.py
# 3. Commit with detailed message
git add app.py
git commit -m "Fix: Improve streaming output formatting
- Extract thinking blocks more reliably
- Show full response in thinking field
- Update regex pattern for better parsing"
# 4. Push to HuggingFace Spaces
git push origin main
# 5. Verify deployment
# Visit: https://huggingface.co/spaces/Luigi/tiny-scribe
```
### Avoiding Generic Commit Messages
**❌ DON'T:**
- Edit files directly on huggingface.co
- Use the "Upload files" button in HF web UI
- Use single-word commit messages ("fix", "update")
**✅ DO:**
- Always use `git push` from command line
- Write descriptive commit messages
- Test locally before pushing
### Git Hook
A pre-push hook is installed in `.git/hooks/pre-push` that:
- Validates commit messages before pushing
- Warns about very short messages
- Ensures you're not accidentally pushing generic commits
## Local Testing
Before deploying to HF Spaces:
```bash
pip install -r requirements.txt
python app.py
```
Then open: http://localhost:7860
## License
MIT - See LICENSE file for details.