# HuggingFace Spaces Deployment Guide

## Overview
This application is configured to run on **HuggingFace Spaces** using local model inference (no external API calls required).

---

## Quick Setup

### 1. Create a New Space
1. Go to https://huggingface.co/new-space
2. Choose **Gradio** as the SDK
3. Select **GPU** hardware (T4 or better recommended)
4. Name your Space (e.g., `transcriptor-ai`)

### 2. Upload Your Code
Upload all files from this directory to your Space, or connect a Git repository.

### 3. Configure Space Settings (Optional)

Go to **Settings → Variables** in your Space and add:

| Variable | Value | Description |
|----------|-------|-------------|
| `DEBUG_MODE` | `True` or `False` | Enable detailed logging |
| `LLM_TEMPERATURE` | `0.7` | Model creativity (0.0-1.0) |
| `LLM_TIMEOUT` | `120` | Timeout in seconds |
| `LOCAL_MODEL` | `microsoft/Phi-3-mini-4k-instruct` | Model to use |

**Note:** All settings have sensible defaults; set them only if you want to customize behavior.
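The table above maps directly to environment variables. Below is a minimal sketch of how an app might read them with the documented defaults; the `get_setting` helper is illustrative, not the app's actual code:

```python
import os

def get_setting(name: str, default: str) -> str:
    """Read an optional Spaces Variable, falling back to the documented default."""
    return os.environ.get(name, default)

# Every variable is optional; defaults mirror the table above.
DEBUG_MODE = get_setting("DEBUG_MODE", "False").lower() == "true"
LLM_TEMPERATURE = float(get_setting("LLM_TEMPERATURE", "0.7"))
LLM_TIMEOUT = int(get_setting("LLM_TIMEOUT", "120"))
LOCAL_MODEL = get_setting("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct")
```

Parsing each value to its proper type (`bool`, `float`, `int`) at startup surfaces typos in Variables immediately instead of deep inside a request.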

---

## Hardware Requirements

### Recommended: GPU (T4 or better)
- **Phi-3-mini-4k-instruct**: 3.8B params, ~8GB GPU RAM
- Processing speed: ~30-60 seconds per transcript chunk
- **Best for:** Production use with multiple users

### Alternative: CPU (not recommended)
- Works, but very slowly (5-10 minutes per chunk)
- Only suitable for testing

---

## Supported Models

You can change the model by setting the `LOCAL_MODEL` variable:

### Small & Fast (Recommended for Free Tier)
```
LOCAL_MODEL=microsoft/Phi-3-mini-4k-instruct  (Default - 3.8B params)
```

### Medium (Better quality, needs more GPU)
```
LOCAL_MODEL=mistralai/Mistral-7B-Instruct-v0.3  (7B params)
```

### Alternatives
```
LOCAL_MODEL=HuggingFaceH4/zephyr-7b-beta       (7B params, good instruction following)
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0 (1.1B params, very fast but lower quality)
```
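Switching models is just a matter of changing that one variable. A hypothetical sketch of how the app could resolve it, using the models listed above (the `resolve_model` helper and `KNOWN_MODELS` registry are illustrative, not the app's actual code):

```python
import os

# The models discussed in this guide, with approximate parameter counts.
KNOWN_MODELS = {
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0": "1.1B",
    "microsoft/Phi-3-mini-4k-instruct": "3.8B",
    "HuggingFaceH4/zephyr-7b-beta": "7B",
    "mistralai/Mistral-7B-Instruct-v0.3": "7B",
}

def resolve_model() -> str:
    """Pick the model id from LOCAL_MODEL, defaulting to Phi-3-mini."""
    model = os.environ.get("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct")
    if model not in KNOWN_MODELS:
        # Any Hub model id works in principle; just note that it is untested here.
        print(f"Note: {model} is not one of the models covered in this guide.")
    return model
```

Because the variable holds a plain Hub model id, any compatible instruct model can be substituted without code changes.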

---

## Configuration Files

### ✅ Required Files
- `app.py` - Main application
- `requirements.txt` - Python dependencies
- `llm.py`, `extractors.py`, etc. - Core modules

### ⚠️ NOT Needed for Spaces
- `.env` file - Use Spaces Variables instead
- Local database files
- API keys (unless using external APIs)

---

## Environment Configuration

The app automatically detects if it's running on HuggingFace Spaces and uses local model inference by default.

**Default Configuration (no .env needed):**
```python
USE_HF_API = False       # Don't use the HF Inference API
USE_LMSTUDIO = False     # Don't use LM Studio
LLM_BACKEND = "local"    # Use local transformers
DEBUG_MODE = False       # Disable debug logs
```

**To override:** Set Spaces Variables (Settings → Variables)
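The automatic Spaces detection mentioned above can be implemented by checking for an environment variable the platform injects into every container. A minimal sketch (the function name is illustrative; `SPACE_ID` is set by Spaces itself):

```python
import os

def running_on_spaces() -> bool:
    """HuggingFace Spaces injects SPACE_ID (and SPACE_HOST) into the container env."""
    return "SPACE_ID" in os.environ
```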

---

## Troubleshooting

### Issue: "Out of Memory" Error
**Solution:** Switch to a smaller model
```
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0
```

### Issue: Very Slow Processing
**Solution:**
1. Make sure you selected **GPU** hardware (not CPU)
2. Check Space logs for "Model loaded on cuda" confirmation
3. If on CPU, upgrade to GPU tier
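To confirm at runtime which device inference will actually use (mirroring the "Model loaded on cuda" log line), a check along these lines could go in the startup code; the `describe_device` helper is illustrative:

```python
def describe_device() -> str:
    """Report the device inference will run on; falls back to CPU if torch is absent."""
    try:
        import torch
        if torch.cuda.is_available():
            return f"cuda:{torch.cuda.current_device()}"
    except ImportError:
        pass
    return "cpu"
```

Logging this string once at startup makes a silent CPU fallback obvious in the Space logs.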

### Issue: Quality Score 0.00
**Causes:**
1. Model not loaded properly (check logs for "[Local Model] Loading...")
2. GPU out of memory (model falls back to CPU)
3. Timeout too short (increase `LLM_TIMEOUT`)

**Debug Steps:**
1. Set `DEBUG_MODE=True` in Spaces Variables
2. Check logs for detailed error messages
3. Look for "[Local Model] ✅ Generated X characters"

### Issue: Model Downloads Every Time
**Solution:** HuggingFace Spaces caches model weights automatically, but the first load takes 2-5 minutes.
- Subsequent starts are faster (~30 seconds)
- Don't restart the Space unnecessarily

---

## Performance Optimization

### 1. Reduce Context Window
Edit `llm.py` line 399:
```python
max_length=2000  # Reduce from 3500 for faster processing
```
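If `max_length` in `llm.py` is measured in characters (check the surrounding code; it may be tokens instead), a truncation helper along these lines keeps prompts short while ending on a sentence boundary. The `truncate_chunk` function is illustrative, not the app's actual code:

```python
def truncate_chunk(text: str, max_length: int = 2000) -> str:
    """Trim a transcript chunk to at most max_length characters.

    Cutting at the last full sentence avoids prompting the model
    with a half-finished thought.
    """
    if len(text) <= max_length:
        return text
    cut = text[:max_length]
    last_period = cut.rfind(". ")
    return cut[: last_period + 1] if last_period > 0 else cut
```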

### 2. Lower Token Limit
Set Spaces Variable:
```
MAX_TOKENS_PER_REQUEST=800  # Default is 1500
```

### 3. Use Smaller Model
```
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0
```

### 4. Disable Debug Mode
```
DEBUG_MODE=False
```

---

## Monitoring

### View Logs
1. Go to your Space
2. Click **Logs** tab at the top
3. Look for startup messages:

```
✅ Configuration loaded for HuggingFace Spaces
🚀 TranscriptorAI Enterprise - LLM Backend: local
[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
[Local Model] ✅ Model loaded on cuda:0
```

### Check Processing
During analysis, you should see:
```
[Local Model] Generating (1500 max tokens, temp=0.7)...
[Local Model] ✅ Generated 1247 characters
[LLM Debug] ✅ Successfully extracted JSON with 7 fields
```

---

## Cost Estimation

### Free Tier (CPU)
- ⚠️ Very slow but free
- ~5-10 minutes per transcript

### GPU (T4) - ~$0.60/hour
- ⚡ Fast processing
- ~30-60 seconds per transcript
- Space sleeps after inactivity (saves money)

### Persistent GPU (Upgraded)
- Always-on for instant access
- Higher cost but best user experience

---

## Security Notes

1. **No API Keys Needed:** Everything runs locally
2. **Private Processing:** Data never leaves your Space
3. **Secrets Management:** Use Spaces Secrets (not Variables) for sensitive data
4. **Model Access:** Phi-3 and most models don't require gated access

---

## Next Steps

1. ✅ Upload code to your Space
2. ✅ Select GPU hardware
3. ✅ Wait for first model download (~2-5 min)
4. ✅ Test with a sample transcript
5. 🎉 Share your Space URL!

---

## Support

- **HuggingFace Spaces Docs:** https://huggingface.co/docs/hub/spaces
- **Transformers Docs:** https://huggingface.co/docs/transformers
- **GPU Pricing:** https://huggingface.co/pricing

---

**Last Updated:** October 2025