abhisheksan committed
Commit e125769 · 1 Parent(s): bc9b12e

Remove extra documentation files

Files changed (2)
  1. HF_SPACES_GUIDE.md +0 -442
  2. SOLUTION.md +0 -176
HF_SPACES_GUIDE.md DELETED
@@ -1,442 +0,0 @@
1
- # Hugging Face Spaces Deployment Guide
2
-
3
- ## 🎯 Overview
4
-
5
- This guide explains how to deploy and use the Multi-Utility Server on Hugging Face Spaces, including limitations and workarounds.
6
-
7
- ## 🚀 Quick Deployment
8
-
9
- ### Step 1: Create a Space
10
-
11
- 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
12
- 2. Click **"Create new Space"**
13
- 3. Choose:
14
- - **Space name:** Your choice
15
- - **SDK:** Docker
16
- - **Visibility:** Public or Private
17
- 4. Click **"Create Space"**
18
-
19
- ### Step 2: Configure Secrets
20
-
21
- 1. Go to your Space's **Settings** → **Repository secrets**
22
- 2. Add a new secret:
23
- - **Name:** `API_KEYS`
24
- - **Value:** `your-secure-api-key-here` (comma-separated for multiple keys)
25
- 3. Save
26
-
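The `API_KEYS` secret holds one or more comma-separated keys. How the server parses it is not shown in this guide; the following is a minimal sketch of one way to do it, where `parse_api_keys` and `is_authorized` are hypothetical helper names, not the server's actual functions.

```python
import os
from typing import Optional, Set


def parse_api_keys(raw: str) -> Set[str]:
    """Split a comma-separated secret into a set of trimmed, non-empty keys."""
    return {key.strip() for key in raw.split(",") if key.strip()}


def is_authorized(presented_key: Optional[str], keys: Set[str]) -> bool:
    """Return True only when a key was presented and it is in the allowed set."""
    return presented_key is not None and presented_key in keys


# API_KEYS is the secret name from Step 2; an unset variable yields no keys.
allowed_keys = parse_api_keys(os.environ.get("API_KEYS", ""))
```

Trimming whitespace and dropping empty entries means a value such as `key1, key2,` still yields two usable keys.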
27
- ### Step 3: Push Code
28
-
29
- ```bash
30
- # Clone your space
31
- git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
32
- cd YOUR_SPACE_NAME
33
-
34
- # Add this repository as a remote
35
- git remote add source https://github.com/YOUR_REPO/multiutility-server.git
36
- git pull source main
37
-
38
- # Push to HF Spaces
39
- git push origin main
40
- ```
41
-
42
- Or connect your GitHub repository directly in Space settings.
43
-
44
- ## 📊 Feature Availability on HF Spaces
45
-
46
- | Feature | Status | Endpoint |
47
- |---------|--------|----------|
48
- | **Text Embeddings** | ✅ Works | `POST /api/v1/embeddings/generate` |
49
- | **Audio File Transcription** | ✅ Works | `POST /api/v1/subtitles/transcribe` |
50
- | **YouTube Subtitle Extraction** | ❌ Blocked | `POST /api/v1/subtitles/extract` |
51
- | **Health Checks** | ✅ Works | `GET /health` |
52
-
53
- ## ⚠️ Network Limitations
54
-
55
- ### What's Blocked
56
-
57
- Hugging Face Spaces runs in a sandboxed environment with **restricted outbound network access**. In this deployment, that means:
58
-
59
- - ❌ Cannot download from YouTube directly
60
- - ❌ Cannot access external APIs
61
- - ❌ Cannot perform web scraping
62
-
63
- ### What Works
64
-
65
- ✅ File uploads from users
66
- ✅ AI model inference (Whisper, embeddings)
67
- ✅ Returning results to users
68
- ✅ Internal HF services
69
-
70
- ## 🎤 Audio Transcription Workflow
71
-
72
- Since YouTube downloads don't work on HF Spaces, use this workflow instead:
73
-
74
- ### Option 1: User Downloads Audio Locally
75
-
76
- **Step 1:** User downloads audio using [yt-dlp](https://github.com/yt-dlp/yt-dlp)
77
- ```bash
78
- # Install yt-dlp
79
- pip install yt-dlp
80
-
81
- # Download audio from YouTube
82
- yt-dlp -x --audio-format mp3 "https://www.youtube.com/watch?v=VIDEO_ID" -o audio.mp3
83
- ```
84
-
85
- **Step 2:** User uploads audio to your HF Space
86
- ```bash
87
- curl -X POST https://YOUR_SPACE.hf.space/api/v1/subtitles/transcribe \
88
- -H "x-api-key: your-api-key" \
89
- -F "file=@audio.mp3" \
90
- -F "lang=en"
91
- ```
92
-
93
- **Step 3:** Receive transcription
94
- ```json
95
- {
96
- "status": "success",
97
- "language": "en",
98
- "file_name": "audio.mp3",
99
- "transcription": [
100
- "First segment of transcribed text",
101
- "Second segment of transcribed text",
102
- "..."
103
- ]
104
- }
105
- ```
106
-
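Once the JSON above comes back, a client usually wants a single transcript string. The sketch below assumes the response shape shown in Step 3 and the `status`/`message` fields from the error example in Troubleshooting; `join_transcription` is a hypothetical helper, not part of the server.

```python
def join_transcription(payload: dict) -> str:
    """Join the transcription segments of a successful response into one string."""
    if payload.get("status") != "success":
        # Error responses in this guide carry a "message" field.
        raise ValueError(f"transcription failed: {payload.get('message', 'unknown error')}")
    segments = payload.get("transcription", [])
    return " ".join(segment.strip() for segment in segments if segment.strip())


sample = {
    "status": "success",
    "language": "en",
    "file_name": "audio.mp3",
    "transcription": [
        "First segment of transcribed text",
        "Second segment of transcribed text",
    ],
}
transcript = join_transcription(sample)
```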
107
- ### Option 2: Browser-Based Upload
108
-
109
- Create a simple HTML form for users:
110
-
111
- ```html
112
- <!DOCTYPE html>
113
- <html>
114
- <body>
115
- <h2>Audio Transcription</h2>
116
- <form id="uploadForm">
117
- <input type="file" id="audioFile" accept="audio/*" required>
118
- <select id="language">
119
- <option value="en">English</option>
120
- <option value="es">Spanish</option>
121
- <option value="fr">French</option>
122
- </select>
123
- <button type="submit">Transcribe</button>
124
- </form>
125
-
126
- <div id="result"></div>
127
-
128
- <script>
129
- document.getElementById('uploadForm').onsubmit = async (e) => {
130
- e.preventDefault();
131
- const formData = new FormData();
132
- formData.append('file', document.getElementById('audioFile').files[0]);
133
- formData.append('lang', document.getElementById('language').value);
134
-
135
- const response = await fetch('https://YOUR_SPACE.hf.space/api/v1/subtitles/transcribe', {
136
- method: 'POST',
137
- headers: { 'x-api-key': 'your-api-key' },
138
- body: formData
139
- });
140
-
141
- const result = await response.json();
142
- document.getElementById('result').innerHTML =
143
- '<pre>' + JSON.stringify(result, null, 2) + '</pre>';
144
- };
145
- </script>
146
- </body>
147
- </html>
148
- ```
149
-
150
- ## 📝 API Usage Examples
151
-
152
- ### Text Embeddings (Works on HF Spaces)
153
-
154
- ```python
155
- import requests
156
-
157
- url = "https://YOUR_SPACE.hf.space/api/v1/embeddings/generate"
158
- headers = {
159
- "Content-Type": "application/json",
160
- "x-api-key": "your-api-key"
161
- }
162
- data = {
163
- "texts": [
164
- "Hello, how are you?",
165
- "Machine learning is fascinating"
166
- ],
167
- "normalize": True
168
- }
169
-
170
- response = requests.post(url, headers=headers, json=data)
171
- print(response.json())
172
- ```
173
-
174
- ### Audio File Transcription (Works on HF Spaces)
175
-
176
- ```python
177
- import requests
178
-
179
- url = "https://YOUR_SPACE.hf.space/api/v1/subtitles/transcribe"
180
- headers = {"x-api-key": "your-api-key"}
181
-
182
- with open("audio.mp3", "rb") as audio_file:
183
-     files = {"file": audio_file}
184
-     data = {"lang": "en"}
185
-     response = requests.post(url, headers=headers, files=files, data=data)
186
-
187
- print(response.json())
188
- ```
189
-
190
- ### YouTube Extraction (Does NOT Work on HF Spaces)
191
-
192
- ```python
193
- # ❌ This will fail on HF Spaces with network error
194
- import requests
195
-
196
- url = "https://YOUR_SPACE.hf.space/api/v1/subtitles/extract"
197
- headers = {
198
- "Content-Type": "application/json",
199
- "x-api-key": "your-api-key"
200
- }
201
- data = {
202
- "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
203
- "lang": "en"
204
- }
205
-
206
- response = requests.post(url, headers=headers, json=data)
207
- # Error: Network connectivity issue
208
- ```
209
-
210
- ## 🔧 Configuration
211
-
212
- ### Required Environment Variables
213
-
214
- Set these in HF Spaces **Repository secrets**:
215
-
216
- | Variable | Description | Example |
217
- |----------|-------------|---------|
218
- | `API_KEYS` | Comma-separated API keys | `key1,key2,key3` |
219
-
220
- ### Optional Environment Variables
221
-
222
- | Variable | Description | Default |
223
- |----------|-------------|---------|
224
- | `CORS_ORIGINS` | Allowed origins | `*` |
225
- | `RATE_LIMIT_REQUESTS` | Requests per minute | `100` |
226
- | `LOG_LEVEL` | Logging level | `INFO` |
227
- | `WHISPER_MODEL` | Whisper model size | `base` |
228
- | `EMBEDDING_MODEL` | HuggingFace model | `mixedbread-ai/mxbai-embed-large-v1` |
229
-
230
- ### Whisper Model Options
231
-
232
- | Model | Parameters | Speed | Accuracy |
233
- |-------|------------|-------|----------|
234
- | `tiny` | 39 M | Fastest | Lowest |
235
- | `base` | 74 M | Fast | Good |
236
- | `small` | 244 M | Medium | Better |
237
- | `medium` | 769 M | Slow | Best |
238
-
239
- **Recommendation for HF Spaces:** Use `base` or `small` for good balance.
240
-
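A misspelled `WHISPER_MODEL` value is easy to miss until the first transcription request. As a minimal sketch, assuming the server reads the variable at startup, the value could be validated against the four model names listed in the table; `resolve_whisper_model` is a hypothetical helper, and the default of `base` follows the Optional Environment Variables table.

```python
import os
from typing import Mapping, Optional

# Model names taken from the table in this guide.
SUPPORTED_MODELS = ("tiny", "base", "small", "medium")
DEFAULT_MODEL = "base"


def resolve_whisper_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Read WHISPER_MODEL, normalize it, and reject names outside the table."""
    env = os.environ if env is None else env
    name = env.get("WHISPER_MODEL", DEFAULT_MODEL).strip().lower()
    if name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported WHISPER_MODEL {name!r}; expected one of {SUPPORTED_MODELS}"
        )
    return name
```

Failing fast at startup surfaces a bad setting in the build logs rather than in a user's request.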
241
- ## 🐛 Troubleshooting
242
-
243
- ### Issue: Build fails with poetry.lock error
244
-
245
- **Error:**
246
- ```
247
- The lock file might not be compatible with the current version of Poetry
248
- ```
249
-
250
- **Solution:**
251
- ```bash
252
- poetry lock
253
- git add poetry.lock
254
- git commit -m "Update poetry.lock"
255
- git push
256
- ```
257
-
258
- ### Issue: "Unauthorized" error
259
-
260
- **Error:**
261
- ```json
262
- {"detail": "Unauthorized: Invalid or missing API key"}
263
- ```
264
-
265
- **Solution:**
266
- - Verify `API_KEYS` secret is set in Space settings
267
- - Include `x-api-key` header in your requests
268
- - Check for typos in the API key
269
-
270
- ### Issue: YouTube extraction fails
271
-
272
- **Error:**
273
- ```json
274
- {
275
- "status": "error",
276
- "message": "Network connectivity issue: Unable to reach YouTube..."
277
- }
278
- ```
279
-
280
- **Solution:**
281
- This is expected on HF Spaces. Use the audio upload endpoint instead:
282
- 1. Download audio locally with yt-dlp
283
- 2. Upload to `/api/v1/subtitles/transcribe`
284
-
285
- ### Issue: Out of memory
286
-
287
- **Error:**
288
- ```
289
- Container killed due to memory limit
290
- ```
291
-
292
- **Solution:**
293
- - Use smaller Whisper model: `WHISPER_MODEL=tiny` or `WHISPER_MODEL=base`
294
- - Process shorter audio files
295
- - Consider upgrading to HF Spaces Pro (more RAM)
296
-
297
- ### Issue: Slow transcription
298
-
299
- **Solution:**
300
- - Use smaller Whisper model (`tiny` or `base`)
301
- - Process shorter audio segments
302
- - Note: HF Spaces free tier uses CPU (no GPU)
303
-
304
- ## 📈 Performance Tips
305
-
306
- ### 1. Choose the Right Whisper Model
307
-
308
- ```bash
309
- # Fast but less accurate (good for testing)
310
- WHISPER_MODEL=tiny
311
-
312
- # Balanced (recommended for production)
313
- WHISPER_MODEL=base
314
-
315
- # Accurate but slow (only if you need high quality)
316
- WHISPER_MODEL=small
317
- ```
318
-
319
- ### 2. Optimize Audio Files
320
-
321
- ```bash
322
- # Convert to optimal format before upload
323
- ffmpeg -i input.wav -ar 16000 -ac 1 -c:a libmp3lame output.mp3
324
- ```
325
-
326
- ### 3. Rate Limiting
327
-
328
- The server has rate limiting enabled:
329
- - Default: 100 requests per minute
330
- - Adjust via `RATE_LIMIT_REQUESTS` environment variable
331
-
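The guide does not describe how the server enforces its limit. For clients that want to stay under the default of 100 requests per minute, one option is a small client-side throttle. The sketch below uses a fixed-window counter; `FixedWindowLimiter` is a hypothetical class for illustration and is not the server's implementation.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Allow at most max_requests calls per window_seconds (fixed window)."""

    def __init__(
        self,
        max_requests: int = 100,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def allow(self) -> bool:
        """Return True if another request may be sent in the current window."""
        now = self._clock()
        if now - self._window_start >= self.window_seconds:
            # The window has elapsed: start a new one and reset the counter.
            self._window_start = now
            self._count = 0
        if self._count < self.max_requests:
            self._count += 1
            return True
        return False
```

Call `allow()` before each request and wait or skip when it returns `False`. The injectable `clock` parameter exists only to make the class easy to test.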
332
- ## 🔒 Security Best Practices
333
-
334
- ### 1. Use Strong API Keys
335
-
336
- ```bash
337
- # Generate secure API key
338
- openssl rand -base64 32
339
- ```
340
-
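Where `openssl` is not available, Python's standard `secrets` module can produce a comparable key. This is a minimal sketch; `generate_api_key` is a hypothetical helper name.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key built from num_bytes of OS-level randomness."""
    return secrets.token_urlsafe(num_bytes)


new_key = generate_api_key()
```

The URL-safe alphabet contains no commas, so a generated key can sit in the comma-separated `API_KEYS` value without escaping.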
341
- ### 2. Rotate Keys Regularly
342
-
343
- Update `API_KEYS` in Space secrets monthly.
344
-
345
- ### 3. Monitor Usage
346
-
347
- Check Space logs regularly:
348
- - Settings → Logs
349
- - Look for suspicious activity
350
-
351
- ### 4. Use Private Spaces for Sensitive Data
352
-
353
- Consider making your Space private if handling sensitive content.
354
-
355
- ## 💰 Cost Considerations
356
-
357
- ### Free Tier
358
-
359
- - ✅ Unlimited inference
360
- - ✅ 16GB RAM
361
- - ✅ 2 vCPU
362
- - ⚠️ CPU-only (no GPU)
363
- - ⚠️ May sleep after inactivity
364
-
365
- ### Spaces Pro ($5/month per Space)
366
-
367
- - ✅ Always-on
368
- - ✅ Better performance
369
- - ✅ More resources
370
- - ✅ Custom domains
371
-
372
- ## 🎓 Best Practices
373
-
374
- ### 1. Document the Workflow
375
-
376
- Add a README to your Space explaining:
377
- - How to download audio locally
378
- - How to use the upload endpoint
379
- - Supported audio formats
380
-
381
- ### 2. Provide Examples
382
-
383
- Include example API calls and code snippets.
384
-
385
- ### 3. Set Expectations
386
-
387
- Clearly state that YouTube direct extraction doesn't work on HF Spaces.
388
-
389
- ### 4. Offer Alternatives
390
-
391
- Suggest self-hosted deployment for users who need YouTube extraction.
392
-
393
- ## 🚀 Alternative Deployment
394
-
395
- If you need YouTube extraction, consider:
396
-
397
- ### Self-Hosted Options
398
-
399
- 1. **Docker on VPS** (DigitalOcean, Linode)
400
- - Cost: $4-12/month
401
- - Full control
402
- - All features work
403
-
404
- 2. **Cloud Platforms** (AWS, GCP, Azure)
405
- - Scalable
406
- - More expensive
407
- - Enterprise-grade
408
-
409
- 3. **Railway/Render**
410
- - Easy deployment
411
- - $5-20/month
412
- - Good middle ground
413
-
414
- ## 📚 Additional Resources
415
-
416
- - [Hugging Face Spaces Documentation](https://huggingface.co/docs/hub/spaces)
417
- - [yt-dlp Documentation](https://github.com/yt-dlp/yt-dlp)
418
- - [Whisper Model Information](https://github.com/openai/whisper)
419
- - [FastAPI Documentation](https://fastapi.tiangolo.com/)
420
-
421
- ## 🆘 Support
422
-
423
- For issues:
424
- 1. Check Space logs (Settings → Logs)
425
- 2. Verify environment variables are set
426
- 3. Test with simple requests first
427
- 4. Check API key is correct
428
- 5. Review this guide for common issues
429
-
430
- ## ✅ Success Checklist
431
-
432
- After deployment, verify:
433
-
434
- - [ ] Space builds successfully
435
- - [ ] Health check works: `GET /health`
436
- - [ ] Embeddings endpoint works
437
- - [ ] Audio upload endpoint works
438
- - [ ] API key authentication works
439
- - [ ] Rate limiting is configured
440
- - [ ] Documentation is clear for users
441
-
442
- **Your HF Space is ready to use! 🎉**
SOLUTION.md DELETED
@@ -1,176 +0,0 @@
1
- # HF Spaces Solution - Simple Summary
2
-
3
- ## ❌ Problem
4
- YouTube subtitle extraction fails on Hugging Face Spaces with network error:
5
- ```
6
- Failed to resolve 'www.youtube.com' - No address associated with hostname
7
- ```
8
-
9
- **Why?** The Space could not resolve YouTube's hostname; outbound network access from HF Spaces is restricted.
10
-
11
- ---
12
-
13
- ## ✅ Solution: Audio File Upload
14
-
15
- Instead of downloading from YouTube, users upload audio files directly.
16
-
17
- ### What Works on HF Spaces
18
-
19
- | Endpoint | Status | Description |
20
- |----------|--------|-------------|
21
- | `POST /api/v1/subtitles/transcribe` | ✅ Works | Upload audio, get transcription |
22
- | `POST /api/v1/embeddings/generate` | ✅ Works | Generate text embeddings |
23
- | `POST /api/v1/subtitles/extract` | ❌ Blocked | YouTube downloads (self-hosted only) |
24
-
25
- ---
26
-
27
- ## 🚀 Usage
28
-
29
- ### Step 1: Download Audio Locally
30
- ```bash
31
- # User downloads audio on their machine
32
- pip install yt-dlp
33
- yt-dlp -x --audio-format mp3 "https://youtube.com/watch?v=VIDEO_ID" -o audio.mp3
34
- ```
35
-
36
- ### Step 2: Upload to HF Spaces
37
- ```bash
38
- curl -X POST https://your-space.hf.space/api/v1/subtitles/transcribe \
39
- -H "x-api-key: your-key" \
40
- -F "file=@audio.mp3" \
41
- -F "lang=en"
42
- ```
43
-
44
- ### Step 3: Get Transcription
45
- ```json
46
- {
47
- "status": "success",
48
- "language": "en",
49
- "file_name": "audio.mp3",
50
- "transcription": [
51
- "Transcribed text segment 1",
52
- "Transcribed text segment 2",
53
- "..."
54
- ]
55
- }
56
- ```
57
-
58
- ---
59
-
60
- ## 📝 Python Example
61
-
62
- ```python
63
- import requests
64
-
65
- # Upload audio file for transcription
66
- url = "https://your-space.hf.space/api/v1/subtitles/transcribe"
67
- headers = {"x-api-key": "your-api-key"}
68
-
69
- with open("audio.mp3", "rb") as f:
70
-     files = {"file": f}
71
-     data = {"lang": "en"}
72
-     response = requests.post(url, headers=headers, files=files, data=data)
73
-
74
- result = response.json()
75
- print(result["transcription"])
76
- ```
77
-
78
- ---
79
-
80
- ## 🎯 Key Points
81
-
82
- 1. **YouTube extraction doesn't work on HF Spaces** - this is by design, not a bug
83
- 2. **Audio upload DOES work** - users download audio locally, then upload
84
- 3. **Embeddings work fine** - no network access needed
85
- 4. **For YouTube extraction** - use self-hosted deployment (Docker, VPS, cloud)
86
-
87
- ---
88
-
89
- ## 📦 Supported Audio Formats
90
-
91
- - ✅ MP3
92
- - ✅ WAV
93
- - ✅ M4A
94
- - ✅ FLAC
95
- - ✅ OGG
96
- - ✅ WEBM
97
-
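A client can check a file against this list before uploading and save a round trip. The sketch below checks only the file extension, which is an assumption; the server may inspect the content instead. `is_supported_audio` is a hypothetical helper.

```python
from pathlib import Path

# Extensions corresponding to the formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}


def is_supported_audio(filename: str) -> bool:
    """Return True when the file's extension matches a listed audio format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```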
98
- ---
99
-
100
- ## 🔧 HF Spaces Configuration
101
-
102
- ### Required Secret
103
- ```
104
- API_KEYS=your-secure-api-key
105
- ```
106
-
107
- ### Optional Variables
108
- ```
109
- WHISPER_MODEL=base # tiny, base, small, medium
110
- LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
111
- RATE_LIMIT_REQUESTS=100 # Requests per minute
112
- ```
113
-
114
- ---
115
-
116
- ## 🐛 Common Issues
117
-
118
- ### "Unauthorized" Error
119
- **Fix:** Add `API_KEYS` secret in HF Spaces settings
120
-
121
- ### "Network connectivity issue"
122
- **Fix:** This is expected - use audio upload endpoint instead
123
-
124
- ### "Out of memory"
125
- **Fix:** Use smaller Whisper model: `WHISPER_MODEL=tiny` or `WHISPER_MODEL=base`
126
-
127
- ---
128
-
129
- ## 🎓 Recommended Workflow
130
-
131
- ### For End Users:
132
- 1. Download YouTube audio using yt-dlp
133
- 2. Upload audio file to your HF Space
134
- 3. Receive transcription
135
-
136
- ### For Developers:
137
- 1. Deploy on HF Spaces (free hosting)
138
- 2. Document the two-step process
139
- 3. Or deploy self-hosted for direct YouTube access
140
-
141
- ---
142
-
143
- ## 🚀 Self-Hosted Alternative
144
-
145
- If you need direct YouTube extraction:
146
-
147
- ```bash
148
- # Deploy with Docker
149
- docker build -t multiutility-server .
150
- docker run -p 7860:7860 -e API_KEYS=your-key multiutility-server
151
- ```
152
-
153
- Then YouTube extraction works natively (no upload needed).
154
-
155
- ---
156
-
157
- ## ✅ Summary
158
-
159
- **What Changed:**
160
- - ✅ Added audio file upload endpoint (`/transcribe`)
161
- - ✅ Works perfectly on HF Spaces
162
- - ✅ No external dependencies needed
163
- - ❌ Removed proxy service (too complex)
164
- - ❌ YouTube endpoint doesn't work on HF Spaces (as expected)
165
-
166
- **Recommendation:**
167
- - Deploy on HF Spaces for free hosting
168
- - Use audio upload for transcription
169
- - Document the workflow for users
170
- - Or use self-hosted for YouTube extraction
171
-
172
- **Current Status:** ✅ Ready to deploy and use!
173
-
174
- ---
175
-
176
- For detailed guide, see: `HF_SPACES_GUIDE.md`