Alaaharoun committed
Commit 9e4d788 · verified · 1 Parent(s): dcf548c

Upload 7 files

Files changed (7)
  1. .dockerignore +25 -0
  2. Dockerfile +35 -0
  3. README.md +152 -12
  4. app.py +306 -0
  5. config.json +4 -0
  6. docker-compose.yml +18 -0
  7. requirements.txt +5 -0
.dockerignore ADDED
@@ -0,0 +1,25 @@
+ .git
+ .gitignore
+ README.md
+ .env
+ *.log
+ __pycache__
+ *.pyc
+ *.pyo
+ *.pyd
+ .Python
+ env
+ pip-log.txt
+ pip-delete-this-directory.txt
+ .tox
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.log
+ .git
+ .mypy_cache
+ .pytest_cache
+ .hypothesis
Dockerfile ADDED
@@ -0,0 +1,35 @@
+ # Use Python 3.9 slim image as base
+ FROM python:3.9-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies including FFmpeg
+ RUN apt-get update && apt-get install -y \
+     ffmpeg \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first for better caching
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY app.py .
+
+ # Create a non-root user for security
+ RUN useradd --create-home --shell /bin/bash app \
+     && chown -R app:app /app
+ USER app
+
+ # Expose port
+ EXPOSE 7860
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:7860/health || exit 1
+
+ # Run the application
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,12 +1,152 @@
- ---
- title: Faster Whisper Api
- emoji: 😻
- colorFrom: purple
- colorTo: gray
- sdk: docker
- pinned: false
- license: apache-2.0
- short_description: Alaaharoun/faster-whisper-api
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: "Faster Whisper API"
+ emoji: "🎀"
+ colorFrom: "blue"
+ colorTo: "purple"
+ sdk: "docker"
+ sdk_version: "latest"
+ app_file: "app.py"
+ pinned: false
+ ---
+
+ # 🎀 Faster Whisper API - Fixed Version
+
+ ## 🆕 Latest Fixes Applied:
+
+ ### ✅ Critical Bug Fixes:
+ - **Fixed "name 'traceback' is not defined" error** - Removed problematic traceback import
+ - **Improved error handling** - Better error messages and logging
+ - **Enhanced CORS middleware** - Better browser compatibility
+ - **Added detailed logging** - For easier debugging on Hugging Face Spaces
+
+ ### 🔧 Performance Improvements:
+ - **Better file validation** - 25MB file size limit
+ - **Enhanced VAD support** - Voice Activity Detection with fallback
+ - **Improved model loading** - Better error handling during startup
+ - **Added health check endpoint** - For monitoring service status
+
+ ## 🚀 Quick Start:
+
+ ### Health Check:
+ ```bash
+ curl https://alaaharoun-faster-whisper-api.hf.space/health
+ ```
+
+ ### Transcribe Audio (without VAD):
+ ```bash
+ curl -X POST \
+   -F "file=@audio.wav" \
+   -F "language=en" \
+   -F "task=transcribe" \
+   https://alaaharoun-faster-whisper-api.hf.space/transcribe
+ ```
+
+ ### Transcribe Audio (with VAD):
+ ```bash
+ curl -X POST \
+   -F "file=@audio.wav" \
+   -F "language=en" \
+   -F "task=transcribe" \
+   -F "vad_filter=true" \
+   -F "vad_parameters=threshold=0.5" \
+   https://alaaharoun-faster-whisper-api.hf.space/transcribe
+ ```
+
+ ## 📊 Supported Parameters:
+
+ - **`file`**: Audio file (WAV, MP3, M4A, FLAC, OGG, WEBM)
+ - **`language`**: Language code (optional, e.g., "en", "ar", "es")
+ - **`task`**: "transcribe" or "translate" (default: "transcribe")
+ - **`vad_filter`**: Enable Voice Activity Detection (default: false)
+ - **`vad_parameters`**: Comma-separated `key=value` pairs; only `threshold` is read (default: "threshold=0.5")
+
+ ## 🔧 Response Format:
+
+ ### Success Response:
+ ```json
+ {
+   "success": true,
+   "text": "Transcribed text here",
+   "language": "en",
+   "language_probability": 0.95,
+   "vad_enabled": false,
+   "vad_threshold": null
+ }
+ ```
+
+ ### Error Response:
+ ```json
+ {
+   "error": "Error message",
+   "error_type": "ExceptionType",
+   "success": false
+ }
+ ```
+
+ ## 🛠️ Local Development:
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run the server
+ python app.py
+ ```
+
+ Or with uvicorn:
+ ```bash
+ uvicorn app:app --host 0.0.0.0 --port 7860
+ ```
+
+ ## 📝 Important Notes:
+
+ - **Maximum file size**: 25MB
+ - **Supported formats**: WAV, MP3, M4A, FLAC, OGG, WEBM
+ - **VAD support**: Configurable threshold with fallback mechanism
+ - **Language detection**: Automatic if not specified
+ - **Error handling**: Detailed error messages for debugging
+
+ ## 🔍 Troubleshooting:
+
+ ### Common Issues:
+
+ 1. **500 Internal Server Error**:
+    - Check if the model is loaded properly
+    - Verify file format and size
+    - Check server logs for detailed error messages
+
+ 2. **VAD Issues**:
+    - The service will automatically fall back to standard transcription
+    - Check VAD parameters format
+
+ 3. **File Upload Issues**:
+    - Ensure file size is under 25MB
+    - Check file format compatibility
+
+ ## 🌐 Service URLs:
+
+ - **Main Service**: https://alaaharoun-faster-whisper-api.hf.space
+ - **Health Check**: https://alaaharoun-faster-whisper-api.hf.space/health
+ - **API Documentation**: https://alaaharoun-faster-whisper-api.hf.space/docs
+
+ ## 📈 Performance:
+
+ - **Model**: Whisper base model with int8 quantization
+ - **Processing**: Optimized for real-time transcription
+ - **Memory**: Efficient memory usage for Hugging Face Spaces
+ - **Concurrency**: Supports multiple concurrent requests
+
+ ## 🔒 Security:
+
+ - **CORS**: Configured for cross-origin requests
+ - **File Validation**: Strict file type and size validation
+ - **Error Handling**: No sensitive information in error messages
+ - **Authentication**: Optional API token support (currently disabled)
+
+ ## 📞 Support:
+
+ For issues or questions:
+ 1. Check the health endpoint first
+ 2. Review server logs for detailed error messages
+ 3. Test with a simple audio file
+ 4. Verify file format and size requirements
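Review note: the README documents two response shapes that differ by the `success` flag. A client could branch on that flag roughly as follows. This is a minimal sketch and not part of the commit: `Transcription`, `TranscriptionError`, and `parse_transcribe_response` are hypothetical names, and the field names are taken from the two JSON examples in the README.

```python
import json
from dataclasses import dataclass
from typing import Optional


class TranscriptionError(Exception):
    """Raised when the service reports success == false (hypothetical helper)."""


@dataclass
class Transcription:
    text: str
    language: str
    language_probability: float
    vad_enabled: bool
    vad_threshold: Optional[float]


def parse_transcribe_response(body: str) -> Transcription:
    """Turn a /transcribe JSON body into a Transcription or raise on error."""
    payload = json.loads(body)
    if not payload.get("success", False):
        # The error shape carries "error" and "error_type" next to "success".
        error_type = payload.get("error_type", "UnknownError")
        message = payload.get("error", "no message")
        raise TranscriptionError(f"{error_type}: {message}")
    return Transcription(
        text=payload["text"],
        language=payload["language"],
        language_probability=payload["language_probability"],
        vad_enabled=payload.get("vad_enabled", False),
        vad_threshold=payload.get("vad_threshold"),
    )
```

The body string would come from whatever HTTP client is in use; the sketch only covers interpreting it, so it stays independent of the transport.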
app.py ADDED
@@ -0,0 +1,306 @@
+ from fastapi import FastAPI, UploadFile, File, Form, HTTPException, Depends
+ from fastapi.responses import JSONResponse
+ from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
+ from fastapi.middleware.cors import CORSMiddleware
+ from faster_whisper import WhisperModel
+ import shutil
+ import os
+ import tempfile
+ import sys
+ from typing import Optional
+
+ # Create FastAPI app
+ app = FastAPI(
+     title="Faster Whisper Service",
+     description="High-performance speech-to-text service using Faster Whisper",
+     version="1.0.0"
+ )
+
+ # Add CORS middleware
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # Security
+ security = HTTPBearer(auto_error=False)
+
+ # Configuration
+ API_TOKEN = ""
+ REQUIRE_AUTH = False
+
+ # Global model variable
+ model = None
+
+ def load_model():
+     """Load the Whisper model"""
+     global model
+     try:
+         print("🔄 Loading Whisper model...")
+         model = WhisperModel("base", compute_type="int8")
+         print("✅ Model loaded successfully")
+         return True
+     except Exception as e:
+         print(f"❌ Error loading model: {e}")
+         print(f"Python version: {sys.version}")
+         print(f"Current working directory: {os.getcwd()}")
+         model = None
+         return False
+
+ def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
+     """Verify API token if authentication is required"""
+     if REQUIRE_AUTH:
+         if not credentials:
+             raise HTTPException(
+                 status_code=401,
+                 detail="API token required",
+                 headers={"WWW-Authenticate": "Bearer"},
+             )
+
+         if credentials.credentials != API_TOKEN:
+             raise HTTPException(
+                 status_code=403,
+                 detail="Invalid API token",
+                 headers={"WWW-Authenticate": "Bearer"},
+             )
+
+     return credentials
+
+ @app.on_event("startup")
+ async def startup_event():
+     """Load model on startup"""
+     load_model()
+
+ @app.get("/")
+ async def root():
+     """Root endpoint"""
+     return {"message": "Faster Whisper Service is running"}
+
+ @app.get("/health")
+ async def health_check(credentials: HTTPAuthorizationCredentials = Depends(verify_token)):
+     """Health check endpoint"""
+     return {
+         "status": "healthy",
+         "model_loaded": model is not None,
+         "service": "faster-whisper",
+         "auth_required": REQUIRE_AUTH,
+         "auth_configured": bool(API_TOKEN),
+         "vad_support": True,
+         "python_version": sys.version
+     }
+
+ @app.post("/transcribe")
+ async def transcribe(
+     file: UploadFile = File(...),
+     language: Optional[str] = Form(None),
+     task: Optional[str] = Form("transcribe"),
+     vad_filter: Optional[bool] = Form(False),
+     vad_parameters: Optional[str] = Form("threshold=0.5"),
+     credentials: HTTPAuthorizationCredentials = Depends(verify_token)
+ ):
+     """
+     Transcribe audio file to text with optional VAD support
+     """
+     temp_path = None
+     try:
+         print(f"🎵 Starting transcription for file: {file.filename}")
+
+         # Check if model is loaded
+         if model is None:
+             print("❌ Model not loaded")
+             return JSONResponse(
+                 status_code=500,
+                 content={"error": "Model not loaded", "success": False}
+             )
+
+         # Validate file
+         if not file.filename:
+             print("❌ No file provided")
+             return JSONResponse(
+                 status_code=400,
+                 content={"error": "No file provided", "success": False}
+             )
+
+         # Validate file size (25MB limit)
+         file.file.seek(0, 2)
+         file_size = file.file.tell()
+         file.file.seek(0)
+
+         print(f"File size: {file_size} bytes")
+
+         if file_size > 25 * 1024 * 1024:  # 25MB
+             print("❌ File too large")
+             return JSONResponse(
+                 status_code=400,
+                 content={"error": "File too large. Maximum size is 25MB", "success": False}
+             )
+
+         # Create temporary file
+         print("Creating temporary file...")
+         with tempfile.NamedTemporaryFile(delete=False, suffix='.wav') as temp_file:
+             shutil.copyfileobj(file.file, temp_file)
+             temp_path = temp_file.name
+
+         print(f"✅ Temporary file created: {temp_path}")
+
+         # Parse VAD parameters
+         vad_threshold = 0.5  # default
+         if vad_filter and vad_parameters:
+             try:
+                 for param in vad_parameters.split(','):
+                     if '=' in param:
+                         key, value = param.strip().split('=', 1)
+                         if key == 'threshold':
+                             vad_threshold = float(value)
+             except Exception as e:
+                 print(f"⚠️ Warning: Failed to parse VAD parameters: {e}")
+
+         # Transcribe audio
+         print("🎀 Starting transcription...")
+         if vad_filter:
+             print(f"🔊 Using VAD with threshold: {vad_threshold}")
+             try:
+                 if language:
+                     segments, info = model.transcribe(
+                         temp_path,
+                         language=language,
+                         task=task,
+                         vad_filter=True,
+                         vad_parameters={"threshold": vad_threshold}  # dict, not a "key=value" string
+                     )
+                 else:
+                     segments, info = model.transcribe(
+                         temp_path,
+                         task=task,
+                         vad_filter=True,
+                         vad_parameters={"threshold": vad_threshold}  # dict, not a "key=value" string
+                     )
+             except Exception as vad_error:
+                 print(f"⚠️ VAD transcription failed, falling back to standard: {vad_error}")
+                 if language:
+                     segments, info = model.transcribe(temp_path, language=language, task=task)
+                 else:
+                     segments, info = model.transcribe(temp_path, task=task)
+         else:
+             if language:
+                 segments, info = model.transcribe(temp_path, language=language, task=task)
+             else:
+                 segments, info = model.transcribe(temp_path, task=task)
+
+         # Collect transcription results
+         transcription = " ".join([seg.text for seg in segments])
+
+         print(f"✅ Transcription completed: {len(transcription)} characters")
+         print(f"🌍 Detected language: {info.language} (probability: {info.language_probability:.2f})")
+
+         # Prepare response
+         response = {
+             "success": True,
+             "text": transcription,
+             "language": info.language,
+             "language_probability": info.language_probability,
+             "vad_enabled": vad_filter,
+             "vad_threshold": vad_threshold if vad_filter else None
+         }
+
+         return JSONResponse(content=response)
+
+     except Exception as e:
+         error_msg = str(e)
+         error_type = type(e).__name__
+         print(f"❌ Transcription error ({error_type}): {error_msg}")
+
+         return JSONResponse(
+             status_code=500,
+             content={
+                 "error": error_msg,
+                 "error_type": error_type,
+                 "success": False
+             }
+         )
+     finally:
+         # Clean up temporary file
+         if temp_path and os.path.exists(temp_path):
+             try:
+                 os.unlink(temp_path)
+                 print(f"🧹 Cleaned up temporary file: {temp_path}")
+             except Exception as e:
+                 print(f"⚠️ Warning: Failed to delete temp file: {e}")
+
+ @app.post("/detect-language")
+ async def detect_language(
+     file: UploadFile = File(...),
+     credentials: HTTPAuthorizationCredentials = Depends(verify_token)
+ ):
+     """
+     Detect the language of an audio file
+     """
+     temp_path = None
+     try:
+         print(f"🌍 Starting language detection for file: {file.filename}")
+
+         # Check if model is loaded
+         if model is None:
+             print("❌ Model not loaded")
+             return JSONResponse(
+                 status_code=500,
+                 content={"error": "Model not loaded", "success": False}
+             )
+
+         # Validate file
+         if not file.filename:
+             print("❌ No file provided")
+             return JSONResponse(
+                 status_code=400,
+                 content={"error": "No file provided", "success": False}
+             )
+
+         # Create temporary file
+         print("Creating temporary file...")
+         with tempfile.NamedTemporaryFile(delete=False, suffix='.wav') as temp_file:
+             shutil.copyfileobj(file.file, temp_file)
+             temp_path = temp_file.name
+
+         print(f"✅ Temporary file created: {temp_path}")
+
+         # Detect language
+         print("🌍 Detecting language...")
+         segments, info = model.transcribe(temp_path)
+
+         print(f"✅ Language detected: {info.language} (probability: {info.language_probability:.2f})")
+
+         return JSONResponse(content={
+             "success": True,
+             "language": info.language,
+             "language_probability": info.language_probability
+         })
+
+     except Exception as e:
+         error_msg = str(e)
+         error_type = type(e).__name__
+         print(f"❌ Language detection error ({error_type}): {error_msg}")
+
+         return JSONResponse(
+             status_code=500,
+             content={
+                 "error": error_msg,
+                 "error_type": error_type,
+                 "success": False
+             }
+         )
+     finally:
+         # Clean up temporary file
+         if temp_path and os.path.exists(temp_path):
+             try:
+                 os.unlink(temp_path)
+                 print(f"🧹 Cleaned up temporary file: {temp_path}")
+             except Exception as e:
+                 print(f"⚠️ Warning: Failed to delete temp file: {e}")
+
+ # For Hugging Face Spaces compatibility
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=7860)
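Review note: the `/transcribe` handler receives `vad_parameters` as a `key=value` form string, while faster-whisper's `WhisperModel.transcribe`, as far as I know, expects `vad_parameters` as a dict (or a `VadOptions` object). The parsing step could be pulled into a small helper along these lines. This is a sketch under assumptions: `parse_vad_parameters` and its `allowed` argument are hypothetical names, and only `threshold` is allowed by default because that is the only key the handler reads.

```python
from typing import Dict, Iterable


def parse_vad_parameters(
    raw: str,
    allowed: Iterable[str] = ("threshold",),
    default_threshold: float = 0.5,
) -> Dict[str, float]:
    """Parse 'key=value,key=value' into a dict of floats.

    Unknown keys and unparseable values are skipped, so the result
    always contains at least a usable 'threshold'.
    """
    allowed_keys = set(allowed)
    options: Dict[str, float] = {"threshold": default_threshold}
    for part in raw.split(","):
        key, sep, value = part.strip().partition("=")
        key = key.strip()
        if not sep or key not in allowed_keys:
            continue  # no '=' present, or a key we do not forward
        try:
            options[key] = float(value)
        except ValueError:
            continue  # keep the default for malformed numbers
    return options
```

The returned dict could then be passed as `vad_parameters=parse_vad_parameters(form_value)`. Filtering against an allowlist matters because unexpected keys would likely be rejected when the library builds its VAD options.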
config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "sdk": "docker",
+   "app_file": "app.py"
+ }
docker-compose.yml ADDED
@@ -0,0 +1,18 @@
+ version: '3.8'
+
+ services:
+   faster-whisper-api:
+     build: .
+     ports:
+       - "7860:7860"
+     environment:
+       - PYTHONUNBUFFERED=1
+     volumes:
+       - .:/app
+     restart: unless-stopped
+     healthcheck:
+       test: ["CMD", "curl", "-f", "http://localhost:7860/health"]
+       interval: 30s
+       timeout: 10s
+       retries: 3
+       start_period: 40s
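Review note: both health checks use `curl -f`, which passes on any 2xx response. The `/health` handler in `app.py` returns `"status": "healthy"` even when `model_loaded` is false, because `load_model()` swallows its exception at startup. A stricter probe could inspect the payload, roughly as sketched below. `is_ready` and `probe` are hypothetical names and the default URL simply mirrors the compose file; this is an illustration, not something the commit contains.

```python
import http.client
import json
import urllib.request


def is_ready(payload: dict) -> bool:
    """True only if the service says it is healthy AND the model is loaded."""
    return payload.get("status") == "healthy" and payload.get("model_loaded") is True


def probe(url: str = "http://localhost:7860/health", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint and apply is_ready; any failure counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_ready(json.load(response))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors, HTTP errors, timeouts, and malformed JSON all land here.
        return False


if __name__ == "__main__":
    # Exit code 0 when ready, 1 otherwise, so the script can back a health check.
    raise SystemExit(0 if probe() else 1)
```

Using only the standard library keeps the probe usable inside the image without adding to `requirements.txt`.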
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ fastapi==0.104.1
+ uvicorn==0.24.0
+ faster-whisper==0.9.0
+ python-multipart==0.0.6
+ python-multipart