pgits Claude committed
Commit 1434d57 · 1 Parent(s): 1015eb4

fix: Update client to use Gradio API endpoint format


- Changed from /api/transcribe to /api/simple_transcribe endpoint
- Updated request format to use Gradio data array structure
- Fixed response parsing to handle JSON string returned by Gradio API
- Improved error handling and debug messages
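The request/response shape this commit switches to can be sketched client-side as follows. This is a minimal sketch based on the changes in this commit: Gradio named endpoints take positional inputs wrapped in a `data` array, and here the server function returns a JSON *string* as the first output element, so the client must decode it a second time. The base64 string and the simulated response below are stand-ins, not real service output.

```python
import json

def build_gradio_payload(audio_base64, language="en", model_size="base"):
    # Gradio named-endpoint requests wrap positional inputs in a "data" array
    return {"data": [audio_base64, language, model_size]}

def parse_gradio_response(response_json):
    # data[0] holds a JSON string produced by the server function,
    # so it needs a second json.loads on the client
    inner = json.loads(response_json["data"][0])
    if not inner.get("success"):
        raise RuntimeError(inner.get("error", "unknown error"))
    return inner["transcription"]

payload = build_gradio_payload("<base64 audio>", "en", "base")
simulated = {"data": [json.dumps({"transcription": "hello world", "success": True})]}
print(parse_gradio_response(simulated))  # hello world
```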

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (5)
  1. CLAUDE.md +105 -0
  2. app.py +40 -41
  3. client-stt/v2-audio-client.js +628 -0
  4. client-stt/v2-index.html +243 -0
  5. test_client.py +186 -0
CLAUDE.md ADDED
@@ -0,0 +1,105 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ STT GPU Service is a GPU-accelerated Speech-to-Text microservice built with Gradio and OpenAI Whisper. It serves as a replacement for Streamlit-based solutions to eliminate iframe communication barriers in WebRTC applications, specifically designed for VoiceCalendar integration.
+
+ ## Architecture
+
+ The service is built around a single-file architecture with these core components:
+
+ - **STTService class** (`app.py:45-221`): Core transcription engine with Whisper model management
+ - **Gradio interface** (`app.py:284-486`): Web UI with three main tabs:
+   - File upload transcription
+   - WebRTC memory transcription (base64 audio processing)
+   - API integration documentation
+ - **WebRTC compatibility layer**: Handles base64 audio conversion and validation
+
+ ## Key Technologies
+
+ - **PyTorch + CUDA**: GPU acceleration for Whisper models
+ - **OpenAI Whisper**: Speech-to-text engine (tiny to large models)
+ - **Gradio 4.44.1**: Web interface and API endpoints
+ - **pydub**: Audio format conversion (WebM → WAV)
+ - **debugpy**: Remote debugging support for HuggingFace Spaces
+
+ ## Development Commands
+
+ ### Running the Service
+ ```bash
+ python3 app.py
+ ```
+ Service runs on port 7860 by default.
+
+ ### HuggingFace Spaces Deployment
+ The service is designed for HuggingFace Spaces with GPU support:
+ - Hardware requirement: A10G Small ($0.40/hour) or better
+ - Uses Git LFS for model files (configured in `.gitattributes`)
+
+ ### Environment Variables
+ - `WHISPER_MODEL_SIZE`: Model size (default: "base")
+ - `DEFAULT_LANGUAGE`: Default transcription language (default: "en")
+ - `DEBUG_MODE`: Enable debugpy remote debugging (default: "false")
+
+ ## API Endpoints (via Gradio)
+
+ ### Core Transcription
+ - **Endpoint**: `/api/transcribe` (POST)
+ - **Function**: `gradio_transcribe_memory()` (`app.py:250-264`)
+ - **Input**: JSON with `audio_base64`, `language`, `model_size`
+ - **Output**: Transcription text
+
+ ### Health Check
+ - **Function**: `get_system_status()` (`app.py:266-281`)
+ - **Returns**: GPU status, model info, device information
+
+ ## WebRTC Integration Pattern
+
+ The service processes WebRTC audio through this pipeline:
+ 1. JavaScript captures audio as WebM/Opus blobs
+ 2. Convert to base64 encoding
+ 3. POST to `/api/transcribe` endpoint
+ 4. Service decodes base64 → writes temp file → converts to WAV → transcribes with Whisper
+ 5. Returns transcription text
+
+ Key implementation details:
+ - Handles data URL format: `data:audio/webm;codecs=opus;base64,<data>`
+ - Validates base64 encoding before processing
+ - Automatic padding for base64 alignment
+ - Temporary file cleanup after processing
+
+ ## Model Management
+
+ Models are loaded dynamically based on requests:
+ - **Model sizes**: tiny, base, small, medium, large
+ - **GPU optimization**: Uses fp16 precision when CUDA available
+ - **Memory requirements**: See README.md performance table
+ - **Language support**: 10+ languages plus auto-detection
+
+ ## Error Handling
+
+ The service includes comprehensive error handling:
+ - Base64 validation and cleanup
+ - Audio format conversion errors
+ - Whisper model loading failures
+ - Temporary file cleanup on errors
+ - Detailed logging for debugging
+
+ ## Debugging
+
+ Remote debugging is available when `DEBUG_MODE=true`:
+ - debugpy listens on port 5679
+ - Compatible with VS Code/Cursor remote attach
+ - Useful for HuggingFace Spaces development
+
+ ## Integration Notes
+
+ This service specifically eliminates Streamlit iframe communication issues by:
+ - Providing direct HTTP API endpoints instead of component bridges
+ - Supporting native WebRTC audio processing
+ - Removing postMessage/iframe complexity
+ - Enabling direct JavaScript fetch() calls
+
+ The architecture supports the "unmute.sh methodology" referenced in the user's requirements.
app.py CHANGED
@@ -488,57 +488,56 @@ with gr.Blocks(
      refresh_btn = gr.Button("🔄 Refresh System Status", variant="secondary")
      refresh_btn.click(fn=lambda: get_system_status(), outputs=status_md)
 
- # Custom API models for direct endpoints
- class TranscribeRequest(BaseModel):
-     audio_base64: str
-     language: str = "en"
-     model_size: str = "base"
-
- class TranscribeResponse(BaseModel):
-     transcription: str
-     processing_time: float
-     device: str
-     model_size: str
-
- # Get the FastAPI app that Gradio creates
- app = demo.app
-
- # Add custom endpoint
- @app.post("/api/transcribe", response_model=TranscribeResponse)
- async def api_transcribe(request: TranscribeRequest):
-     """Direct API endpoint bypassing Gradio queue system"""
+ # Create a simpler API endpoint using Gradio's built-in API system
+ def api_transcribe_simple(audio_base64, language="en", model_size="base"):
+     """Simple API function that returns JSON string instead of complex objects"""
      try:
          import time
+         import json
          start_time = time.time()
 
-         # Use the same function but return structured response
-         result_text = gradio_transcribe_memory(
-             request.audio_base64,
-             request.language,
-             request.model_size
-         )
+         # Use the same function but return structured response as JSON string
+         result_text = gradio_transcribe_memory(audio_base64, language, model_size)
 
          processing_time = time.time() - start_time
 
-         return TranscribeResponse(
-             transcription=result_text,
-             processing_time=processing_time,
-             device=stt_service.device,
-             model_size=stt_service.model_size
-         )
+         # Return JSON string that can be parsed by client
+         response = {
+             "transcription": result_text,
+             "processing_time": processing_time,
+             "device": stt_service.device,
+             "model_size": stt_service.model_size,
+             "success": True
+         }
+
+         return json.dumps(response)
 
      except Exception as e:
-         raise HTTPException(status_code=500, detail=str(e))
-
- @app.get("/api/health")
- async def api_health():
-     """Health check endpoint"""
-     return {
-         "status": "healthy",
-         "device": stt_service.device,
-         "model_size": stt_service.model_size,
-         "gpu_available": torch.cuda.is_available()
-     }
+         import json
+         error_response = {
+             "error": str(e),
+             "success": False
+         }
+         return json.dumps(error_response)
+
+ # Add this as a hidden Gradio interface for API access
+ with gr.Blocks() as api_demo:
+     api_input = gr.Textbox(visible=False)
+     api_language = gr.Textbox(visible=False)
+     api_model = gr.Textbox(visible=False)
+     api_output = gr.Textbox(visible=False)
+
+     # This creates an API endpoint at /api/predict with fn_index for this function
+     api_button = gr.Button(visible=False)
+     api_button.click(
+         fn=api_transcribe_simple,
+         inputs=[api_input, api_language, api_model],
+         outputs=[api_output],
+         api_name="simple_transcribe"  # This creates /api/simple_transcribe endpoint
+     )
+
+ # Mount the API interface
+ demo = gr.TabbedInterface([demo, api_demo], tab_names=["Main Interface", "API"])
 
  # Launch interface
  if __name__ == "__main__":
client-stt/v2-audio-client.js ADDED
@@ -0,0 +1,628 @@
+ /**
+  * STT GPU Service v2 Audio Client
+  * HTTP API client for pgits-stt-gpu-service-v2
+  */
+
+ class STTv2Client {
+     constructor() {
+         this.isRecording = false;
+         this.mediaRecorder = null;
+         this.audioChunks = [];
+         this.serverUrl = '';
+         this.resultCounter = 0;
+
+         this.initializeElements();
+         this.attachEventListeners();
+         this.updateDebugInfo('Client initialized');
+     }
+
+     generateSessionHash() {
+         return Math.random().toString(36).substring(2, 15) + Math.random().toString(36).substring(2, 15);
+     }
+
+     async listenForQueueResult(queueResponse, startTime, sessionHash) {
+         return new Promise((resolve, reject) => {
+             // Try different WebSocket URLs for Gradio queue
+             const possibleWsUrls = [
+                 this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/queue/data',
+                 this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/queue/join',
+                 this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/ws'
+             ];
+
+             let wsUrl = possibleWsUrls[0];
+             this.updateDebugInfo(`Attempting WebSocket connection to: ${wsUrl}`);
+
+             const ws = new WebSocket(wsUrl);
+
+             const timeout = setTimeout(() => {
+                 ws.close();
+                 reject(new Error('Queue timeout after 30 seconds'));
+             }, 30000);
+
+             ws.onopen = () => {
+                 this.updateDebugInfo('WebSocket connected successfully');
+                 // Send event_id to subscribe to updates
+                 if (queueResponse.event_id) {
+                     ws.send(JSON.stringify({
+                         event_id: queueResponse.event_id
+                     }));
+                     this.updateDebugInfo(`Sent event_id: ${queueResponse.event_id}`);
+                 }
+             };
+
+             ws.onmessage = (event) => {
+                 try {
+                     const data = JSON.parse(event.data);
+                     this.updateDebugInfo(`Queue message: ${JSON.stringify(data)}`);
+
+                     if (data.msg === 'process_completed' && data.output && data.output.data) {
+                         clearTimeout(timeout);
+                         ws.close();
+                         resolve(data.output.data[0]); // First output element
+                     } else if (data.msg === 'process_starts') {
+                         this.updateStatus('Processing on server...', 'processing');
+                     } else if (data.msg === 'queue_full') {
+                         clearTimeout(timeout);
+                         ws.close();
+                         reject(new Error('Server queue is full'));
+                     } else if (data.msg === 'send_data') {
+                         // Some Gradio versions expect data to be sent after connection
+                         this.updateDebugInfo('Server requesting data');
+                     }
+                 } catch (e) {
+                     this.updateDebugInfo(`WebSocket parse error: ${e.message}`);
+                     this.updateDebugInfo(`Raw message: ${event.data}`);
+                 }
+             };
+
+             ws.onerror = (error) => {
+                 this.updateDebugInfo(`WebSocket error details: ${JSON.stringify(error)}`);
+                 this.updateDebugInfo(`WebSocket readyState: ${ws.readyState}`);
+                 clearTimeout(timeout);
+                 // Try polling as fallback instead of failing
+                 this.updateDebugInfo('WebSocket failed, trying polling fallback...');
+                 this.pollForResult(queueResponse.event_id, startTime, sessionHash).then(resolve).catch(reject);
+             };
+
+             ws.onclose = (event) => {
+                 this.updateDebugInfo(`WebSocket closed: code=${event.code}, reason=${event.reason}`);
+                 clearTimeout(timeout);
+             };
+         });
+     }
+
+     async pollForResult(eventId, startTime, sessionHash = null) {
+         // Fast polling - transcription takes ~0.5s, so poll immediately and frequently
+         this.updateDebugInfo(`Starting FAST polling for event: ${eventId}, session: ${sessionHash}`);
+         const maxAttempts = 20; // 20 attempts over ~10 seconds
+
+         const actualSessionHash = sessionHash || this.generateSessionHash();
+         const pollEndpoints = [
+             `/queue/data?event_id=${eventId}&session_hash=${actualSessionHash}`,
+             `/queue/status?event_id=${eventId}`,
+         ];
+
+         this.updateDebugInfo(`Using session_hash: ${actualSessionHash}`);
+
+         // First attempt immediately (no delay)
+         for (let attempt = 0; attempt < maxAttempts; attempt++) {
+             const endpoint = pollEndpoints[attempt % pollEndpoints.length];
+             this.updateDebugInfo(`FAST poll attempt ${attempt + 1} trying: ${endpoint}`);
+
+             try {
+                 const response = await fetch(`${this.serverUrl}${endpoint}`);
+                 if (response.ok) {
+                     const responseText = await response.text();
+                     this.updateDebugInfo(`Poll attempt ${attempt + 1} (${endpoint}): ${responseText.substring(0, 300)}`);
+
+                     // Handle SSE format for queue/data endpoints
+                     if (endpoint.includes('/queue/data') && responseText.includes('data: ')) {
+                         const lines = responseText.split('\n');
+                         for (const line of lines) {
+                             if (line.startsWith('data: ')) {
+                                 try {
+                                     const data = JSON.parse(line.substring(6));
+                                     this.updateDebugInfo(`SSE data: ${JSON.stringify(data)}`);
+
+                                     if (data.msg === 'process_completed' && data.output && data.output.data) {
+                                         return data.output.data[0];
+                                     } else if (data.msg === 'process_starts') {
+                                         this.updateStatus('Processing on server...', 'processing');
+                                     }
+                                 } catch (parseError) {
+                                     this.updateDebugInfo(`SSE parse error: ${parseError.message}`);
+                                 }
+                             }
+                         }
+                     } else {
+                         // Try parsing as regular JSON
+                         try {
+                             const status = JSON.parse(responseText);
+                             if (status.msg === 'process_completed' && status.output && status.output.data) {
+                                 return status.output.data[0];
+                             } else if (status.msg === 'process_starts') {
+                                 this.updateStatus('Processing on server...', 'processing');
+                             }
+                         } catch (e) {
+                             // Not JSON, that's ok
+                         }
+                     }
+                 } else if (response.status === 404) {
+                     this.updateDebugInfo(`Endpoint ${endpoint} not found, trying next...`);
+                 } else {
+                     this.updateDebugInfo(`Endpoint ${endpoint} returned ${response.status}`);
+                 }
+             } catch (e) {
+                 this.updateDebugInfo(`Poll error on ${endpoint}: ${e.message}`);
+             }
+
+             // Short delay - transcription is fast (~0.5s), need to catch result quickly
+             if (attempt === 0) {
+                 // No delay for first attempt
+             } else if (attempt < 5) {
+                 await new Promise(resolve => setTimeout(resolve, 200)); // 200ms for first few
+             } else {
+                 await new Promise(resolve => setTimeout(resolve, 500)); // 500ms after that
+             }
+         }
+
+         throw new Error('Fast polling timeout - no result after 10 seconds');
+     }
+
+     initializeElements() {
+         this.recordButton = document.getElementById('recordButton');
+         this.testHealthButton = document.getElementById('testHealthButton');
+         this.clearButton = document.getElementById('clearButton');
+         this.uploadFileButton = document.getElementById('uploadFileButton');
+         this.testBase64Button = document.getElementById('testBase64Button');
+         this.serverUrlInput = document.getElementById('serverUrl');
+         this.languageSelect = document.getElementById('language');
+         this.modelSizeSelect = document.getElementById('modelSize');
+         this.audioFileInput = document.getElementById('audioFileInput');
+         this.statusDiv = document.getElementById('status');
+         this.transcriptionDiv = document.getElementById('transcription');
+         this.debugInfoDiv = document.getElementById('debugInfo');
+     }
+
+     attachEventListeners() {
+         this.recordButton.addEventListener('click', () => this.toggleRecording());
+         this.testHealthButton.addEventListener('click', () => this.testHealth());
+         this.clearButton.addEventListener('click', () => this.clearResults());
+         this.uploadFileButton.addEventListener('click', () => this.uploadFile());
+         this.testBase64Button.addEventListener('click', () => this.testBase64Data());
+         this.serverUrlInput.addEventListener('input', () => this.updateServerUrl());
+
+         // Initialize server URL
+         this.updateServerUrl();
+     }
+
+     updateServerUrl() {
+         this.serverUrl = this.serverUrlInput.value.trim();
+         this.updateDebugInfo(`Server URL updated: ${this.serverUrl}`);
+     }
+
+     updateStatus(message, className) {
+         this.statusDiv.textContent = message;
+         this.statusDiv.className = `status ${className}`;
+         this.updateDebugInfo(`Status: ${message}`);
+     }
+
+     updateDebugInfo(message) {
+         const timestamp = new Date().toLocaleTimeString();
+         this.debugInfoDiv.innerHTML += `[${timestamp}] ${message}<br>`;
+         this.debugInfoDiv.scrollTop = this.debugInfoDiv.scrollHeight;
+     }
+
+     addResult(text, metadata = {}) {
+         this.resultCounter++;
+         const resultDiv = document.createElement('div');
+         resultDiv.className = 'result-item';
+
+         const metaDiv = document.createElement('div');
+         metaDiv.className = 'result-meta';
+         metaDiv.textContent = `#${this.resultCounter} | ${new Date().toLocaleTimeString()} | ${metadata.processingTime || 'N/A'} | ${metadata.device || 'Unknown'}`;
+
+         const textDiv = document.createElement('div');
+         textDiv.className = 'result-text';
+         textDiv.textContent = text;
+
+         resultDiv.appendChild(metaDiv);
+         resultDiv.appendChild(textDiv);
+
+         this.transcriptionDiv.appendChild(resultDiv);
+         this.transcriptionDiv.scrollTop = this.transcriptionDiv.scrollHeight;
+     }
+
+     clearResults() {
+         this.transcriptionDiv.innerHTML = 'Results cleared...';
+         this.resultCounter = 0;
+         this.updateDebugInfo('Results cleared');
+     }
+
+     async testHealth() {
+         this.updateStatus('Testing health...', 'processing');
+         this.updateDebugInfo('Testing service connectivity...');
+
+         try {
+             // Test the main service endpoint
+             const response = await fetch(`${this.serverUrl}/`, {
+                 method: 'HEAD',
+                 timeout: 10000
+             });
+
+             if (response.ok) {
+                 this.updateStatus('Service is live and responding', 'ready');
+                 this.addResult('✅ Service health check passed - ready for transcription', { device: 'Web' });
+                 this.updateDebugInfo(`Service responding: HTTP ${response.status}`);
+             } else {
+                 throw new Error(`Service check failed: ${response.status}`);
+             }
+         } catch (error) {
+             this.updateStatus(`Health check failed: ${error.message}`, 'error');
+             this.addResult(`❌ Service unreachable: ${error.message}`, { device: 'Error' });
+             this.updateDebugInfo(`Health error: ${error.message}`);
+         }
+     }
+
+     async toggleRecording() {
+         if (!this.isRecording) {
+             await this.startRecording();
+         } else {
+             await this.stopRecording();
+         }
+     }
+
+     async startRecording() {
+         try {
+             const stream = await navigator.mediaDevices.getUserMedia({
+                 audio: {
+                     sampleRate: 44100,
+                     channelCount: 1,
+                     echoCancellation: true,
+                     noiseSuppression: true
+                 }
+             });
+
+             this.mediaRecorder = new MediaRecorder(stream, {
+                 mimeType: 'audio/webm;codecs=opus'
+             });
+
+             this.audioChunks = [];
+
+             this.mediaRecorder.ondataavailable = (event) => {
+                 if (event.data.size > 0) {
+                     this.audioChunks.push(event.data);
+                 }
+             };
+
+             this.mediaRecorder.onstop = () => {
+                 this.processRecording();
+             };
+
+             this.mediaRecorder.start();
+             this.isRecording = true;
+             this.recordButton.textContent = 'Stop Recording';
+             this.recordButton.className = 'stop btn';
+             this.updateStatus('Recording... (speak now)', 'recording');
+             this.updateDebugInfo('Recording started');
+
+         } catch (error) {
+             this.updateStatus(`Recording failed: ${error.message}`, 'error');
+             this.updateDebugInfo(`Recording error: ${error.message}`);
+         }
+     }
+
+     async stopRecording() {
+         if (this.mediaRecorder && this.isRecording) {
+             this.mediaRecorder.stop();
+             this.isRecording = false;
+             this.recordButton.textContent = 'Start Recording';
+             this.recordButton.className = 'start btn';
+             this.updateStatus('Processing recording...', 'processing');
+             this.updateDebugInfo('Recording stopped, processing...');
+
+             // Stop all tracks
+             this.mediaRecorder.stream.getTracks().forEach(track => track.stop());
+         }
+     }
+
+     async processRecording() {
+         if (this.audioChunks.length === 0) {
+             this.updateStatus('No audio recorded', 'error');
+             return;
+         }
+
+         try {
+             // Create blob from chunks
+             const audioBlob = new Blob(this.audioChunks, { type: 'audio/webm;codecs=opus' });
+             this.updateDebugInfo(`Audio blob created: ${audioBlob.size} bytes`);
+
+             // Convert to base64
+             const audioBase64 = await this.blobToBase64(audioBlob);
+             this.updateDebugInfo(`Base64 length: ${audioBase64.length} characters`);
+
+             // Send to transcription service
+             await this.transcribeAudio(audioBase64);
+
+         } catch (error) {
+             this.updateStatus(`Processing failed: ${error.message}`, 'error');
+             this.updateDebugInfo(`Processing error: ${error.message}`);
+         }
+     }
+
+     async blobToBase64(blob) {
+         return new Promise((resolve, reject) => {
+             const reader = new FileReader();
+             reader.onloadend = () => {
+                 const result = reader.result;
+                 // Extract base64 part from data URL
+                 const base64 = result.split(',')[1];
+                 resolve(base64);
+             };
+             reader.onerror = reject;
+             reader.readAsDataURL(blob);
+         });
+     }
+
+     async transcribeAudio(audioBase64, metadata = {}) {
+         // Gradio API format: data array with inputs in order [audio_base64, language, model_size]
+         // fn_index: 1 corresponds to the second function (transcribe_memory_btn.click)
+         const sessionHash = this.generateSessionHash();
+         const payload = {
+             data: [
+                 audioBase64,
+                 this.languageSelect.value,
+                 this.modelSizeSelect.value
+             ],
+             fn_index: 1,
+             session_hash: sessionHash
+         };
+
+         this.updateDebugInfo(`Sending Gradio API request: fn_index=1, data=[base64(${audioBase64.length} chars), "${this.languageSelect.value}", "${this.modelSizeSelect.value}"]`);
+
+         try {
+             const startTime = Date.now();
+
+             // Try the Gradio API endpoint - uses Gradio's built-in API system
+             this.updateDebugInfo('Trying Gradio /api/simple_transcribe endpoint...');
+             let response = await fetch(`${this.serverUrl}/api/simple_transcribe`, {
+                 method: 'POST',
+                 headers: {
+                     'Content-Type': 'application/json',
+                 },
+                 body: JSON.stringify({
+                     data: [audioBase64, this.languageSelect.value, this.modelSizeSelect.value]
+                 })
+             });
+
+             // Check if direct API worked - NO fallback to broken queue system
+             if (!response.ok) {
+                 const errorText = await response.text();
+                 this.updateDebugInfo(`Gradio /api/simple_transcribe failed: ${response.status} - ${errorText}`);
+                 throw new Error(`Direct API failed: ${response.status} - ${errorText}`);
+             }
+
+             const responseData = await response.json();
+             this.updateDebugInfo(`Response: ${JSON.stringify(responseData)}`);
+
+             let result;
+
+             // Check if this is Gradio API response
+             if (responseData.data && Array.isArray(responseData.data) && responseData.data.length > 0) {
+                 // Gradio API response - data[0] contains the JSON string from our function
+                 const jsonResult = responseData.data[0];
+                 this.updateDebugInfo(`Got Gradio API response: ${jsonResult.substring(0, 200)}`);
+
+                 try {
+                     const parsedResult = JSON.parse(jsonResult);
+                     if (parsedResult.success) {
+                         result = parsedResult.transcription;
+                         this.updateDebugInfo(`Parsed transcription: ${result.substring(0, 100)}`);
+                     } else {
+                         throw new Error(parsedResult.error || 'Unknown transcription error');
+                     }
+                 } catch (parseError) {
+                     this.updateDebugInfo(`Failed to parse JSON result: ${parseError.message}`);
+                     throw new Error(`Failed to parse transcription result: ${parseError.message}`);
+                 }
+             } else {
+                 throw new Error(`Unexpected Gradio response format: ${JSON.stringify(responseData)}`);
+             }
+
+             if (result) {
+                 const processingTime = (Date.now() - startTime) / 1000;
+                 this.updateStatus('Transcription complete', 'ready');
+                 this.addResult(
+                     result,
+                     {
+                         processingTime: `${processingTime.toFixed(2)}s`,
+                         device: 'GPU',
+                         ...metadata
+                     }
+                 );
+
+                 this.updateDebugInfo(`Transcription successful: "${result.substring(0, 100)}"`);
+             }
+
+         } catch (error) {
+             this.updateStatus(`Transcription failed: ${error.message}`, 'error');
+             this.addResult(`❌ Transcription error: ${error.message}`, { device: 'Error' });
+             this.updateDebugInfo(`Transcription error: ${error.message}`);
+         }
+     }
+
+     async uploadFile() {
+         const file = this.audioFileInput.files[0];
+         if (!file) {
+             this.updateStatus('No file selected', 'error');
+             return;
+         }
+
+         this.updateStatus(`Processing file: ${file.name}`, 'processing');
+         this.updateDebugInfo(`File selected: ${file.name} (${file.size} bytes)`);
+
+         try {
+             // Since file upload API is failing with 500 error,
+             // convert file to base64 and use the working base64 API (fn_index: 1)
+             this.updateDebugInfo('Converting file to base64 for base64 API...');
+
+             const audioBase64 = await this.fileToBase64(file);
+             this.updateDebugInfo(`File converted to base64: ${audioBase64.length} characters`);
+
+             // Use the working base64 transcription method
+             await this.transcribeAudio(audioBase64, { source: `File: ${file.name}` });
+
+         } catch (error) {
+             this.updateStatus(`File processing failed: ${error.message}`, 'error');
+             this.updateDebugInfo(`File error: ${error.message}`);
+         }
+     }
+
+     async fileToBase64(file) {
+         return new Promise((resolve, reject) => {
+             const reader = new FileReader();
+             reader.onloadend = () => {
+                 const result = reader.result;
+                 const base64 = result.split(',')[1];
+                 resolve(base64);
+             };
+             reader.onerror = reject;
+             reader.readAsDataURL(file);
+         });
+     }
+
+     async testBase64Data() {
+         this.updateStatus('Testing with sample base64 data...', 'processing');
+         this.updateDebugInfo('Testing with minimal base64 data');
+
+         // Create minimal test data
+         const testData = new Uint8Array(100);
+         testData.fill(42); // Fill with test data
+         const testBlob = new Blob([testData], { type: 'audio/webm' });
+         const testBase64 = await this.blobToBase64(testBlob);
+
+         await this.transcribeAudio(testBase64, { source: 'Test data' });
+     }
+
+     handleFileUploadResult(resultArray, startTime, metadata) {
+         // File upload returns 3 values: [transcription, timing, status]
+         const processingTime = (Date.now() - startTime) / 1000;
+
+         if (resultArray && resultArray.length >= 3) {
+             const [transcription, timing, status] = resultArray;
+
+             this.updateStatus('File transcription complete', 'ready');
+             this.addResult(
+                 transcription,
+                 {
+                     processingTime: `${processingTime.toFixed(2)}s`,
+                     timing: timing,
+                     status: status,
+                     device: 'GPU',
+                     ...metadata
+                 }
+             );
+
+             this.updateDebugInfo(`File transcription successful: "${transcription.substring(0, 100)}"`);
+         } else {
+             throw new Error(`Invalid file upload result format: ${JSON.stringify(resultArray)}`);
+         }
+     }
+
+     async listenForFileUploadResult(queueResponse, startTime, metadata) {
+         // Similar to regular queue listening but expects 3 outputs
+         return new Promise((resolve, reject) => {
+             const wsUrl = this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/queue/data';
+             this.updateDebugInfo(`File upload WebSocket: ${wsUrl}`);
+
+             const ws = new WebSocket(wsUrl);
+
+             const timeout = setTimeout(() => {
+                 ws.close();
+                 reject(new Error('File upload queue timeout after 30 seconds'));
+             }, 30000);
+
+             ws.onopen = () => {
+                 this.updateDebugInfo('File upload WebSocket connected');
+                 if (queueResponse.event_id) {
+                     ws.send(JSON.stringify({
+                         event_id: queueResponse.event_id
+                     }));
+                     this.updateDebugInfo(`Sent file upload event_id: ${queueResponse.event_id}`);
+                 }
+             };
+
+             ws.onmessage = (event) => {
+                 try {
+                     const data = JSON.parse(event.data);
+                     this.updateDebugInfo(`File upload queue message: ${JSON.stringify(data)}`);
+
+                     if (data.msg === 'process_completed' && data.output && data.output.data) {
+                         clearTimeout(timeout);
+                         ws.close();
+                         this.handleFileUploadResult(data.output.data, startTime, metadata);
+                         resolve(data.output.data);
+                     } else if (data.msg === 'process_starts') {
+                         this.updateStatus('Processing file on server...', 'processing');
+                     }
+                 } catch (e) {
+                     this.updateDebugInfo(`File upload WebSocket parse error: ${e.message}`);
+                 }
+             };
+
+             ws.onerror = (error) => {
+                 this.updateDebugInfo('File upload WebSocket failed, trying polling...');
+                 clearTimeout(timeout);
+                 this.pollForFileUploadResult(queueResponse.event_id, startTime, metadata).then(resolve).catch(reject);
+             };
+
+             ws.onclose = (event) => {
+                 this.updateDebugInfo(`File upload WebSocket closed: ${event.code}`);
+                 clearTimeout(timeout);
+             };
+         });
+     }
+
+     async pollForFileUploadResult(eventId, startTime, metadata) {
+         // Polling fallback for file uploads
+         this.updateDebugInfo(`Polling for file upload event: ${eventId}`);
+         const maxAttempts = 30;
+
+         const pollEndpoints = [
+             `/queue/data?event_id=${eventId}`,
+             `/queue/status?event_id=${eventId}`
+         ];
+
+         for (let attempt = 0; attempt < maxAttempts; attempt++) {
+             const endpoint = pollEndpoints[attempt % pollEndpoints.length];
+
+             try {
+                 const response = await fetch(`${this.serverUrl}${endpoint}`);
+                 if (response.ok) {
+                     const status = await response.json();
+                     this.updateDebugInfo(`File poll attempt ${attempt + 1} (${endpoint}): ${JSON.stringify(status)}`);
+
+                     if (status.msg === 'process_completed' && status.output && status.output.data) {
+                         this.handleFileUploadResult(status.output.data, startTime, metadata);
+                         return status.output.data;
+                     } else if (status.msg === 'process_starts') {
+                         this.updateStatus('Processing file on server...', 'processing');
+                     }
+                 }
+             } catch (e) {
+                 this.updateDebugInfo(`File poll error on ${endpoint}: ${e.message}`);
+             }
+
+             await new Promise(resolve => setTimeout(resolve, 1000));
+         }
+
+         throw new Error('File upload polling timeout - no result after 30 seconds');
+     }
+ }
+
+ // Initialize client when page loads
+ document.addEventListener('DOMContentLoaded', () => {
+     window.sttClient = new STTv2Client();
+     console.log('STT v2 Client initialized');
+ });
client-stt/v2-index.html ADDED
@@ -0,0 +1,243 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+     <meta charset="UTF-8">
+     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+     <title>STT GPU Service v2 Client</title>
+     <style>
+         body {
+             font-family: Arial, sans-serif;
+             max-width: 800px;
+             margin: 50px auto;
+             padding: 20px;
+             background-color: #f5f5f5;
+         }
+
+         .container {
+             background: white;
+             padding: 30px;
+             border-radius: 10px;
+             box-shadow: 0 2px 10px rgba(0,0,0,0.1);
+         }
+
+         h1 {
+             text-align: center;
+             color: #333;
+             margin-bottom: 30px;
+         }
+
+         .controls {
+             display: flex;
+             gap: 15px;
+             margin-bottom: 20px;
+             align-items: center;
+             flex-wrap: wrap;
+         }
+
+         #recordButton {
+             padding: 12px 24px;
+             font-size: 16px;
+             border: none;
+             border-radius: 5px;
+             cursor: pointer;
+             transition: background-color 0.3s;
+         }
+
+         #recordButton.start {
+             background-color: #4CAF50;
+             color: white;
+         }
+
+         #recordButton.stop {
+             background-color: #f44336;
+             color: white;
+         }
+
+         #recordButton:disabled {
+             background-color: #cccccc;
+             cursor: not-allowed;
+         }
+
+         .input-group {
+             display: flex;
+             flex-direction: column;
+             gap: 5px;
+         }
+
+         label {
+             font-weight: bold;
+             color: #333;
+         }
+
+         input[type="text"], select, input[type="file"] {
+             padding: 8px 12px;
+             border: 1px solid #ddd;
+             border-radius: 4px;
+             font-size: 14px;
+         }
+
+         .status {
+             padding: 10px;
+             border-radius: 5px;
+             margin-bottom: 20px;
+             font-weight: bold;
+         }
+
+         .status.ready {
+             background-color: #d4edda;
+             color: #155724;
+         }
+
+         .status.recording {
+             background-color: #fff3cd;
+             color: #856404;
+         }
+
+         .status.processing {
+             background-color: #d1ecf1;
+             color: #0c5460;
+         }
+
+         .status.error {
+             background-color: #f8d7da;
+             color: #721c24;
+         }
+
+         .transcription {
+             background-color: #f8f9fa;
+             border: 1px solid #dee2e6;
+             border-radius: 5px;
+             padding: 20px;
+             min-height: 200px;
+             max-height: 400px;
+             overflow-y: auto;
+             font-size: 16px;
+             line-height: 1.5;
+             white-space: pre-wrap;
+             word-wrap: break-word;
+             scroll-behavior: smooth;
+         }
+
+         .result-item {
+             background: #f0f8ff;
+             border-left: 4px solid #007bff;
+             padding: 10px;
+             margin: 10px 0;
+             border-radius: 4px;
+         }
+
+         .result-meta {
+             font-size: 12px;
+             color: #666;
+             margin-bottom: 5px;
+         }
+
+         .result-text {
+             font-size: 16px;
+             color: #333;
+         }
+
+         .button-group {
+             display: flex;
+             gap: 10px;
+             flex-wrap: wrap;
+         }
+
+         .btn {
+             padding: 12px 24px;
+             border: none;
+             border-radius: 5px;
+             cursor: pointer;
+             font-size: 14px;
+             transition: background-color 0.3s;
+         }
+
+         .btn-primary { background-color: #007bff; color: white; }
+         .btn-success { background-color: #28a745; color: white; }
+         .btn-warning { background-color: #ffc107; color: black; }
+         .btn-info { background-color: #17a2b8; color: white; }
+     </style>
+ </head>
+ <body>
+     <div class="container">
+         <h1>STT GPU Service v2 Client</h1>
+         <p style="text-align: center; color: #666; margin-bottom: 20px;">
+             <strong>Backend:</strong> pgits-stt-gpu-service-v2 (HTTP API, GPU-accelerated Whisper)
+         </p>
+
+         <div class="controls">
+             <div class="button-group">
+                 <button id="recordButton" class="start btn">Start Recording</button>
+                 <button id="testHealthButton" class="btn btn-info">Test Health</button>
+                 <button id="clearButton" class="btn btn-warning">Clear Results</button>
+             </div>
+
+             <div class="input-group">
+                 <label for="serverUrl">Server URL:</label>
+                 <input type="text" id="serverUrl" value="https://pgits-stt-gpu-service-v2.hf.space" />
+             </div>
+
+             <div class="input-group">
+                 <label for="language">Language:</label>
+                 <select id="language">
+                     <option value="en">English</option>
+                     <option value="es">Spanish</option>
+                     <option value="fr">French</option>
+                     <option value="de">German</option>
+                     <option value="it">Italian</option>
+                     <option value="pt">Portuguese</option>
+                     <option value="ru">Russian</option>
+                     <option value="ja">Japanese</option>
+                     <option value="ko">Korean</option>
+                     <option value="zh">Chinese</option>
+                     <option value="auto">Auto Detect</option>
+                 </select>
+             </div>
+
+             <div class="input-group">
+                 <label for="modelSize">Model Size:</label>
+                 <select id="modelSize">
+                     <option value="tiny">Tiny (Fast)</option>
+                     <option value="base" selected>Base (Balanced)</option>
+                     <option value="small">Small (Better)</option>
+                     <option value="medium">Medium (High Quality)</option>
+                     <option value="large">Large (Best Quality)</option>
+                 </select>
+             </div>
+         </div>
+
+         <div class="controls" style="border-top: 1px solid #dee2e6; padding-top: 20px;">
+             <div class="input-group">
+                 <label for="audioFileInput">Test with Audio File:</label>
+                 <input type="file" id="audioFileInput" accept="audio/*" />
+             </div>
+             <div class="button-group">
+                 <button id="uploadFileButton" class="btn btn-primary">Upload & Transcribe File</button>
+                 <button id="testBase64Button" class="btn btn-success">Test Base64 Data</button>
+             </div>
+             <p style="font-size: 12px; color: #666; margin-top: 10px;">
+                 📄 v2 service supports both file upload and WebRTC base64 audio processing
+             </p>
+         </div>
+
+         <div id="status" class="status ready">Ready - Click "Test Health" to verify service</div>
+
+         <div class="transcription" id="transcription">
+             Click "Test Health" to verify service connectivity, then try recording or upload a file...
+         </div>
+
+         <div style="background: #f8f9fa; border: 1px solid #dee2e6; border-radius: 5px; padding: 15px; margin-top: 15px;">
+             <h4 style="margin-top: 0; color: #495057;">Debug Information</h4>
+             <div id="debugInfo" style="font-family: monospace; font-size: 11px; color: #6c757d;">
+                 Ready for testing...
+             </div>
+         </div>
+
+         <div style="text-align: center; margin-top: 10px; padding: 10px; border-top: 1px solid #dee2e6; color: #6c757d; font-size: 12px;">
+             STT GPU Service v2 Client v1.0.0 | HTTP API | HuggingFace GPU Backend
+         </div>
+     </div>
+
+     <script src="v2-audio-client.js"></script>
+ </body>
+ </html>
test_client.py ADDED
@@ -0,0 +1,186 @@
+ #!/usr/bin/env python3
+ """
+ Test client for STT GPU Service on HuggingFace Spaces.
+ Tests the transcription API with audio files or generates test audio.
+ """
+
+ import os
+ import sys
+ import base64
+ import json
+ import time
+ from typing import Optional
+
+ import requests
+
+ # HuggingFace Space URL
+ STT_SERVICE_URL = "https://pgits-stt-gpu-service-v2.hf.space"
+
+ def test_health_check() -> bool:
+     """Test the health endpoint."""
+     print("🔍 Testing health check...")
+     try:
+         response = requests.get(f"{STT_SERVICE_URL}/api/health", timeout=10)
+         if response.status_code == 200:
+             print("✅ Health check passed")
+             return True
+         print(f"❌ Health check failed: {response.status_code}")
+         return False
+     except Exception as e:
+         print(f"❌ Health check error: {e}")
+         return False
+
+ def create_test_audio_file() -> Optional[str]:
+     """Create a simple test audio file using system tools."""
+     print("🎵 Creating test audio file...")
+
+     try:
+         # Try to create a simple sine wave audio file using ffmpeg if available
+         test_audio_path = "/tmp/test_audio.wav"
+
+         # Generate 3 seconds of 440 Hz sine wave
+         os.system(f"ffmpeg -f lavfi -i 'sine=frequency=440:duration=3' -y {test_audio_path} 2>/dev/null")
+
+         if os.path.exists(test_audio_path) and os.path.getsize(test_audio_path) > 1000:
+             print(f"✅ Test audio created: {test_audio_path}")
+             return test_audio_path
+
+         print("⚠️ ffmpeg not available or failed, using minimal test data")
+         return None
+
+     except Exception as e:
+         print(f"⚠️ Could not create test audio: {e}")
+         return None
+
+ def test_transcription_with_file(audio_file_path: str, language: str = "en", model_size: str = "base"):
+     """Test transcription with an audio file."""
+     print(f"🎤 Testing transcription with file: {audio_file_path}")
+
+     try:
+         # Read and encode audio file
+         with open(audio_file_path, "rb") as f:
+             audio_data = f.read()
+         audio_base64 = base64.b64encode(audio_data).decode('utf-8')
+
+         print(f"📄 Audio file size: {len(audio_data)} bytes")
+         print(f"📄 Base64 size: {len(audio_base64)} characters")
+
+         # Send transcription request
+         payload = {
+             "audio_base64": audio_base64,
+             "language": language,
+             "model_size": model_size
+         }
+
+         print(f"🚀 Sending request to {STT_SERVICE_URL}/api/transcribe")
+         start_time = time.time()
+
+         response = requests.post(
+             f"{STT_SERVICE_URL}/api/transcribe",
+             json=payload,
+             timeout=30,
+             headers={"Content-Type": "application/json"}
+         )
+
+         processing_time = time.time() - start_time
+
+         print(f"📊 Request completed in {processing_time:.2f}s")
+         print(f"📊 Response status: {response.status_code}")
+
+         if response.status_code == 200:
+             try:
+                 result = response.json()
+                 print(f"✅ Transcription result: {result}")
+                 return result
+             except json.JSONDecodeError:
+                 print(f"✅ Transcription result (text): {response.text}")
+                 return response.text
+
+         print(f"❌ Request failed: {response.status_code}")
+         print(f"❌ Error response: {response.text}")
+         return None
+
+     except Exception as e:
+         print(f"❌ Transcription test failed: {e}")
+         return None
+
+ def test_transcription_with_minimal_data():
+     """Test transcription with minimal test data."""
+     print("🧪 Testing with minimal base64 data...")
+
+     # Create minimal test data (should trigger demo response)
+     test_data = b"test audio data for demo"
+     audio_base64 = base64.b64encode(test_data).decode('utf-8')
+
+     payload = {
+         "audio_base64": audio_base64,
+         "language": "en",
+         "model_size": "base"
+     }
+
+     try:
+         response = requests.post(
+             f"{STT_SERVICE_URL}/api/transcribe",
+             json=payload,
+             timeout=10,
+             headers={"Content-Type": "application/json"}
+         )
+
+         if response.status_code == 200:
+             try:
+                 result = response.json()
+                 print(f"✅ Demo test result: {result}")
+             except json.JSONDecodeError:
+                 print(f"✅ Demo test result (text): {response.text}")
+         else:
+             print(f"❌ Demo test failed: {response.status_code} - {response.text}")
+
+     except Exception as e:
+         print(f"❌ Demo test error: {e}")
+
+ def main():
+     """Main test function."""
+     print("🎤 STT GPU Service Test Client")
+     print("=" * 50)
+
+     # Test health check first
+     if not test_health_check():
+         print("❌ Service appears to be down, continuing with other tests...")
+
+     print()
+
+     # Test with minimal data first
+     test_transcription_with_minimal_data()
+     print()
+
+     # Check if user provided an audio file
+     if len(sys.argv) > 1:
+         audio_file = sys.argv[1]
+         if os.path.exists(audio_file):
+             print(f"🎵 Using provided audio file: {audio_file}")
+             test_transcription_with_file(audio_file)
+         else:
+             print(f"❌ Audio file not found: {audio_file}")
+     else:
+         # Try to create test audio
+         test_audio = create_test_audio_file()
+         if test_audio:
+             test_transcription_with_file(test_audio)
+             # Clean up
+             try:
+                 os.unlink(test_audio)
+             except OSError:
+                 pass
+         else:
+             print("⚠️ No audio file provided and couldn't create test audio")
+             print("💡 Usage: python test_client.py [audio_file.wav]")
+             print("💡 Or install ffmpeg to auto-generate test audio")
+
+     print()
+     print("🎯 Test complete!")
+     print("💡 You can also test via web interface at:")
+     print(f"   {STT_SERVICE_URL}")
+
+ if __name__ == "__main__":
+     main()
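
The commit message describes the new request/response shape: arguments wrapped in Gradio's `{"data": [...]}` envelope sent to `/api/simple_transcribe`, with the result coming back as a JSON string inside the response's `data` array. A minimal sketch of that encode/decode step, run offline (the argument order and result fields are assumptions taken from the commit message, not confirmed against app.py):

```python
import base64
import json

def build_gradio_payload(audio_bytes: bytes, language: str = "en", model_size: str = "base") -> dict:
    """Wrap the arguments in the Gradio API's {"data": [...]} envelope."""
    audio_b64 = base64.b64encode(audio_bytes).decode("utf-8")
    return {"data": [audio_b64, language, model_size]}

def parse_gradio_response(body: dict) -> dict:
    """The Gradio API returns {"data": [<json-string>, ...]}; decode the inner JSON."""
    return json.loads(body["data"][0])

# Build a request body as it would be POSTed to /api/simple_transcribe
payload = build_gradio_payload(b"\x00\x01", language="en", model_size="base")

# Simulate a server response whose first output is a JSON-encoded result string
sample_response = {"data": [json.dumps({"text": "hello", "success": True})]}
result = parse_gradio_response(sample_response)
```

This matches the fix described in the commit: the client no longer expects a plain JSON object from `/api/transcribe`, but unwraps the Gradio data array and parses the JSON string it contains.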