pgits Claude committed
Commit 1434d57 · 1 Parent(s): 1015eb4

fix: Update client to use Gradio API endpoint format


- Changed from /api/transcribe to /api/simple_transcribe endpoint
- Updated request format to use Gradio data array structure
- Fixed response parsing to handle JSON string returned by Gradio API
- Improved error handling and debug messages
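The request/response shape this commit switches to can be sketched client-side as follows. This is a minimal sketch based on the changes in this commit: Gradio named endpoints take positional inputs wrapped in a `data` array, and here the server function returns a JSON *string* as the first output element, so the client must decode it a second time. The base64 string and the simulated response below are stand-ins, not real service output.

```python
import json

def build_gradio_payload(audio_base64, language="en", model_size="base"):
    # Gradio named-endpoint requests wrap positional inputs in a "data" array
    return {"data": [audio_base64, language, model_size]}

def parse_gradio_response(response_json):
    # data[0] holds a JSON string produced by the server function,
    # so it needs a second json.loads on the client
    inner = json.loads(response_json["data"][0])
    if not inner.get("success"):
        raise RuntimeError(inner.get("error", "unknown error"))
    return inner["transcription"]

payload = build_gradio_payload("<base64 audio>", "en", "base")
simulated = {"data": [json.dumps({"transcription": "hello world", "success": True})]}
print(parse_gradio_response(simulated))  # hello world
```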

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (5)
  1. CLAUDE.md +105 -0
  2. app.py +40 -41
  3. client-stt/v2-audio-client.js +628 -0
  4. client-stt/v2-index.html +243 -0
  5. test_client.py +186 -0
CLAUDE.md ADDED
@@ -0,0 +1,105 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ STT GPU Service is a GPU-accelerated Speech-to-Text microservice built with Gradio and OpenAI Whisper. It serves as a replacement for Streamlit-based solutions to eliminate iframe communication barriers in WebRTC applications, specifically designed for VoiceCalendar integration.
+
+ ## Architecture
+
+ The service is built around a single-file architecture with these core components:
+
+ - **STTService class** (`app.py:45-221`): Core transcription engine with Whisper model management
+ - **Gradio interface** (`app.py:284-486`): Web UI with three main tabs:
+   - File upload transcription
+   - WebRTC memory transcription (base64 audio processing)
+   - API integration documentation
+ - **WebRTC compatibility layer**: Handles base64 audio conversion and validation
+
+ ## Key Technologies
+
+ - **PyTorch + CUDA**: GPU acceleration for Whisper models
+ - **OpenAI Whisper**: Speech-to-text engine (tiny to large models)
+ - **Gradio 4.44.1**: Web interface and API endpoints
+ - **pydub**: Audio format conversion (WebM → WAV)
+ - **debugpy**: Remote debugging support for HuggingFace Spaces
+
+ ## Development Commands
+
+ ### Running the Service
+ ```bash
+ python3 app.py
+ ```
+ Service runs on port 7860 by default.
+
+ ### HuggingFace Spaces Deployment
+ The service is designed for HuggingFace Spaces with GPU support:
+ - Hardware requirement: A10G Small ($0.40/hour) or better
+ - Uses Git LFS for model files (configured in `.gitattributes`)
+
+ ### Environment Variables
+ - `WHISPER_MODEL_SIZE`: Model size (default: "base")
+ - `DEFAULT_LANGUAGE`: Default transcription language (default: "en")
+ - `DEBUG_MODE`: Enable debugpy remote debugging (default: "false")
+
+ ## API Endpoints (via Gradio)
+
+ ### Core Transcription
+ - **Endpoint**: `/api/transcribe` (POST)
+ - **Function**: `gradio_transcribe_memory()` (`app.py:250-264`)
+ - **Input**: JSON with `audio_base64`, `language`, `model_size`
+ - **Output**: Transcription text
+
+ ### Health Check
+ - **Function**: `get_system_status()` (`app.py:266-281`)
+ - **Returns**: GPU status, model info, device information
+
+ ## WebRTC Integration Pattern
+
+ The service processes WebRTC audio through this pipeline:
+ 1. JavaScript captures audio as WebM/Opus blobs
+ 2. Convert to base64 encoding
+ 3. POST to `/api/transcribe` endpoint
+ 4. Service decodes base64 → writes temp file → converts to WAV → transcribes with Whisper
+ 5. Returns transcription text
+
+ Key implementation details:
+ - Handles data URL format: `data:audio/webm;codecs=opus;base64,<data>`
+ - Validates base64 encoding before processing
+ - Automatic padding for base64 alignment
+ - Temporary file cleanup after processing
+
+ ## Model Management
+
+ Models are loaded dynamically based on requests:
+ - **Model sizes**: tiny, base, small, medium, large
+ - **GPU optimization**: Uses fp16 precision when CUDA available
+ - **Memory requirements**: See README.md performance table
+ - **Language support**: 10+ languages plus auto-detection
+
+ ## Error Handling
+
+ The service includes comprehensive error handling:
+ - Base64 validation and cleanup
+ - Audio format conversion errors
+ - Whisper model loading failures
+ - Temporary file cleanup on errors
+ - Detailed logging for debugging
+
+ ## Debugging
+
+ Remote debugging is available when `DEBUG_MODE=true`:
+ - debugpy listens on port 5679
+ - Compatible with VS Code/Cursor remote attach
+ - Useful for HuggingFace Spaces development
+
+ ## Integration Notes
+
+ This service specifically eliminates Streamlit iframe communication issues by:
+ - Providing direct HTTP API endpoints instead of component bridges
+ - Supporting native WebRTC audio processing
+ - Removing postMessage/iframe complexity
+ - Enabling direct JavaScript fetch() calls
+
+ The architecture supports the "unmute.sh methodology" referenced in the user's requirements.
app.py CHANGED
@@ -488,57 +488,56 @@ with gr.Blocks(
      refresh_btn = gr.Button("🔄 Refresh System Status", variant="secondary")
      refresh_btn.click(fn=lambda: get_system_status(), outputs=status_md)
 
- # Custom API models for direct endpoints
- class TranscribeRequest(BaseModel):
-     audio_base64: str
-     language: str = "en"
-     model_size: str = "base"
-
- class TranscribeResponse(BaseModel):
-     transcription: str
-     processing_time: float
-     device: str
-     model_size: str
-
- # Get the FastAPI app that Gradio creates
- app = demo.app
-
- # Add custom endpoint
- @app.post("/api/transcribe", response_model=TranscribeResponse)
- async def api_transcribe(request: TranscribeRequest):
-     """Direct API endpoint bypassing Gradio queue system"""
+ # Create a simpler API endpoint using Gradio's built-in API system
+ def api_transcribe_simple(audio_base64, language="en", model_size="base"):
+     """Simple API function that returns JSON string instead of complex objects"""
      try:
          import time
+         import json
          start_time = time.time()
 
-         # Use the same function but return structured response
-         result_text = gradio_transcribe_memory(
-             request.audio_base64,
-             request.language,
-             request.model_size
-         )
+         # Use the same function but return structured response as JSON string
+         result_text = gradio_transcribe_memory(audio_base64, language, model_size)
 
          processing_time = time.time() - start_time
 
-         return TranscribeResponse(
-             transcription=result_text,
-             processing_time=processing_time,
-             device=stt_service.device,
-             model_size=stt_service.model_size
-         )
+         # Return JSON string that can be parsed by client
+         response = {
+             "transcription": result_text,
+             "processing_time": processing_time,
+             "device": stt_service.device,
+             "model_size": stt_service.model_size,
+             "success": True
+         }
+
+         return json.dumps(response)
 
      except Exception as e:
-         raise HTTPException(status_code=500, detail=str(e))
-
- @app.get("/api/health")
- async def api_health():
-     """Health check endpoint"""
-     return {
-         "status": "healthy",
-         "device": stt_service.device,
-         "model_size": stt_service.model_size,
-         "gpu_available": torch.cuda.is_available()
-     }
+         import json
+         error_response = {
+             "error": str(e),
+             "success": False
+         }
+         return json.dumps(error_response)
+
+ # Add this as a hidden Gradio interface for API access
+ with gr.Blocks() as api_demo:
+     api_input = gr.Textbox(visible=False)
+     api_language = gr.Textbox(visible=False)
+     api_model = gr.Textbox(visible=False)
+     api_output = gr.Textbox(visible=False)
+
+     # This creates an API endpoint at /api/predict with fn_index for this function
+     api_button = gr.Button(visible=False)
+     api_button.click(
+         fn=api_transcribe_simple,
+         inputs=[api_input, api_language, api_model],
+         outputs=[api_output],
+         api_name="simple_transcribe"  # This creates /api/simple_transcribe endpoint
+     )
+
+ # Mount the API interface
+ demo = gr.TabbedInterface([demo, api_demo], tab_names=["Main Interface", "API"])
 
  # Launch interface
  if __name__ == "__main__":
client-stt/v2-audio-client.js ADDED
@@ -0,0 +1,628 @@
+ /**
+  * STT GPU Service v2 Audio Client
+  * HTTP API client for pgits-stt-gpu-service-v2
+  */
+
+ class STTv2Client {
+     constructor() {
+         this.isRecording = false;
+         this.mediaRecorder = null;
+         this.audioChunks = [];
+         this.serverUrl = '';
+         this.resultCounter = 0;
+
+         this.initializeElements();
+         this.attachEventListeners();
+         this.updateDebugInfo('Client initialized');
+     }
+
+     generateSessionHash() {
+         return Math.random().toString(36).substring(2, 15) + Math.random().toString(36).substring(2, 15);
+     }
+
+     async listenForQueueResult(queueResponse, startTime, sessionHash) {
+         return new Promise((resolve, reject) => {
+             // Try different WebSocket URLs for Gradio queue
+             const possibleWsUrls = [
+                 this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/queue/data',
+                 this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/queue/join',
+                 this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/ws'
+             ];
+
+             let wsUrl = possibleWsUrls[0];
+             this.updateDebugInfo(`Attempting WebSocket connection to: ${wsUrl}`);
+
+             const ws = new WebSocket(wsUrl);
+
+             const timeout = setTimeout(() => {
+                 ws.close();
+                 reject(new Error('Queue timeout after 30 seconds'));
+             }, 30000);
+
+             ws.onopen = () => {
+                 this.updateDebugInfo('WebSocket connected successfully');
+                 // Send event_id to subscribe to updates
+                 if (queueResponse.event_id) {
+                     ws.send(JSON.stringify({
+                         event_id: queueResponse.event_id
+                     }));
+                     this.updateDebugInfo(`Sent event_id: ${queueResponse.event_id}`);
+                 }
+             };
+
+             ws.onmessage = (event) => {
+                 try {
+                     const data = JSON.parse(event.data);
+                     this.updateDebugInfo(`Queue message: ${JSON.stringify(data)}`);
+
+                     if (data.msg === 'process_completed' && data.output && data.output.data) {
+                         clearTimeout(timeout);
+                         ws.close();
+                         resolve(data.output.data[0]); // First output element
+                     } else if (data.msg === 'process_starts') {
+                         this.updateStatus('Processing on server...', 'processing');
+                     } else if (data.msg === 'queue_full') {
+                         clearTimeout(timeout);
+                         ws.close();
+                         reject(new Error('Server queue is full'));
+                     } else if (data.msg === 'send_data') {
+                         // Some Gradio versions expect data to be sent after connection
+                         this.updateDebugInfo('Server requesting data');
+                     }
+                 } catch (e) {
+                     this.updateDebugInfo(`WebSocket parse error: ${e.message}`);
+                     this.updateDebugInfo(`Raw message: ${event.data}`);
+                 }
+             };
+
+             ws.onerror = (error) => {
+                 this.updateDebugInfo(`WebSocket error details: ${JSON.stringify(error)}`);
+                 this.updateDebugInfo(`WebSocket readyState: ${ws.readyState}`);
+                 clearTimeout(timeout);
+                 // Try polling as fallback instead of failing
+                 this.updateDebugInfo('WebSocket failed, trying polling fallback...');
+                 this.pollForResult(queueResponse.event_id, startTime, sessionHash).then(resolve).catch(reject);
+             };
+
+             ws.onclose = (event) => {
+                 this.updateDebugInfo(`WebSocket closed: code=${event.code}, reason=${event.reason}`);
+                 clearTimeout(timeout);
+             };
+         });
+     }
+
+     async pollForResult(eventId, startTime, sessionHash = null) {
+         // Fast polling - transcription takes ~0.5s, so poll immediately and frequently
+         this.updateDebugInfo(`Starting FAST polling for event: ${eventId}, session: ${sessionHash}`);
+         const maxAttempts = 20; // 20 attempts over ~10 seconds
+
+         const actualSessionHash = sessionHash || this.generateSessionHash();
+         const pollEndpoints = [
+             `/queue/data?event_id=${eventId}&session_hash=${actualSessionHash}`,
+             `/queue/status?event_id=${eventId}`,
+         ];
+
+         this.updateDebugInfo(`Using session_hash: ${actualSessionHash}`);
+
+         // First attempt immediately (no delay)
+         for (let attempt = 0; attempt < maxAttempts; attempt++) {
+             const endpoint = pollEndpoints[attempt % pollEndpoints.length];
+             this.updateDebugInfo(`FAST poll attempt ${attempt + 1} trying: ${endpoint}`);
+
+             try {
+                 const response = await fetch(`${this.serverUrl}${endpoint}`);
+                 if (response.ok) {
+                     const responseText = await response.text();
+                     this.updateDebugInfo(`Poll attempt ${attempt + 1} (${endpoint}): ${responseText.substring(0, 300)}`);
+
+                     // Handle SSE format for queue/data endpoints
+                     if (endpoint.includes('/queue/data') && responseText.includes('data: ')) {
+                         const lines = responseText.split('\n');
+                         for (const line of lines) {
+                             if (line.startsWith('data: ')) {
+                                 try {
+                                     const data = JSON.parse(line.substring(6));
+                                     this.updateDebugInfo(`SSE data: ${JSON.stringify(data)}`);
+
+                                     if (data.msg === 'process_completed' && data.output && data.output.data) {
+                                         return data.output.data[0];
+                                     } else if (data.msg === 'process_starts') {
+                                         this.updateStatus('Processing on server...', 'processing');
+                                     }
+                                 } catch (parseError) {
+                                     this.updateDebugInfo(`SSE parse error: ${parseError.message}`);
+                                 }
+                             }
+                         }
+                     } else {
+                         // Try parsing as regular JSON
+                         try {
+                             const status = JSON.parse(responseText);
+                             if (status.msg === 'process_completed' && status.output && status.output.data) {
+                                 return status.output.data[0];
+                             } else if (status.msg === 'process_starts') {
+                                 this.updateStatus('Processing on server...', 'processing');
+                             }
+                         } catch (e) {
+                             // Not JSON, that's ok
+                         }
+                     }
+                 } else if (response.status === 404) {
+                     this.updateDebugInfo(`Endpoint ${endpoint} not found, trying next...`);
+                 } else {
+                     this.updateDebugInfo(`Endpoint ${endpoint} returned ${response.status}`);
+                 }
+             } catch (e) {
+                 this.updateDebugInfo(`Poll error on ${endpoint}: ${e.message}`);
+             }
+
+             // Short delay - transcription is fast (~0.5s), need to catch result quickly
+             if (attempt === 0) {
+                 // No delay for first attempt
+             } else if (attempt < 5) {
+                 await new Promise(resolve => setTimeout(resolve, 200)); // 200ms for first few
+             } else {
+                 await new Promise(resolve => setTimeout(resolve, 500)); // 500ms after that
+             }
+         }
+
+         throw new Error('Fast polling timeout - no result after 10 seconds');
+     }
+
+     initializeElements() {
+         this.recordButton = document.getElementById('recordButton');
+         this.testHealthButton = document.getElementById('testHealthButton');
+         this.clearButton = document.getElementById('clearButton');
+         this.uploadFileButton = document.getElementById('uploadFileButton');
+         this.testBase64Button = document.getElementById('testBase64Button');
+         this.serverUrlInput = document.getElementById('serverUrl');
+         this.languageSelect = document.getElementById('language');
+         this.modelSizeSelect = document.getElementById('modelSize');
+         this.audioFileInput = document.getElementById('audioFileInput');
+         this.statusDiv = document.getElementById('status');
+         this.transcriptionDiv = document.getElementById('transcription');
+         this.debugInfoDiv = document.getElementById('debugInfo');
+     }
+
+     attachEventListeners() {
+         this.recordButton.addEventListener('click', () => this.toggleRecording());
+         this.testHealthButton.addEventListener('click', () => this.testHealth());
+         this.clearButton.addEventListener('click', () => this.clearResults());
+         this.uploadFileButton.addEventListener('click', () => this.uploadFile());
+         this.testBase64Button.addEventListener('click', () => this.testBase64Data());
+         this.serverUrlInput.addEventListener('input', () => this.updateServerUrl());
+
+         // Initialize server URL
+         this.updateServerUrl();
+     }
+
+     updateServerUrl() {
+         this.serverUrl = this.serverUrlInput.value.trim();
+         this.updateDebugInfo(`Server URL updated: ${this.serverUrl}`);
+     }
+
+     updateStatus(message, className) {
+         this.statusDiv.textContent = message;
+         this.statusDiv.className = `status ${className}`;
+         this.updateDebugInfo(`Status: ${message}`);
+     }
+
+     updateDebugInfo(message) {
+         const timestamp = new Date().toLocaleTimeString();
+         this.debugInfoDiv.innerHTML += `[${timestamp}] ${message}<br>`;
+         this.debugInfoDiv.scrollTop = this.debugInfoDiv.scrollHeight;
+     }
+
+     addResult(text, metadata = {}) {
+         this.resultCounter++;
+         const resultDiv = document.createElement('div');
+         resultDiv.className = 'result-item';
+
+         const metaDiv = document.createElement('div');
+         metaDiv.className = 'result-meta';
+         metaDiv.textContent = `#${this.resultCounter} | ${new Date().toLocaleTimeString()} | ${metadata.processingTime || 'N/A'} | ${metadata.device || 'Unknown'}`;
+
+         const textDiv = document.createElement('div');
+         textDiv.className = 'result-text';
+         textDiv.textContent = text;
+
+         resultDiv.appendChild(metaDiv);
+         resultDiv.appendChild(textDiv);
+
+         this.transcriptionDiv.appendChild(resultDiv);
+         this.transcriptionDiv.scrollTop = this.transcriptionDiv.scrollHeight;
+     }
+
+     clearResults() {
+         this.transcriptionDiv.innerHTML = 'Results cleared...';
+         this.resultCounter = 0;
+         this.updateDebugInfo('Results cleared');
+     }
+
+     async testHealth() {
+         this.updateStatus('Testing health...', 'processing');
+         this.updateDebugInfo('Testing service connectivity...');
+
+         try {
+             // Test the main service endpoint
+             const response = await fetch(`${this.serverUrl}/`, {
+                 method: 'HEAD',
+                 timeout: 10000
+             });
+
+             if (response.ok) {
+                 this.updateStatus('Service is live and responding', 'ready');
+                 this.addResult('✅ Service health check passed - ready for transcription', { device: 'Web' });
+                 this.updateDebugInfo(`Service responding: HTTP ${response.status}`);
+             } else {
+                 throw new Error(`Service check failed: ${response.status}`);
+             }
+         } catch (error) {
+             this.updateStatus(`Health check failed: ${error.message}`, 'error');
+             this.addResult(`❌ Service unreachable: ${error.message}`, { device: 'Error' });
+             this.updateDebugInfo(`Health error: ${error.message}`);
+         }
+     }
+
+     async toggleRecording() {
+         if (!this.isRecording) {
+             await this.startRecording();
+         } else {
+             await this.stopRecording();
+         }
+     }
+
+     async startRecording() {
+         try {
+             const stream = await navigator.mediaDevices.getUserMedia({
+                 audio: {
+                     sampleRate: 44100,
+                     channelCount: 1,
+                     echoCancellation: true,
+                     noiseSuppression: true
+                 }
+             });
+
+             this.mediaRecorder = new MediaRecorder(stream, {
+                 mimeType: 'audio/webm;codecs=opus'
+             });
+
+             this.audioChunks = [];
+
+             this.mediaRecorder.ondataavailable = (event) => {
+                 if (event.data.size > 0) {
+                     this.audioChunks.push(event.data);
+                 }
+             };
+
+             this.mediaRecorder.onstop = () => {
+                 this.processRecording();
+             };
+
+             this.mediaRecorder.start();
+             this.isRecording = true;
+             this.recordButton.textContent = 'Stop Recording';
+             this.recordButton.className = 'stop btn';
+             this.updateStatus('Recording... (speak now)', 'recording');
+             this.updateDebugInfo('Recording started');
+
+         } catch (error) {
+             this.updateStatus(`Recording failed: ${error.message}`, 'error');
+             this.updateDebugInfo(`Recording error: ${error.message}`);
+         }
+     }
+
+     async stopRecording() {
+         if (this.mediaRecorder && this.isRecording) {
+             this.mediaRecorder.stop();
+             this.isRecording = false;
+             this.recordButton.textContent = 'Start Recording';
+             this.recordButton.className = 'start btn';
+             this.updateStatus('Processing recording...', 'processing');
+             this.updateDebugInfo('Recording stopped, processing...');
+
+             // Stop all tracks
+             this.mediaRecorder.stream.getTracks().forEach(track => track.stop());
+         }
+     }
+
+     async processRecording() {
+         if (this.audioChunks.length === 0) {
+             this.updateStatus('No audio recorded', 'error');
+             return;
+         }
+
+         try {
+             // Create blob from chunks
+             const audioBlob = new Blob(this.audioChunks, { type: 'audio/webm;codecs=opus' });
+             this.updateDebugInfo(`Audio blob created: ${audioBlob.size} bytes`);
+
+             // Convert to base64
+             const audioBase64 = await this.blobToBase64(audioBlob);
+             this.updateDebugInfo(`Base64 length: ${audioBase64.length} characters`);
+
+             // Send to transcription service
+             await this.transcribeAudio(audioBase64);
+
+         } catch (error) {
+             this.updateStatus(`Processing failed: ${error.message}`, 'error');
+             this.updateDebugInfo(`Processing error: ${error.message}`);
+         }
+     }
+
+     async blobToBase64(blob) {
+         return new Promise((resolve, reject) => {
+             const reader = new FileReader();
+             reader.onloadend = () => {
+                 const result = reader.result;
+                 // Extract base64 part from data URL
+                 const base64 = result.split(',')[1];
+                 resolve(base64);
+             };
+             reader.onerror = reject;
+             reader.readAsDataURL(blob);
+         });
+     }
+
+     async transcribeAudio(audioBase64, metadata = {}) {
+         // Gradio API format: data array with inputs in order [audio_base64, language, model_size]
+         // fn_index: 1 corresponds to the second function (transcribe_memory_btn.click)
+         const sessionHash = this.generateSessionHash();
+         const payload = {
+             data: [
+                 audioBase64,
+                 this.languageSelect.value,
+                 this.modelSizeSelect.value
+             ],
+             fn_index: 1,
+             session_hash: sessionHash
+         };
+
+         this.updateDebugInfo(`Sending Gradio API request: fn_index=1, data=[base64(${audioBase64.length} chars), "${this.languageSelect.value}", "${this.modelSizeSelect.value}"]`);
+
+         try {
+             const startTime = Date.now();
+
+             // Try the Gradio API endpoint - uses Gradio's built-in API system
+             this.updateDebugInfo('Trying Gradio /api/simple_transcribe endpoint...');
+             let response = await fetch(`${this.serverUrl}/api/simple_transcribe`, {
+                 method: 'POST',
+                 headers: {
+                     'Content-Type': 'application/json',
+                 },
+                 body: JSON.stringify({
+                     data: [audioBase64, this.languageSelect.value, this.modelSizeSelect.value]
+                 })
+             });
+
+             // Check if direct API worked - NO fallback to broken queue system
+             if (!response.ok) {
+                 const errorText = await response.text();
+                 this.updateDebugInfo(`Gradio /api/simple_transcribe failed: ${response.status} - ${errorText}`);
+                 throw new Error(`Direct API failed: ${response.status} - ${errorText}`);
+             }
+
+             const responseData = await response.json();
+             this.updateDebugInfo(`Response: ${JSON.stringify(responseData)}`);
+
+             let result;
+
+             // Check if this is Gradio API response
+             if (responseData.data && Array.isArray(responseData.data) && responseData.data.length > 0) {
+                 // Gradio API response - data[0] contains the JSON string from our function
+                 const jsonResult = responseData.data[0];
+                 this.updateDebugInfo(`Got Gradio API response: ${jsonResult.substring(0, 200)}`);
+
+                 try {
+                     const parsedResult = JSON.parse(jsonResult);
+                     if (parsedResult.success) {
+                         result = parsedResult.transcription;
+                         this.updateDebugInfo(`Parsed transcription: ${result.substring(0, 100)}`);
+                     } else {
+                         throw new Error(parsedResult.error || 'Unknown transcription error');
+                     }
+                 } catch (parseError) {
+                     this.updateDebugInfo(`Failed to parse JSON result: ${parseError.message}`);
+                     throw new Error(`Failed to parse transcription result: ${parseError.message}`);
+                 }
+             } else {
+                 throw new Error(`Unexpected Gradio response format: ${JSON.stringify(responseData)}`);
+             }
+
+             if (result) {
+                 const processingTime = (Date.now() - startTime) / 1000;
+                 this.updateStatus('Transcription complete', 'ready');
+                 this.addResult(
+                     result,
+                     {
+                         processingTime: `${processingTime.toFixed(2)}s`,
+                         device: 'GPU',
+                         ...metadata
+                     }
+                 );
+
+                 this.updateDebugInfo(`Transcription successful: "${result.substring(0, 100)}"`);
+             }
+
+         } catch (error) {
+             this.updateStatus(`Transcription failed: ${error.message}`, 'error');
+             this.addResult(`❌ Transcription error: ${error.message}`, { device: 'Error' });
+             this.updateDebugInfo(`Transcription error: ${error.message}`);
+         }
+     }
+
+     async uploadFile() {
+         const file = this.audioFileInput.files[0];
+         if (!file) {
+             this.updateStatus('No file selected', 'error');
+             return;
+         }
+
+         this.updateStatus(`Processing file: ${file.name}`, 'processing');
+         this.updateDebugInfo(`File selected: ${file.name} (${file.size} bytes)`);
+
+         try {
+             // Since file upload API is failing with 500 error,
+             // convert file to base64 and use the working base64 API (fn_index: 1)
+             this.updateDebugInfo('Converting file to base64 for base64 API...');
+
+             const audioBase64 = await this.fileToBase64(file);
+             this.updateDebugInfo(`File converted to base64: ${audioBase64.length} characters`);
+
+             // Use the working base64 transcription method
+             await this.transcribeAudio(audioBase64, { source: `File: ${file.name}` });
+
+         } catch (error) {
+             this.updateStatus(`File processing failed: ${error.message}`, 'error');
+             this.updateDebugInfo(`File error: ${error.message}`);
+         }
+     }
+
+     async fileToBase64(file) {
+         return new Promise((resolve, reject) => {
+             const reader = new FileReader();
+             reader.onloadend = () => {
+                 const result = reader.result;
+                 const base64 = result.split(',')[1];
+                 resolve(base64);
+             };
+             reader.onerror = reject;
+             reader.readAsDataURL(file);
+         });
+     }
+
+     async testBase64Data() {
+         this.updateStatus('Testing with sample base64 data...', 'processing');
+         this.updateDebugInfo('Testing with minimal base64 data');
+
+         // Create minimal test data
+         const testData = new Uint8Array(100);
+         testData.fill(42); // Fill with test data
+         const testBlob = new Blob([testData], { type: 'audio/webm' });
+         const testBase64 = await this.blobToBase64(testBlob);
+
+         await this.transcribeAudio(testBase64, { source: 'Test data' });
+     }
+
+     handleFileUploadResult(resultArray, startTime, metadata) {
+         // File upload returns 3 values: [transcription, timing, status]
+         const processingTime = (Date.now() - startTime) / 1000;
+
+         if (resultArray && resultArray.length >= 3) {
+             const [transcription, timing, status] = resultArray;
+
+             this.updateStatus('File transcription complete', 'ready');
+             this.addResult(
+                 transcription,
+                 {
+                     processingTime: `${processingTime.toFixed(2)}s`,
+                     timing: timing,
+                     status: status,
+                     device: 'GPU',
+                     ...metadata
+                 }
+             );
+
+             this.updateDebugInfo(`File transcription successful: "${transcription.substring(0, 100)}"`);
+         } else {
+             throw new Error(`Invalid file upload result format: ${JSON.stringify(resultArray)}`);
+         }
+     }
+
+     async listenForFileUploadResult(queueResponse, startTime, metadata) {
+         // Similar to regular queue listening but expects 3 outputs
+         return new Promise((resolve, reject) => {
+             const wsUrl = this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/queue/data';
+             this.updateDebugInfo(`File upload WebSocket: ${wsUrl}`);
+
+             const ws = new WebSocket(wsUrl);
+
+             const timeout = setTimeout(() => {
+                 ws.close();
+                 reject(new Error('File upload queue timeout after 30 seconds'));
+             }, 30000);
+
+             ws.onopen = () => {
+                 this.updateDebugInfo('File upload WebSocket connected');
+                 if (queueResponse.event_id) {
+                     ws.send(JSON.stringify({
+                         event_id: queueResponse.event_id
+                     }));
+                     this.updateDebugInfo(`Sent file upload event_id: ${queueResponse.event_id}`);
+                 }
+             };
+
+             ws.onmessage = (event) => {
+                 try {
+                     const data = JSON.parse(event.data);
+                     this.updateDebugInfo(`File upload queue message: ${JSON.stringify(data)}`);
+
+                     if (data.msg === 'process_completed' && data.output && data.output.data) {
+                         clearTimeout(timeout);
+                         ws.close();
+                         this.handleFileUploadResult(data.output.data, startTime, metadata);
+                         resolve(data.output.data);
+                     } else if (data.msg === 'process_starts') {
+                         this.updateStatus('Processing file on server...', 'processing');
+                     }
+                 } catch (e) {
+                     this.updateDebugInfo(`File upload WebSocket parse error: ${e.message}`);
+                 }
+             };
+
+             ws.onerror = (error) => {
+                 this.updateDebugInfo('File upload WebSocket failed, trying polling...');
+                 clearTimeout(timeout);
+                 this.pollForFileUploadResult(queueResponse.event_id, startTime, metadata).then(resolve).catch(reject);
+             };
+
+             ws.onclose = (event) => {
+                 this.updateDebugInfo(`File upload WebSocket closed: ${event.code}`);
+                 clearTimeout(timeout);
+             };
+         });
+     }
+
+     async pollForFileUploadResult(eventId, startTime, metadata) {
+         // Polling fallback for file uploads
+         this.updateDebugInfo(`Polling for file upload event: ${eventId}`);
+         const maxAttempts = 30;
+
+         const pollEndpoints = [
+             `/queue/data?event_id=${eventId}`,
+             `/queue/status?event_id=${eventId}`
+         ];
+
+         for (let attempt = 0; attempt < maxAttempts; attempt++) {
+             const endpoint = pollEndpoints[attempt % pollEndpoints.length];
+
+             try {
+                 const response = await fetch(`${this.serverUrl}${endpoint}`);
+                 if (response.ok) {
+                     const status = await response.json();
+                     this.updateDebugInfo(`File poll attempt ${attempt + 1} (${endpoint}): ${JSON.stringify(status)}`);
+
+                     if (status.msg === 'process_completed' && status.output && status.output.data) {
+                         this.handleFileUploadResult(status.output.data, startTime, metadata);
+                         return status.output.data;
+                     } else if (status.msg === 'process_starts') {
+                         this.updateStatus('Processing file on server...', 'processing');
+                     }
+                 }
+             } catch (e) {
+                 this.updateDebugInfo(`File poll error on ${endpoint}: ${e.message}`);
+             }
+
+             await new Promise(resolve => setTimeout(resolve, 1000));
+         }
+
+         throw new Error('File upload polling timeout - no result after 30 seconds');
+     }
+ }
+
+ // Initialize client when page loads
+ document.addEventListener('DOMContentLoaded', () => {
+     window.sttClient = new STTv2Client();
+     console.log('STT v2 Client initialized');
+ });
client-stt/v2-index.html ADDED
@@ -0,0 +1,243 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+     <meta charset="UTF-8">
+     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+     <title>STT GPU Service v2 Client</title>
+     <style>
+         body {
+             font-family: Arial, sans-serif;
+             max-width: 800px;
+             margin: 50px auto;
+             padding: 20px;
+             background-color: #f5f5f5;
+         }
+
+         .container {
+             background: white;
+             padding: 30px;
+             border-radius: 10px;
+             box-shadow: 0 2px 10px rgba(0,0,0,0.1);
+         }
+
+         h1 {
+             text-align: center;
+             color: #333;
+             margin-bottom: 30px;
+         }
+
+         .controls {
+             display: flex;
+             gap: 15px;
+             margin-bottom: 20px;
+             align-items: center;
+             flex-wrap: wrap;
+         }
+
+         #recordButton {
+             padding: 12px 24px;
+             font-size: 16px;
+             border: none;
+             border-radius: 5px;
+             cursor: pointer;
+             transition: background-color 0.3s;
+         }
+
+         #recordButton.start {
+             background-color: #4CAF50;
+             color: white;
+         }
+
+         #recordButton.stop {
+             background-color: #f44336;
+             color: white;
+         }
+
+         #recordButton:disabled {
+             background-color: #cccccc;
+             cursor: not-allowed;
+         }
+
+         .input-group {
+             display: flex;
+             flex-direction: column;
+             gap: 5px;
+         }
+
+         label {
+             font-weight: bold;
+             color: #333;
+         }
+
+         input[type="text"], select, input[type="file"] {
+             padding: 8px 12px;
+             border: 1px solid #ddd;
+             border-radius: 4px;
+             font-size: 14px;
+         }
+
+         .status {
+             padding: 10px;
+             border-radius: 5px;
+             margin-bottom: 20px;
+             font-weight: bold;
+         }
+
+         .status.ready {
+             background-color: #d4edda;
+             color: #155724;
+         }
+
+         .status.recording {
+             background-color: #fff3cd;
+             color: #856404;
+         }
+
+         .status.processing {
+             background-color: #d1ecf1;
+             color: #0c5460;
+         }
+
+         .status.error {
+             background-color: #f8d7da;
+             color: #721c24;
+         }
+
+         .transcription {
+             background-color: #f8f9fa;
+             border: 1px solid #dee2e6;
+             border-radius: 5px;
+             padding: 20px;
+             min-height: 200px;
+             max-height: 400px;
+             overflow-y: auto;
+             font-size: 16px;
+             line-height: 1.5;
+             white-space: pre-wrap;
+             word-wrap: break-word;
+             scroll-behavior: smooth;
+         }
+
+         .result-item {
+             background: #f0f8ff;
+             border-left: 4px solid #007bff;
+             padding: 10px;
+             margin: 10px 0;
+             border-radius: 4px;
+         }
+
+         .result-meta {
+             font-size: 12px;
+             color: #666;
+             margin-bottom: 5px;
+         }
+
+         .result-text {
+             font-size: 16px;
+             color: #333;
+         }
+
+         .button-group {
+             display: flex;
+             gap: 10px;
+             flex-wrap: wrap;
+         }
+
+         .btn {
+             padding: 12px 24px;
+             border: none;
+             border-radius: 5px;
+             cursor: pointer;
+             font-size: 14px;
+             transition: background-color 0.3s;
+         }
+
+         .btn-primary { background-color: #007bff; color: white; }
+         .btn-success { background-color: #28a745; color: white; }
+         .btn-warning { background-color: #ffc107; color: black; }
+         .btn-info { background-color: #17a2b8; color: white; }
+     </style>
+ </head>
+ <body>
+     <div class="container">
+         <h1>STT GPU Service v2 Client</h1>
+         <p style="text-align: center; color: #666; margin-bottom: 20px;">
+             <strong>Backend:</strong> pgits-stt-gpu-service-v2 (HTTP API, GPU-accelerated Whisper)
+         </p>
+
+         <div class="controls">
+             <div class="button-group">
+                 <button id="recordButton" class="start btn">Start Recording</button>
+                 <button id="testHealthButton" class="btn btn-info">Test Health</button>
+                 <button id="clearButton" class="btn btn-warning">Clear Results</button>
+             </div>
+
+             <div class="input-group">
+                 <label for="serverUrl">Server URL:</label>
+                 <input type="text" id="serverUrl" value="https://pgits-stt-gpu-service-v2.hf.space" />
+             </div>
+
+             <div class="input-group">
+                 <label for="language">Language:</label>
+                 <select id="language">
+                     <option value="en">English</option>
+                     <option value="es">Spanish</option>
+                     <option value="fr">French</option>
+                     <option value="de">German</option>
+                     <option value="it">Italian</option>
+                     <option value="pt">Portuguese</option>
+                     <option value="ru">Russian</option>
+                     <option value="ja">Japanese</option>
+                     <option value="ko">Korean</option>
+                     <option value="zh">Chinese</option>
+                     <option value="auto">Auto Detect</option>
+                 </select>
+             </div>
+
+             <div class="input-group">
+                 <label for="modelSize">Model Size:</label>
+                 <select id="modelSize">
+                     <option value="tiny">Tiny (Fast)</option>
+                     <option value="base" selected>Base (Balanced)</option>
+                     <option value="small">Small (Better)</option>
+                     <option value="medium">Medium (High Quality)</option>
+                     <option value="large">Large (Best Quality)</option>
+                 </select>
+             </div>
+         </div>
+
+         <div class="controls" style="border-top: 1px solid #dee2e6; padding-top: 20px;">
+             <div class="input-group">
+                 <label for="audioFileInput">Test with Audio File:</label>
+                 <input type="file" id="audioFileInput" accept="audio/*" />
+             </div>
+             <div class="button-group">
+                 <button id="uploadFileButton" class="btn btn-primary">Upload & Transcribe File</button>
+                 <button id="testBase64Button" class="btn btn-success">Test Base64 Data</button>
+             </div>
+             <p style="font-size: 12px; color: #666; margin-top: 10px;">
+                 📄 v2 service supports both file upload and WebRTC base64 audio processing
+             </p>
+         </div>
+
+         <div id="status" class="status ready">Ready - Click "Test Health" to verify service</div>
+
+         <div class="transcription" id="transcription">
+             Click "Test Health" to verify service connectivity, then try recording or upload a file...
+         </div>
+
+         <div style="background: #f8f9fa; border: 1px solid #dee2e6; border-radius: 5px; padding: 15px; margin-top: 15px;">
+             <h4 style="margin-top: 0; color: #495057;">Debug Information</h4>
+             <div id="debugInfo" style="font-family: monospace; font-size: 11px; color: #6c757d;">
+                 Ready for testing...
+             </div>
+         </div>
+
+         <div style="text-align: center; margin-top: 10px; padding: 10px; border-top: 1px solid #dee2e6; color: #6c757d; font-size: 12px;">
+             STT GPU Service v2 Client v1.0.0 | HTTP API | HuggingFace GPU Backend
+         </div>
+     </div>
+
+     <script src="v2-audio-client.js"></script>
+ </body>
+ </html>
test_client.py ADDED
@@ -0,0 +1,186 @@
+ #!/usr/bin/env python3
+ """
+ Test client for STT GPU Service on HuggingFace Spaces.
+ Tests the transcription API with audio files or generates test audio.
+ """
+
+ import os
+ import sys
+ import base64
+ import json
+ import time
+ from typing import Optional
+
+ import requests
+
+ # HuggingFace Space URL
+ STT_SERVICE_URL = "https://pgits-stt-gpu-service-v2.hf.space"
+
+ def test_health_check() -> bool:
+     """Test the health endpoint."""
+     print("🔍 Testing health check...")
+     try:
+         response = requests.get(f"{STT_SERVICE_URL}/api/health", timeout=10)
+         if response.status_code == 200:
+             print("✅ Health check passed")
+             return True
+         print(f"❌ Health check failed: {response.status_code}")
+         return False
+     except Exception as e:
+         print(f"❌ Health check error: {e}")
+         return False
+
+ def create_test_audio_file() -> Optional[str]:
+     """Create a simple test audio file using system tools."""
+     print("🎵 Creating test audio file...")
+
+     try:
+         # Try to create a simple sine wave audio file using ffmpeg if available
+         test_audio_path = "/tmp/test_audio.wav"
+
+         # Generate 3 seconds of 440 Hz sine wave
+         os.system(f"ffmpeg -f lavfi -i 'sine=frequency=440:duration=3' -y {test_audio_path} 2>/dev/null")
+
+         if os.path.exists(test_audio_path) and os.path.getsize(test_audio_path) > 1000:
+             print(f"✅ Test audio created: {test_audio_path}")
+             return test_audio_path
+
+         print("⚠️ ffmpeg not available or failed, using minimal test data")
+         return None
+
+     except Exception as e:
+         print(f"⚠️ Could not create test audio: {e}")
+         return None
+
+ def test_transcription_with_file(audio_file_path: str, language: str = "en", model_size: str = "base"):
+     """Test transcription with an audio file."""
+     print(f"🎤 Testing transcription with file: {audio_file_path}")
+
+     try:
+         # Read and encode audio file
+         with open(audio_file_path, "rb") as f:
+             audio_data = f.read()
+         audio_base64 = base64.b64encode(audio_data).decode('utf-8')
+
+         print(f"📄 Audio file size: {len(audio_data)} bytes")
+         print(f"📄 Base64 size: {len(audio_base64)} characters")
+
+         # Send transcription request
+         payload = {
+             "audio_base64": audio_base64,
+             "language": language,
+             "model_size": model_size
+         }
+
+         print(f"🚀 Sending request to {STT_SERVICE_URL}/api/transcribe")
+         start_time = time.time()
+
+         response = requests.post(
+             f"{STT_SERVICE_URL}/api/transcribe",
+             json=payload,
+             timeout=30,
+             headers={"Content-Type": "application/json"}
+         )
+
+         processing_time = time.time() - start_time
+
+         print(f"📊 Request completed in {processing_time:.2f}s")
+         print(f"📊 Response status: {response.status_code}")
+
+         if response.status_code == 200:
+             try:
+                 result = response.json()
+                 print(f"✅ Transcription result: {result}")
+                 return result
+             except json.JSONDecodeError:
+                 print(f"✅ Transcription result (text): {response.text}")
+                 return response.text
+
+         print(f"❌ Request failed: {response.status_code}")
+         print(f"❌ Error response: {response.text}")
+         return None
+
+     except Exception as e:
+         print(f"❌ Transcription test failed: {e}")
+         return None
+
+ def test_transcription_with_minimal_data():
+     """Test transcription with minimal test data."""
+     print("🧪 Testing with minimal base64 data...")
+
+     # Create minimal test data (should trigger demo response)
+     test_data = b"test audio data for demo"
+     audio_base64 = base64.b64encode(test_data).decode('utf-8')
+
+     payload = {
+         "audio_base64": audio_base64,
+         "language": "en",
+         "model_size": "base"
+     }
+
+     try:
+         response = requests.post(
+             f"{STT_SERVICE_URL}/api/transcribe",
+             json=payload,
+             timeout=10,
+             headers={"Content-Type": "application/json"}
+         )
+
+         if response.status_code == 200:
+             try:
+                 result = response.json()
+                 print(f"✅ Demo test result: {result}")
+             except json.JSONDecodeError:
+                 print(f"✅ Demo test result (text): {response.text}")
+         else:
+             print(f"❌ Demo test failed: {response.status_code} - {response.text}")
+
+     except Exception as e:
+         print(f"❌ Demo test error: {e}")
+
+ def main():
+     """Main test function."""
+     print("🎤 STT GPU Service Test Client")
+     print("=" * 50)
+
+     # Test health check first
+     if not test_health_check():
+         print("❌ Service appears to be down, continuing with other tests...")
+
+     print()
+
+     # Test with minimal data first
+     test_transcription_with_minimal_data()
+     print()
+
+     # Check if user provided an audio file
+     if len(sys.argv) > 1:
+         audio_file = sys.argv[1]
+         if os.path.exists(audio_file):
+             print(f"🎵 Using provided audio file: {audio_file}")
+             test_transcription_with_file(audio_file)
+         else:
+             print(f"❌ Audio file not found: {audio_file}")
+     else:
+         # Try to create test audio
+         test_audio = create_test_audio_file()
+         if test_audio:
+             test_transcription_with_file(test_audio)
+             # Clean up
+             try:
+                 os.unlink(test_audio)
+             except OSError:
+                 pass
+         else:
+             print("⚠️ No audio file provided and couldn't create test audio")
+             print("💡 Usage: python test_client.py [audio_file.wav]")
+             print("💡 Or install ffmpeg to auto-generate test audio")
+
+     print()
+     print("🎯 Test complete!")
+     print("💡 You can also test via web interface at:")
+     print(f"   {STT_SERVICE_URL}")
+
+ if __name__ == "__main__":
+     main()
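
The commit message describes the new request/response shape: arguments wrapped in Gradio's `{"data": [...]}` envelope sent to `/api/simple_transcribe`, with the result coming back as a JSON string inside the response's `data` array. A minimal sketch of that encode/decode step, run offline (the argument order and result fields are assumptions taken from the commit message, not confirmed against app.py):

```python
import base64
import json

def build_gradio_payload(audio_bytes: bytes, language: str = "en", model_size: str = "base") -> dict:
    """Wrap the arguments in the Gradio API's {"data": [...]} envelope."""
    audio_b64 = base64.b64encode(audio_bytes).decode("utf-8")
    return {"data": [audio_b64, language, model_size]}

def parse_gradio_response(body: dict) -> dict:
    """The Gradio API returns {"data": [<json-string>, ...]}; decode the inner JSON."""
    return json.loads(body["data"][0])

# Build a request body as it would be POSTed to /api/simple_transcribe
payload = build_gradio_payload(b"\x00\x01", language="en", model_size="base")

# Simulate a server response whose first output is a JSON-encoded result string
sample_response = {"data": [json.dumps({"text": "hello", "success": True})]}
result = parse_gradio_response(sample_response)
```

This matches the fix described in the commit: the client no longer expects a plain JSON object from `/api/transcribe`, but unwraps the Gradio data array and parses the JSON string it contains.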