Spaces: pgits/voiceCal-ai-v2

pgits Claude committed on Sep 25, 2025

Commit 74910cf

1 Parent(s): e4ce72e

FEATURE: Groq STT Integration - Replace HuggingFace with Groq Whisper

- Add new /api/stt/transcribe endpoint using Groq Whisper-large-v3-turbo
- Replace complex WebSocket transcribeAudio method with direct HTTP API calls
- Remove 90+ lines of WebSocket queue management and polling logic
- Simplified error handling with standard HTTP request/response pattern
- Maintains identical user experience while improving performance and reliability
- Uses existing GROQ_API_KEY for consolidated authentication

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (4)

LinkedIn.md +110 -87
app/api/chat_widget.py +13 -124
app/api/main.py +53 -0
version.txt +1 -1

LinkedIn.md CHANGED

@@ -1,95 +1,118 @@
-# 🚀 VoiceCal.ai v1.0.0 - Clean Architecture Migration
-## Migration Success: From Nested Chaos to Clean Deployment
-Today I successfully migrated **ChatCal.ai to VoiceCal.ai v1.0.0** with a completely clean repository structure for HuggingFace Spaces deployment.
-### The Problem We Solved
-- **Nested directory confusion**: `/chatcal-ai/chatcal-ai/app/` causing path issues
-- **Git workflow inconsistencies**: File upload vs. proper version control
-- **Docker caching issues**: HuggingFace showing stale Dockerfiles
-- **Deployment complexity**: Multiple path adjustments breaking deployments
-### The Solution: Fresh Start Architecture
-#### Before (Problematic):
-```
-/chatCal.ai/chatcal-ai/  (repo root)
-├── chatcal-ai/          (actual app nested)
-├── speech2text/         (unrelated)
-├── CLAUDE.md           (unrelated)
-└── legacy files...
 ```
-#### After (Clean):
-```
-/voiceCal-ai-v1/        (repo root = HF Space name)
-├── app/                (FastAPI directly at root)
-├── Dockerfile          (standard deployment)
-├── app.py             (simple entry point)
-├── requirements.txt
-└── pyproject.toml
-```
-### Technical Improvements
-#### 🏗️ **Architecture**
-- **Standard Docker structure**: No more nested path confusion
-- **Direct file access**: All imports work without path manipulation
-- **Clean git workflow**: Proper semantic versioning (v1.0.0)
-#### 🔧 **Development**
-- **Debugging tools**: wget, git, curl, procps, htop, nano, vim, net-tools, lsof, strace
-- **Clean Dockerfile**: No more tee command causing container exit
-- **Proper logging**: Both stdout and `/tmp/app.log` for SSH debugging
-#### 🎯 **Deployment**
-- **Fresh HuggingFace Space**: `pgits/voiceCal-ai-v1`
-- **No legacy caching**: Clean start eliminates previous issues
-- **Consistent versioning**: Semantic version matches deployment
-### Migration Process
-1. ✅ **Created clean directory structure** matching HF Space name
-2. ✅ **Copied essential files** (app/, Dockerfile, requirements.txt, etc.)
-3. ✅ **Updated all path references** for standard deployment
-4. ✅ **Initialized fresh git repo** with proper main branch
-5. ✅ **Ready for fresh HF Space** deployment
-### Key Lessons Learned
-#### 🎯 **Project Structure Matters**
-- Directory naming should match deployment target
-- Avoid nested structures that complicate Docker deployments
-- Keep unrelated files separate from deployment artifacts
-#### 🔄 **Git Workflow Discipline**
-- Always use proper git workflow vs. ad-hoc file uploads
-- Semantic versioning prevents deployment confusion
-- Clean commit history aids debugging
-#### 🐳 **Docker Best Practices**
-- Standard WORKDIR structure
-- Include debugging tools for production troubleshooting
-- Avoid complex shell commands in CMD that can cause exit issues
-### Next Steps
-Now ready to create fresh HuggingFace Space `pgits/voiceCal-ai-v1` with:
-- ✅ Clean repository structure
-- ✅ Standard Docker deployment
-- ✅ All debugging tools included
-- ✅ Proper git workflow established
-- ✅ Semantic versioning (v1.0.0)
-**Migration Summary:**
-- **Old**: Nested complexity, path confusion, deployment issues
-- **New**: Clean structure, standard paths, reliable deployment
-This migration eliminates the file inconsistencies and deployment issues we were experiencing, setting up a professional foundation for future development.
 ---
-#VoiceAI #HuggingFace #Docker #SoftwareEngineering #CleanArchitecture #FastAPI #Deployment #GitWorkflow
-**VoiceCal.ai v1.0.0** - Professional voice-first calendar booking with Google Calendar integration, STT/TTS, and Groq LLM backend.

+# 🚀 Groq STT Integration Plan: HuggingFace to Groq Migration Strategy
+## Executive Summary
+Following our successful TTS migration from Kyutai HuggingFace service to Groq (achieving significant performance improvements), we're now planning a surgical replacement of our Speech-to-Text (STT) service from HuggingFace STT-GPU-Service-v2 to Groq's Whisper-large-v3-turbo implementation.
+## Current STT Architecture (To Be Replaced)
+**HuggingFace Integration:**
+- External service: `pgits-stt-gpu-service-v2.hf.space`
+- Complex WebSocket queue system for results
+- HTTP POST → WebSocket listener pattern
+- Base64 audio transmission
+- Gradio client integration with session management
+**Technical Stack:**
+- Frontend: JavaScript MediaRecorder → Base64 conversion
+- Transport: HTTP POST + WebSocket queue listener
+- Backend: External HuggingFace Spaces service
+- Dependencies: External service availability, queue management
+## Proposed Groq STT Architecture
+**Groq Integration:**
+- Direct API calls to Groq's Whisper service
+- Simplified HTTP request/response pattern
+- FastAPI proxy endpoint for CORS handling
+- Same audio quality with reduced complexity
+**Implementation Details:**
+```python
+# New FastAPI Endpoint
+@app.post("/api/stt/transcribe")
+async def stt_transcribe(file: UploadFile = File(...)):
+    client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
+    transcription = client.audio.transcriptions.create(
+        file=file.file,
+        model="whisper-large-v3-turbo",
+        response_format="json",
+        language="en",
+        temperature=0.0
+    )
+    return {"text": transcription.text}
 ```
+```javascript
+// Simplified Frontend Integration
+async transcribeAudio(audioBase64) {
+    const audioBlob = this.base64ToBlob(audioBase64);
+    const formData = new FormData();
+    formData.append('file', audioBlob, 'audio.wav');
+    const response = await fetch('/api/stt/transcribe', {
+        method: 'POST', body: formData
+    });
+    const result = await response.json();
+    this.addTranscriptionToInput(result.text);
+}
+```
+## Migration Benefits
+### Performance Improvements
+- **Elimination of WebSocket complexity** - Direct HTTP API calls
+- **Reduced latency** - No external queue system
+- **Faster transcription** - Groq's optimized Whisper implementation
+- **Simplified error handling** - No connection state management
+### Operational Benefits
+- **Consolidated authentication** - Uses existing GROQ_API_KEY
+- **Reduced dependencies** - No external HuggingFace service reliance
+- **Cost optimization** - Direct API usage vs. external compute
+- **Improved reliability** - Fewer points of failure
+### Development Benefits
+- **Code simplification** - Remove WebSocket queue logic
+- **Easier debugging** - Standard HTTP request/response pattern
+- **Better error visibility** - Direct API error responses
+- **Consistent architecture** - Matches our TTS implementation pattern
+## Surgical Implementation Plan
+### Files to Modify (Minimal Impact)
+1. **app/api/main.py** - Add new `/api/stt/transcribe` endpoint
+2. **app/api/chat_widget.py** - Replace `transcribeAudio()` method (lines 1151-1211)
+3. **Requirements** - Already satisfied (groq>=0.4.0 from TTS migration)
+### Files NOT Modified (Preservation Strategy)
+- Audio recording logic (MediaRecorder)
+- Visual state management (STT indicators)
+- User interface components
+- Session management
+- TTS interruption system (recently enhanced)
+## Risk Mitigation
+- **Identical API contract** - Same input (audio) → output (text) pattern
+- **Progressive deployment** - Can switch back via configuration
+- **Preserved user experience** - No UI changes required
+- **Same audio quality** - WebM/Opus → Whisper transcription path maintained
+## Success Metrics
+- **Transcription latency reduction** (target: <2 seconds)
+- **Error rate improvement** (eliminate WebSocket timeouts)
+- **Code complexity reduction** (remove 100+ lines of WebSocket handling)
+- **Infrastructure simplification** (single API key vs. external service)
+## Timeline
+- **Phase 1:** Implementation (FastAPI endpoint + frontend method)
+- **Phase 2:** Testing (transcription accuracy and performance)
+- **Phase 3:** Deployment (surgical replacement with rollback capability)
+## Architectural Philosophy
+This migration continues our platform consolidation strategy: moving from distributed external services to unified API providers while maintaining service quality and user experience. The Groq ecosystem (TTS + STT) provides performance advantages and operational simplification compared to our current mixed-provider approach.
 ---
+*This document serves as the technical blueprint for our HuggingFace → Groq STT migration, ensuring stakeholder alignment and implementation clarity.*
+#AI #SpeechToText #Groq #HuggingFace #TechnicalStrategy #VoiceAI #SystemArchitecture
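
Review note: the endpoint described in the plan takes a multipart upload in a field named `file` and returns JSON with a `text` key, so it can be exercised without the browser widget. The following is a minimal smoke-test sketch using only the standard library. The function names, the 30-second timeout, and the base URL in the usage note are hypothetical and do not appear in the commit.

```python
import json
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(base_url, audio_bytes, filename="audio.wav", content_type="audio/wav"):
    """POST audio to /api/stt/transcribe and return the transcribed text."""
    body, header = build_multipart("file", filename, audio_bytes, content_type)
    request = urllib.request.Request(
        f"{base_url}/api/stt/transcribe",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

Against a locally running instance this would be called as `transcribe("http://localhost:7860", open("sample.wav", "rb").read())`, where both the port and the sample file are placeholders.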

app/api/chat_widget.py CHANGED

@@ -1149,51 +1149,33 @@ async def chat_widget(request: Request, email: str = None):
                 }
                 async transcribeAudio(audioBase64) {
-                    const sessionHash = this.generateSessionHash();
-                    const payload = {
-                        data: [
-                            audioBase64,
-                            this.language,
-                            this.modelSize
-                        ],
-                        session_hash: sessionHash
-                    };
-                    console.log(`📤 Sending to STT v2 service: ${this.serverUrl}/call/gradio_transcribe_memory`);
                     try {
                         const startTime = Date.now();
-                        const response = await fetch(`${this.serverUrl}/call/gradio_transcribe_memory`, {
                             method: 'POST',
-                            headers: {
-                                'Content-Type': 'application/json',
-                            },
-                            body: JSON.stringify(payload)
                         });
                         if (!response.ok) {
-                            throw new Error(`STT v2 request failed: ${response.status}`);
                         }
                         const responseData = await response.json();
-                        console.log('📨 STT v2 queue response:', responseData);
-                        let result;
-                        if (responseData.event_id) {
-                            console.log(`🎯 Got queue event_id: ${responseData.event_id}`);
-                            result = await this.listenForQueueResult(responseData, startTime, sessionHash);
-                        } else if (responseData.data && Array.isArray(responseData.data)) {
-                            result = responseData.data[0];
-                            console.log('📥 Got direct response from queue');
-                        } else {
-                            throw new Error(`Unexpected response format: ${JSON.stringify(responseData)}`);
-                        }
                         if (result && result.trim()) {
                             const processingTime = (Date.now() - startTime) / 1000;
-                            console.log(`✅ STT v2 transcription successful (${processingTime.toFixed(2)}s): "${result.substring(0, 100)}"`);
                             // Add transcription to message input
                             this.addTranscriptionToInput(result);
@@ -1204,105 +1186,12 @@ async def chat_widget(request: Request, email: str = None):
                         }
                     } catch (error) {
-                        console.error('❌ STT v2 transcription failed:', error);
                         updateSTTVisualState('error');
                         setTimeout(() => updateSTTVisualState('ready'), 3000);
                     }
                 }
-                async listenForQueueResult(queueResponse, startTime, sessionHash) {
-                    return new Promise((resolve, reject) => {
-                        const wsUrl = this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/queue/data';
-                        console.log(`🔌 Connecting to STT v2 WebSocket: ${wsUrl}`);
-                        const ws = new WebSocket(wsUrl);
-                        const timeout = setTimeout(() => {
-                            ws.close();
-                            reject(new Error('STT v2 queue timeout after 30 seconds'));
-                        }, 30000);
-                        ws.onopen = () => {
-                            console.log('✅ STT v2 WebSocket connected');
-                            if (queueResponse.event_id) {
-                                ws.send(JSON.stringify({
-                                    event_id: queueResponse.event_id
-                                }));
-                                console.log(`📤 Sent event_id: ${queueResponse.event_id}`);
-                            }
-                        };
-                        ws.onmessage = (event) => {
-                            try {
-                                const data = JSON.parse(event.data);
-                                console.log('📨 STT v2 queue message:', data);
-                                if (data.msg === 'process_completed' && data.output && data.output.data) {
-                                    clearTimeout(timeout);
-                                    ws.close();
-                                    resolve(data.output.data[0]);
-                                } else if (data.msg === 'process_starts') {
-                                    updateSTTVisualState('processing');
-                                }
-                            } catch (e) {
-                                console.warn('⚠️ STT v2 WebSocket parse error:', e.message);
-                            }
-                        };
-                        ws.onerror = (error) => {
-                            console.error('❌ STT v2 WebSocket error:', error);
-                            clearTimeout(timeout);
-                            // Try polling as fallback
-                            this.pollForResult(queueResponse.event_id, startTime, sessionHash).then(resolve).catch(reject);
-                        };
-                        ws.onclose = (event) => {
-                            console.log(`🔌 STT v2 WebSocket closed: code=${event.code}`);
-                            clearTimeout(timeout);
-                        };
-                    });
-                }
-                async pollForResult(eventId, startTime, sessionHash) {
-                    console.log(`🔄 Starting STT v2 polling for event: ${eventId}`);
-                    const maxAttempts = 20;
-                    for (let attempt = 0; attempt < maxAttempts; attempt++) {
-                        try {
-                            const endpoint = `/queue/data?event_id=${eventId}&session_hash=${sessionHash}`;
-                            const response = await fetch(`${this.serverUrl}${endpoint}`);
-                            if (response.ok) {
-                                const responseText = await response.text();
-                                console.log(`📊 STT v2 poll attempt ${attempt + 1}: ${responseText.substring(0, 200)}`);
-                                if (responseText.includes('data: ')) {
-                                    const lines = responseText.split('\\n');
-                                    for (const line of lines) {
-                                        if (line.startsWith('data: ')) {
-                                            try {
-                                                const data = JSON.parse(line.substring(6));
-                                                if (data.msg === 'process_completed' && data.output && data.output.data) {
-                                                    return data.output.data[0];
-                                                }
-                                            } catch (parseError) {
-                                                console.warn('⚠️ STT v2 SSE parse error:', parseError.message);
-                                            }
-                                        }
-                                    }
-                                }
-                            }
-                        } catch (e) {
-                            console.warn(`⚠️ STT v2 poll error attempt ${attempt + 1}:`, e.message);
-                        }
-                        // Progressive delay
-                        const delay = attempt < 5 ? 200 : 500;
-                        await new Promise(resolve => setTimeout(resolve, delay));
-                    }
-                    throw new Error('STT v2 polling timeout - no result after 20 attempts');
-                }
                 addTranscriptionToInput(transcription) {
                     const currentValue = messageInput.value;

                 }
                 async transcribeAudio(audioBase64) {
+                    console.log(`📤 Sending to Groq STT service: /api/stt/transcribe`);
                     try {
                         const startTime = Date.now();
+                        // Convert base64 to blob
+                        const audioBlob = this.base64ToBlob(audioBase64);
+                        const formData = new FormData();
+                        formData.append('file', audioBlob, 'audio.wav');
+                        const response = await fetch('/api/stt/transcribe', {
                             method: 'POST',
+                            body: formData
                         });
                         if (!response.ok) {
+                            throw new Error(`Groq STT request failed: ${response.status}`);
                         }
                         const responseData = await response.json();
+                        console.log('📨 Groq STT response:', responseData);
+                        const result = responseData.text;
                         if (result && result.trim()) {
                             const processingTime = (Date.now() - startTime) / 1000;
+                            console.log(`✅ Groq STT transcription successful (${processingTime.toFixed(2)}s): "${result.substring(0, 100)}"`);
                             // Add transcription to message input
                             this.addTranscriptionToInput(result);
                         }
                     } catch (error) {
+                        console.error('❌ Groq STT transcription failed:', error);
                         updateSTTVisualState('error');
                         setTimeout(() => updateSTTVisualState('ready'), 3000);
                     }
                 }
                 addTranscriptionToInput(transcription) {
                     const currentValue = messageInput.value;
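
Review note: the new `transcribeAudio` calls `this.base64ToBlob`, which is not part of this diff, and names the upload `audio.wav` even though LinkedIn.md describes a WebM/Opus recording path. If the transcription API infers the container from the file name (an assumption here, not something the commit states), a name that matches the recorded format may be safer. The sketch below shows, in Python to match the backend, what such a decoding step could look like. The `EXTENSIONS` table, the function name, and the default MIME type are hypothetical.

```python
import base64

# Hypothetical mapping from recorder MIME types to upload file extensions.
EXTENSIONS = {
    "audio/webm": "webm",
    "audio/ogg": "ogg",
    "audio/wav": "wav",
    "audio/mpeg": "mp3",
}


def decode_audio(audio_base64, default_mime="audio/webm"):
    """Return (filename, raw_bytes, mime) for a base64 string or a data URL."""
    mime = default_mime
    payload = audio_base64
    if audio_base64.startswith("data:"):
        # A data URL header looks like "data:audio/webm;codecs=opus;base64".
        header, _, payload = audio_base64.partition(",")
        mime = header[5:].split(";")[0] or default_mime
    raw = base64.b64decode(payload)
    extension = EXTENSIONS.get(mime, "bin")
    return f"audio.{extension}", raw, mime
```

A plain base64 string carries no type information, so the caller has to supply `default_mime` in that case; only a data URL lets the helper pick the name on its own.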

app/api/main.py CHANGED

@@ -701,6 +701,59 @@ async def get_tts_audio(file_id: str):
         raise HTTPException(status_code=500, detail=f"Audio serving failed: {str(e)}")
 @app.get("/auth/login", response_model=AuthResponse)
 async def google_auth_login(request: Request, state: Optional[str] = None):
     """Initiate Google OAuth login."""

         raise HTTPException(status_code=500, detail=f"Audio serving failed: {str(e)}")
+@app.post("/api/stt/transcribe")
+async def stt_transcribe(file: UploadFile = File(...)):
+    """STT transcription using Groq Whisper API."""
+    try:
+        from groq import Groq
+        import os
+        import tempfile
+        logger.info(f"🎤 STT transcription request: {file.filename} ({file.content_type})")
+        # Create Groq STT client
+        client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
+        # Time the STT generation for performance monitoring
+        import time
+        start_time = time.time()
+        # Create transcription using Groq Whisper
+        transcription = client.audio.transcriptions.create(
+            file=file.file,
+            model="whisper-large-v3-turbo",
+            response_format="json",
+            language="en",
+            temperature=0.0
+        )
+        transcription_time = time.time() - start_time
+        logger.info(f"⏱️ STT transcription took {transcription_time:.2f} seconds")
+        if transcription and transcription.text:
+            logger.info(f"🎤 STT transcription successful: \"{transcription.text[:100]}...\"")
+            return {
+                "success": True,
+                "text": transcription.text,
+                "processing_time": round(transcription_time, 2)
+            }
+        else:
+            raise HTTPException(status_code=500, detail="Empty transcription result")
+    except Exception as e:
+        # Enhanced error logging for Groq API issues
+        error_msg = str(e)
+        if "Error code: 401" in error_msg:
+            logger.error("Groq API authentication error - check GROQ_API_KEY")
+            raise HTTPException(status_code=500, detail="STT service authentication failed")
+        elif "Error code: 400" in error_msg:
+            logger.error(f"Groq API validation error: {error_msg}")
+            raise HTTPException(status_code=400, detail="STT request validation failed")
+        else:
+            logger.error(f"STT transcription error: {e}")
+            raise HTTPException(status_code=500, detail=f"STT transcription failed: {str(e)}")
 @app.get("/auth/login", response_model=AuthResponse)
 async def google_auth_login(request: Request, state: Optional[str] = None):
     """Initiate Google OAuth login."""

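Review note: in the added handler, the `HTTPException` for an empty transcription is raised inside the `try` block, so the broad `except Exception` catches it and re-wraps it through the final `else` branch. One possible way to keep the original detail is to pass already-formed HTTP errors through unchanged before matching on the message text. The sketch below reproduces the three branches from the diff. `HTTPError` is a stand-in class so the example runs without FastAPI, and `map_stt_error` is a hypothetical helper name.

```python
class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException (hypothetical, for illustration)."""

    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def map_stt_error(exc):
    """Translate an exception from the STT call into an HTTP error."""
    # Errors that are already HTTP errors (e.g. the empty-result case)
    # pass through untouched instead of being re-wrapped.
    if isinstance(exc, HTTPError):
        return exc
    message = str(exc)
    if "Error code: 401" in message:
        return HTTPError(500, "STT service authentication failed")
    if "Error code: 400" in message:
        return HTTPError(400, "STT request validation failed")
    return HTTPError(500, f"STT transcription failed: {message}")
```

In the handler this would be used as `raise map_stt_error(e)` inside the `except` block. Matching on the text "Error code: 401" mirrors the diff and depends on the client library's message format staying stable.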
version.txt CHANGED

@@ -1 +1 @@
-2.0.4-enhanced-tts-interrupt
+2.0.5-groq-stt-integration