pgits

FEATURE: Groq STT Integration - Replace HuggingFace with Groq Whisper

74910cf 7 months ago

preview code

raw

history blame contribute delete

4.83 kB

🚀 Groq STT Integration Plan: HuggingFace to Groq Migration Strategy

Executive Summary

Following our successful TTS migration from the Kyutai HuggingFace service to Groq (which delivered significant performance improvements), we're now planning a surgical replacement of our Speech-to-Text (STT) service, moving from HuggingFace STT-GPU-Service-v2 to Groq's hosted Whisper Large v3 Turbo model (whisper-large-v3-turbo).

Current STT Architecture (To Be Replaced)

HuggingFace Integration:

External service: pgits-stt-gpu-service-v2.hf.space
Complex WebSocket queue system for results
HTTP POST → WebSocket listener pattern
Base64 audio transmission
Gradio client integration with session management

Technical Stack:

Frontend: JavaScript MediaRecorder → Base64 conversion
Transport: HTTP POST + WebSocket queue listener
Backend: External HuggingFace Spaces service
Dependencies: External service availability, queue management
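The Base64 step in the current pipeline inflates every upload. Base64 encodes each 3 input bytes as 4 ASCII characters, so the payload grows by about a third before any JSON or queue framing is added. Here is a minimal sketch comparing raw audio bytes with their Base64 form. The byte string is a stand-in, not real audio.

```python
import base64
import math


def base64_size(raw_len: int) -> int:
    """Length of standard (padded) Base64 output for raw_len input bytes."""
    return 4 * math.ceil(raw_len / 3)


# Stand-in for a short WebM/Opus recording; real recordings vary in size.
raw_audio = bytes(range(256)) * 400  # 102,400 bytes
encoded = base64.b64encode(raw_audio)

assert len(encoded) == base64_size(len(raw_audio))
overhead = len(encoded) / len(raw_audio) - 1
print(f"raw={len(raw_audio)} B, base64={len(encoded)} B, overhead={overhead:.1%}")
```

A multipart form upload, as used in the proposed design, sends the raw bytes instead.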

Proposed Groq STT Architecture

Groq Integration:

Direct API calls to Groq's Whisper service
Simplified HTTP request/response pattern
FastAPI proxy endpoint that handles CORS and keeps GROQ_API_KEY on the server
Same audio quality with reduced complexity
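Because the proposed flow is a plain request/response call, any HTTP client can exercise it. That also makes it easy to test outside the browser. Here is a minimal stdlib-only sketch that uses the endpoint path /api/stt/transcribe and the multipart field name file from the implementation details below. It assumes the app is reachable at a hypothetical http://localhost:8000. The helper names build_multipart and transcribe are illustrative, not part of the codebase.

```python
import json
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes, content_type: str) -> Tuple[bytes, str]:
    """Encode one file field as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(audio: bytes, base_url: str = "http://localhost:8000", timeout: float = 10.0) -> str:
    """POST WebM audio to the proxy endpoint and return the transcribed text."""
    body, content_type = build_multipart("file", "audio.webm", audio, "audio/webm")
    request = urllib.request.Request(
        f"{base_url}/api/stt/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # One bounded HTTP call: no queue polling or socket state to manage.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["text"]
```

Non-2xx responses surface as urllib.error.HTTPError, and the exception's read() returns the endpoint's error body. This is the "direct API error response" visibility the plan describes.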

Implementation Details:

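```python
import os

from fastapi import FastAPI, File, HTTPException, UploadFile
from groq import AsyncGroq

app = FastAPI()
# One async client shared by all requests; authenticates with the existing GROQ_API_KEY
groq_client = AsyncGroq(api_key=os.environ.get("GROQ_API_KEY"))

# New FastAPI Endpoint
@app.post("/api/stt/transcribe")
async def stt_transcribe(file: UploadFile = File(...)):
    audio_bytes = await file.read()
    if not audio_bytes:
        raise HTTPException(status_code=400, detail="Empty audio upload")

    # Pass (filename, bytes) so the upload carries an extension matching the
    # audio format; the async client avoids blocking FastAPI's event loop.
    transcription = await groq_client.audio.transcriptions.create(
        file=(file.filename or "audio.webm", audio_bytes),
        model="whisper-large-v3-turbo",
        response_format="json",
        language="en",
        temperature=0.0,
    )

    return {"text": transcription.text}
```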

```javascript
// Simplified Frontend Integration
async transcribeAudio(audioBase64) {
    const audioBlob = this.base64ToBlob(audioBase64);
    const formData = new FormData();
    // MediaRecorder records WebM/Opus, so name the upload accordingly (not .wav)
    formData.append('file', audioBlob, 'audio.webm');

    const response = await fetch('/api/stt/transcribe', {
        method: 'POST',
        body: formData
    });
    if (!response.ok) {
        throw new Error(`STT request failed: HTTP ${response.status}`);
    }

    const result = await response.json();
    this.addTranscriptionToInput(result.text);
}
```
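The body of base64ToBlob() is not shown. Here is a hypothetical Python analog of that step, useful for server-side tests or for an endpoint that accepts Base64. It decodes either raw Base64 or a data: URL (the format produced by FileReader.readAsDataURL()). It then derives the upload filename from the MIME type, so WebM/Opus audio is not mislabelled as WAV. The function name decode_audio and the extension map are assumptions.

```python
import base64
import binascii
from typing import Tuple

# Assumed mapping for common browser audio types; unknown types fall back to .webm.
EXTENSIONS = {"audio/webm": "webm", "audio/ogg": "ogg", "audio/mp4": "m4a", "audio/wav": "wav"}


def decode_audio(payload: str, default_mime: str = "audio/webm") -> Tuple[bytes, str]:
    """Decode raw Base64 or a data: URL into (audio bytes, upload filename)."""
    mime = default_mime
    if payload.startswith("data:"):
        header, sep, payload = payload.partition(",")
        if not sep:
            raise ValueError("malformed data URL: missing ','")
        # "data:audio/webm;codecs=opus;base64" -> "audio/webm"
        mime = header[len("data:"):].split(";")[0] or default_mime
    try:
        audio = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio payload is not valid Base64") from exc
    return audio, f"audio.{EXTENSIONS.get(mime, 'webm')}"
```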

Migration Benefits

Performance Improvements

Elimination of WebSocket complexity - Direct HTTP API calls
Reduced latency - No external queue system
Faster transcription - Groq's optimized Whisper implementation
Simplified error handling - No connection state management

Operational Benefits

Consolidated authentication - Uses existing GROQ_API_KEY
Reduced dependencies - No external HuggingFace service reliance
Cost optimization - Direct API usage vs. external compute
Improved reliability - Fewer points of failure

Development Benefits

Code simplification - Remove WebSocket queue logic
Easier debugging - Standard HTTP request/response pattern
Better error visibility - Direct API error responses
Consistent architecture - Matches our TTS implementation pattern

Surgical Implementation Plan

Files to Modify (Minimal Impact)

app/api/main.py - Add new /api/stt/transcribe endpoint
app/api/chat_widget.py - Replace transcribeAudio() method (lines 1151-1211)
Requirements - groq>=0.4.0 already present from the TTS migration; FastAPI file uploads (UploadFile) also require python-multipart, if not already installed

Files NOT Modified (Preservation Strategy)

Audio recording logic (MediaRecorder)
Visual state management (STT indicators)
User interface components
Session management
TTS interruption system (recently enhanced)

Risk Mitigation

Identical API contract - Same input (audio) → output (text) pattern
Progressive deployment - Can switch back via configuration
Preserved user experience - No UI changes required
Same audio quality - WebM/Opus → Whisper transcription path maintained
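"Can switch back via configuration" implies a single switch point. Here is a minimal sketch of routing requests so that a rollback is a configuration change rather than a code change. It assumes a hypothetical STT_PROVIDER environment variable and placeholder transcriber functions; none of these names exist in the original.

```python
import os
from typing import Callable, Dict, Mapping, Optional

# Hypothetical signature: (audio bytes, filename) -> transcript text
Transcriber = Callable[[bytes, str], str]


def transcribe_with_groq(audio: bytes, filename: str) -> str:
    raise NotImplementedError("placeholder: call Groq Whisper here")


def transcribe_with_huggingface(audio: bytes, filename: str) -> str:
    raise NotImplementedError("placeholder: call the legacy HF Space here")


BACKENDS: Dict[str, Transcriber] = {
    "groq": transcribe_with_groq,
    "huggingface": transcribe_with_huggingface,
}


def select_stt_backend(env: Optional[Mapping[str, str]] = None) -> Transcriber:
    """Return the backend named by STT_PROVIDER (default "groq")."""
    env = os.environ if env is None else env
    provider = env.get("STT_PROVIDER", "groq").strip().lower()
    if provider not in BACKENDS:
        # Fail loudly: a typo should not silently route traffic to either backend.
        raise ValueError(f"unknown STT_PROVIDER {provider!r}; expected one of {sorted(BACKENDS)}")
    return BACKENDS[provider]
```

Rolling back then means redeploying with STT_PROVIDER=huggingface, leaving the endpoint code unchanged.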

Success Metrics

Transcription latency reduction (target: <2 seconds)
Error rate improvement (eliminate WebSocket timeouts)
Code complexity reduction (remove 100+ lines of WebSocket handling)
Infrastructure simplification (single API key vs. external service)
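Checking the <2 second target means collecting per-request latencies and summarizing them with a percentile rather than an average, so that a few slow requests are not hidden. Here is a minimal stdlib sketch. Using p95 as the pass/fail gate is an assumption, not a figure from the plan.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object]) -> float:
    """Wall-clock seconds for one call, e.g. one transcription request."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def latency_report(samples: List[float], target_s: float = 2.0) -> Dict[str, object]:
    """Summarize latencies (needs at least two samples) and gate on p95."""
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20, method="inclusive")[18]
    return {
        "p50": statistics.median(samples),
        "p95": p95,
        "meets_target": p95 < target_s,
    }
```

Running the same set of recorded clips through each backend and comparing the two reports gives a like-for-like latency comparison.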

Timeline

Phase 1: Implementation (FastAPI endpoint + frontend method)
Phase 2: Testing (transcription accuracy and performance)
Phase 3: Deployment (surgical replacement with rollback capability)
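Phase 2's accuracy testing can use word error rate (WER), a standard metric for comparing STT output. WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the hypothesis, divided by the number of reference words. Here is a minimal sketch. Lowercasing and splitting on whitespace is a simplifying assumption; real comparisons often also normalize punctuation.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Scoring the same reference clips against both the HuggingFace and Groq outputs turns "transcription accuracy" into a number that can be compared before cutover.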

Architectural Philosophy

This migration continues our platform consolidation strategy: moving from distributed external services to unified API providers while maintaining service quality and user experience. The Groq ecosystem (TTS + STT) provides performance advantages and operational simplification compared to our current mixed-provider approach.

This document serves as the technical blueprint for our HuggingFace → Groq STT migration, ensuring stakeholder alignment and implementation clarity.

#AI #SpeechToText #Groq #HuggingFace #TechnicalStrategy #VoiceAI #SystemArchitecture