FEATURE: Groq STT Integration - Replace HuggingFace with Groq Whisper

- Add new /api/stt/transcribe endpoint using Groq Whisper-large-v3-turbo
- Replace complex WebSocket transcribeAudio method with direct HTTP API calls
- Remove 90+ lines of WebSocket queue management and polling logic
- Simplify error handling with a standard HTTP request/response pattern
- Maintain an identical user experience while improving performance and reliability
- Use the existing GROQ_API_KEY for consolidated authentication

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- LinkedIn.md +110 -87
- app/api/chat_widget.py +13 -124
- app/api/main.py +53 -0
- version.txt +1 -1
LinkedIn.md
CHANGED
````diff
@@ -1,95 +1,118 @@
-# 🚀
-
-##
-
-[lines lost in page capture]
-```
-├── requirements.txt
-└── pyproject.toml
-```
-
-### Technical Improvements
-
-#### 🏗️ **Architecture**
-- **Standard Docker structure**: No more nested path confusion
-- **Direct file access**: All imports work without path manipulation
-- **Clean git workflow**: Proper semantic versioning (v1.0.0)
-
-#### 🔧 **Development**
-- **Debugging tools**: wget, git, curl, procps, htop, nano, vim, net-tools, lsof, strace
-- **Clean Dockerfile**: No more tee command causing container exit
-- **Proper logging**: Both stdout and `/tmp/app.log` for SSH debugging
-- **Consistent versioning**: Semantic version matches deployment
-
-[lines lost in page capture]
-3. ✅ **Updated all path references** for standard deployment
-4. ✅ **Initialized fresh git repo** with proper main branch
-5. ✅ **Ready for fresh HF Space** deployment
-
-### Key Lessons Learned
-
-#### 🎯 **Project Structure Matters**
-- Directory naming should match deployment target
-- Avoid nested structures that complicate Docker deployments
-- Keep unrelated files separate from deployment artifacts
-
-#### 📋 **Git Workflow Discipline**
-- Always use proper git workflow vs. ad-hoc file uploads
-- Semantic versioning prevents deployment confusion
-- Clean commit history aids debugging
-
-#### 🐳 **Docker Best Practices**
-- Standard WORKDIR structure
-- Include debugging tools for production troubleshooting
-- Avoid complex shell commands in CMD that can cause exit issues
-
-### Next Steps
-
-Now ready to create fresh HuggingFace Space `pgits/voiceCal-ai-v1` with:
-- ✅ Clean repository structure
-- ✅ Standard Docker deployment
-- ✅ All debugging tools included
-- ✅ Proper git workflow established
-- ✅ Semantic versioning (v1.0.0)
-
-**Migration Summary:**
-- **Old**: Nested complexity, path confusion, deployment issues
-- **New**: Clean structure, standard paths, reliable deployment
+# 🚀 Groq STT Integration Plan: HuggingFace to Groq Migration Strategy
+
+## Executive Summary
+Following our successful TTS migration from Kyutai HuggingFace service to Groq (achieving significant performance improvements), we're now planning a surgical replacement of our Speech-to-Text (STT) service from HuggingFace STT-GPU-Service-v2 to Groq's Whisper-large-v3-turbo implementation.
+
+## Current STT Architecture (To Be Replaced)
+**HuggingFace Integration:**
+- External service: `pgits-stt-gpu-service-v2.hf.space`
+- Complex WebSocket queue system for results
+- HTTP POST → WebSocket listener pattern
+- Base64 audio transmission
+- Gradio client integration with session management
+
+**Technical Stack:**
+- Frontend: JavaScript MediaRecorder → Base64 conversion
+- Transport: HTTP POST + WebSocket queue listener
+- Backend: External HuggingFace Spaces service
+- Dependencies: External service availability, queue management
+
+## Proposed Groq STT Architecture
+**Groq Integration:**
+- Direct API calls to Groq's Whisper service
+- Simplified HTTP request/response pattern
+- FastAPI proxy endpoint for CORS handling
+- Same audio quality with reduced complexity
+
+**Implementation Details:**
+```python
+# New FastAPI Endpoint
+@app.post("/api/stt/transcribe")
+async def stt_transcribe(file: UploadFile = File(...)):
+    client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
+
+    transcription = client.audio.transcriptions.create(
+        file=file.file,
+        model="whisper-large-v3-turbo",
+        response_format="json",
+        language="en",
+        temperature=0.0
+    )
+
+    return {"text": transcription.text}
+```
+
+```javascript
+// Simplified Frontend Integration
+async transcribeAudio(audioBase64) {
+    const audioBlob = this.base64ToBlob(audioBase64);
+    const formData = new FormData();
+    formData.append('file', audioBlob, 'audio.wav');
+
+    const response = await fetch('/api/stt/transcribe', {
+        method: 'POST', body: formData
+    });
+
+    const result = await response.json();
+    this.addTranscriptionToInput(result.text);
+}
+```
+
+## Migration Benefits
+
+### Performance Improvements
+- **Elimination of WebSocket complexity** - Direct HTTP API calls
+- **Reduced latency** - No external queue system
+- **Faster transcription** - Groq's optimized Whisper implementation
+- **Simplified error handling** - No connection state management
+
+### Operational Benefits
+- **Consolidated authentication** - Uses existing GROQ_API_KEY
+- **Reduced dependencies** - No external HuggingFace service reliance
+- **Cost optimization** - Direct API usage vs. external compute
+- **Improved reliability** - Fewer points of failure
+
+### Development Benefits
+- **Code simplification** - Remove WebSocket queue logic
+- **Easier debugging** - Standard HTTP request/response pattern
+- **Better error visibility** - Direct API error responses
+- **Consistent architecture** - Matches our TTS implementation pattern
+
+## Surgical Implementation Plan
+
+### Files to Modify (Minimal Impact)
+1. **app/api/main.py** - Add new `/api/stt/transcribe` endpoint
+2. **app/api/chat_widget.py** - Replace `transcribeAudio()` method (lines 1151-1211)
+3. **Requirements** - Already satisfied (groq>=0.4.0 from TTS migration)
+
+### Files NOT Modified (Preservation Strategy)
+- Audio recording logic (MediaRecorder)
+- Visual state management (STT indicators)
+- User interface components
+- Session management
+- TTS interruption system (recently enhanced)
+
+## Risk Mitigation
+- **Identical API contract** - Same input (audio) → output (text) pattern
+- **Progressive deployment** - Can switch back via configuration
+- **Preserved user experience** - No UI changes required
+- **Same audio quality** - WebM/Opus → Whisper transcription path maintained
+
+## Success Metrics
+- **Transcription latency reduction** (target: <2 seconds)
+- **Error rate improvement** (eliminate WebSocket timeouts)
+- **Code complexity reduction** (remove 100+ lines of WebSocket handling)
+- **Infrastructure simplification** (single API key vs. external service)
+
+## Timeline
+- **Phase 1:** Implementation (FastAPI endpoint + frontend method)
+- **Phase 2:** Testing (transcription accuracy and performance)
+- **Phase 3:** Deployment (surgical replacement with rollback capability)
+
+## Architectural Philosophy
+This migration continues our platform consolidation strategy: moving from distributed external services to unified API providers while maintaining service quality and user experience. The Groq ecosystem (TTS + STT) provides performance advantages and operational simplification compared to our current mixed-provider approach.
 
 ---
-#
-
-**VoiceCal.ai v1.0.0** - Professional voice-first calendar booking with Google Calendar integration, STT/TTS, and Groq LLM backend.
+*This document serves as the technical blueprint for our HuggingFace → Groq STT migration, ensuring stakeholder alignment and implementation clarity.*
 
+#AI #SpeechToText #Groq #HuggingFace #TechnicalStrategy #VoiceAI #SystemArchitecture
````
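The plan's "switch back via configuration" rollback is not spelled out in this commit. A minimal sketch of what such a switch could look like, assuming a hypothetical `STT_PROVIDER` environment variable (this variable name is ours, not something the repo defines):

```python
import os

# Hypothetical provider switch: "groq" (new path) or "hf" (legacy rollback).
def active_stt_provider() -> str:
    """Read the assumed STT_PROVIDER env var, defaulting to the new Groq path."""
    provider = os.environ.get("STT_PROVIDER", "groq").lower()
    if provider not in ("groq", "hf"):
        raise ValueError(f"Unknown STT provider: {provider}")
    return provider
```

Routing the `/api/stt/transcribe` handler on this value would give the "progressive deployment" property: the old HuggingFace code path stays deployable until the Groq path is proven.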
app/api/chat_widget.py
CHANGED
```diff
@@ -1149,51 +1149,33 @@ async def chat_widget(request: Request, email: str = None):
         }
 
         async transcribeAudio(audioBase64) {
-            const payload = {
-                data: [
-                    audioBase64,
-                    this.language,
-                    this.modelSize
-                ],
-                session_hash: sessionHash
-            };
-
-            console.log(`🎤 Sending to STT v2 service: ${this.serverUrl}/call/gradio_transcribe_memory`);
+            console.log(`🎤 Sending to Groq STT service: /api/stt/transcribe`);
 
             try {
                 const startTime = Date.now();
 
-                [fetch call opening lost in page capture]
+                // Convert base64 to blob
+                const audioBlob = this.base64ToBlob(audioBase64);
+                const formData = new FormData();
+                formData.append('file', audioBlob, 'audio.wav');
+
+                const response = await fetch('/api/stt/transcribe', {
                     method: 'POST',
-                    headers: {
-                        'Content-Type': 'application/json',
-                    },
-                    body: JSON.stringify(payload)
+                    body: formData
                 });
 
                 if (!response.ok) {
-                    throw new Error(`STT
+                    throw new Error(`Groq STT request failed: ${response.status}`);
                 }
 
                 const responseData = await response.json();
-                console.log('📨 STT
-
-                let result;
-
-                [branch condition lost in page capture]
-                    console.log(`🎯 Got queue event_id: ${responseData.event_id}`);
-                    result = await this.listenForQueueResult(responseData, startTime, sessionHash);
-                } else if (responseData.data && Array.isArray(responseData.data)) {
-                    result = responseData.data[0];
-                    console.log('📥 Got direct response from queue');
-                } else {
-                    throw new Error(`Unexpected response format: ${JSON.stringify(responseData)}`);
-                }
+                console.log('📨 Groq STT response:', responseData);
+
+                const result = responseData.text;
 
                 if (result && result.trim()) {
                     const processingTime = (Date.now() - startTime) / 1000;
-                    console.log(`✅ STT
+                    console.log(`✅ Groq STT transcription successful (${processingTime.toFixed(2)}s): "${result.substring(0, 100)}"`);
 
                     // Add transcription to message input
                     this.addTranscriptionToInput(result);
@@ -1204,105 +1186,12 @@ async def chat_widget(request: Request, email: str = None):
                 }
 
             } catch (error) {
-                console.error('❌ STT
+                console.error('❌ Groq STT transcription failed:', error);
                 updateSTTVisualState('error');
                 setTimeout(() => updateSTTVisualState('ready'), 3000);
             }
         }
 
-        async listenForQueueResult(queueResponse, startTime, sessionHash) {
-            return new Promise((resolve, reject) => {
-                const wsUrl = this.serverUrl.replace('https://', 'wss://').replace('http://', 'ws://') + '/queue/data';
-                console.log(`🔌 Connecting to STT v2 WebSocket: ${wsUrl}`);
-
-                const ws = new WebSocket(wsUrl);
-
-                const timeout = setTimeout(() => {
-                    ws.close();
-                    reject(new Error('STT v2 queue timeout after 30 seconds'));
-                }, 30000);
-
-                ws.onopen = () => {
-                    console.log('✅ STT v2 WebSocket connected');
-                    if (queueResponse.event_id) {
-                        ws.send(JSON.stringify({
-                            event_id: queueResponse.event_id
-                        }));
-                        console.log(`📤 Sent event_id: ${queueResponse.event_id}`);
-                    }
-                };
-
-                ws.onmessage = (event) => {
-                    try {
-                        const data = JSON.parse(event.data);
-                        console.log('📨 STT v2 queue message:', data);
-
-                        if (data.msg === 'process_completed' && data.output && data.output.data) {
-                            clearTimeout(timeout);
-                            ws.close();
-                            resolve(data.output.data[0]);
-                        } else if (data.msg === 'process_starts') {
-                            updateSTTVisualState('processing');
-                        }
-                    } catch (e) {
-                        console.warn('⚠️ STT v2 WebSocket parse error:', e.message);
-                    }
-                };
-
-                ws.onerror = (error) => {
-                    console.error('❌ STT v2 WebSocket error:', error);
-                    clearTimeout(timeout);
-                    // Try polling as fallback
-                    this.pollForResult(queueResponse.event_id, startTime, sessionHash).then(resolve).catch(reject);
-                };
-
-                ws.onclose = (event) => {
-                    console.log(`🔌 STT v2 WebSocket closed: code=${event.code}`);
-                    clearTimeout(timeout);
-                };
-            });
-        }
-
-        async pollForResult(eventId, startTime, sessionHash) {
-            console.log(`🔄 Starting STT v2 polling for event: ${eventId}`);
-            const maxAttempts = 20;
-
-            for (let attempt = 0; attempt < maxAttempts; attempt++) {
-                try {
-                    const endpoint = `/queue/data?event_id=${eventId}&session_hash=${sessionHash}`;
-                    const response = await fetch(`${this.serverUrl}${endpoint}`);
-
-                    if (response.ok) {
-                        const responseText = await response.text();
-                        console.log(`📊 STT v2 poll attempt ${attempt + 1}: ${responseText.substring(0, 200)}`);
-
-                        if (responseText.includes('data: ')) {
-                            const lines = responseText.split('\\n');
-                            for (const line of lines) {
-                                if (line.startsWith('data: ')) {
-                                    try {
-                                        const data = JSON.parse(line.substring(6));
-                                        if (data.msg === 'process_completed' && data.output && data.output.data) {
-                                            return data.output.data[0];
-                                        }
-                                    } catch (parseError) {
-                                        console.warn('⚠️ STT v2 SSE parse error:', parseError.message);
-                                    }
-                                }
-                            }
-                        }
-                    }
-                } catch (e) {
-                    console.warn(`⚠️ STT v2 poll error attempt ${attempt + 1}:`, e.message);
-                }
-
-                // Progressive delay
-                const delay = attempt < 5 ? 200 : 500;
-                await new Promise(resolve => setTimeout(resolve, delay));
-            }
-
-            throw new Error('STT v2 polling timeout - no result after 20 attempts');
-        }
 
         addTranscriptionToInput(transcription) {
             const currentValue = messageInput.value;
```
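The new `transcribeAudio()` leans on a `base64ToBlob()` helper that this diff does not show. Its essential job, stripping any data-URL prefix and decoding the base64 payload, can be sketched in Python terms; the function name here is illustrative, not part of the codebase:

```python
import base64

def base64_audio_to_bytes(audio_b64: str) -> bytes:
    """Decode recorder output that may arrive as a raw base64 string
    or as a data URL like 'data:audio/webm;base64,...'."""
    if audio_b64.startswith("data:"):
        # Keep only the payload after the first comma of the data URL
        audio_b64 = audio_b64.split(",", 1)[1]
    return base64.b64decode(audio_b64)
```

The JavaScript helper presumably does the same split-and-decode before wrapping the bytes in a `Blob` for `FormData`.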
app/api/main.py
CHANGED
```diff
@@ -701,6 +701,61 @@ async def get_tts_audio(file_id: str):
         raise HTTPException(status_code=500, detail=f"Audio serving failed: {str(e)}")
 
 
+@app.post("/api/stt/transcribe")
+async def stt_transcribe(file: UploadFile = File(...)):
+    """STT transcription using Groq Whisper API."""
+    try:
+        from groq import Groq
+        import os
+
+        logger.info(f"🎤 STT transcription request: {file.filename} ({file.content_type})")
+
+        # Create Groq STT client
+        client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
+
+        # Time the STT generation for performance monitoring
+        import time
+        start_time = time.time()
+
+        # Create transcription using Groq Whisper
+        transcription = client.audio.transcriptions.create(
+            file=file.file,
+            model="whisper-large-v3-turbo",
+            response_format="json",
+            language="en",
+            temperature=0.0
+        )
+
+        transcription_time = time.time() - start_time
+        logger.info(f"⏱️ STT transcription took {transcription_time:.2f} seconds")
+
+        if transcription and transcription.text:
+            logger.info(f"🎤 STT transcription successful: \"{transcription.text[:100]}...\"")
+            return {
+                "success": True,
+                "text": transcription.text,
+                "processing_time": round(transcription_time, 2)
+            }
+        else:
+            raise HTTPException(status_code=500, detail="Empty transcription result")
+
+    except HTTPException:
+        # Re-raise explicit HTTP errors instead of re-wrapping them below
+        raise
+    except Exception as e:
+        # Enhanced error logging for Groq API issues
+        error_msg = str(e)
+        if "Error code: 401" in error_msg:
+            logger.error("Groq API authentication error - check GROQ_API_KEY")
+            raise HTTPException(status_code=500, detail="STT service authentication failed")
+        elif "Error code: 400" in error_msg:
+            logger.error(f"Groq API validation error: {error_msg}")
+            raise HTTPException(status_code=400, detail="STT request validation failed")
+        else:
+            logger.error(f"STT transcription error: {e}")
+            raise HTTPException(status_code=500, detail=f"STT transcription failed: {str(e)}")
+
+
 @app.get("/auth/login", response_model=AuthResponse)
 async def google_auth_login(request: Request, state: Optional[str] = None):
     """Initiate Google OAuth login."""
```
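The string matching in the endpoint's `except` branch can be exercised in isolation. A minimal sketch of the same mapping as a pure function (the helper name is ours, not the commit's):

```python
def map_groq_stt_error(error_msg: str) -> tuple:
    """Mirror the endpoint's branching on Groq error codes found in str(e):
    returns the (HTTP status, detail) pair the handler would raise."""
    if "Error code: 401" in error_msg:
        # Bad or missing GROQ_API_KEY
        return 500, "STT service authentication failed"
    if "Error code: 400" in error_msg:
        # Malformed request, e.g. an unsupported audio payload
        return 400, "STT request validation failed"
    return 500, f"STT transcription failed: {error_msg}"
```

Factoring the mapping out this way keeps the route handler thin and makes the 401/400 paths testable without a live Groq call.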
version.txt
CHANGED
```diff
@@ -1 +1 @@
-2.0.
+2.0.5-groq-stt-integration
```