Issue Resolution Report
1. STT 400 Validation Error (Fixed)
Status: ✅ FIXED
Root Cause: Missing confidence field in Pydantic model caused 400 error.
Fix: Added confidence calculation logic (math.exp) to whisper_stt_service.py.
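The report only says confidence is computed with math.exp. A common approach, and the assumption in this sketch, is to exponentiate each segment's avg_logprob (mean token log-probability) to get a probability, then average those values weighted by segment duration. Segment, segment_confidence, and transcript_confidence are hypothetical names. faster-whisper segments expose start, end, and avg_logprob attributes, so they should be usable in place of the stand-in dataclass.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Segment:
    """Hypothetical stand-in for a Whisper segment (only the fields used here)."""
    start: float
    end: float
    avg_logprob: float


def segment_confidence(avg_logprob: float) -> float:
    """Map a mean token log-probability to a probability in (0, 1]."""
    # exp(log p) = p; capping the input at 0 keeps the result <= 1 and avoids overflow.
    return math.exp(min(0.0, avg_logprob))


def transcript_confidence(segments: Sequence[Segment]) -> float:
    """Duration-weighted mean of per-segment confidences; 0.0 for empty input."""
    durations = [max(0.0, s.end - s.start) for s in segments]
    total = sum(durations)
    if total == 0:
        return 0.0
    weighted = sum(segment_confidence(s.avg_logprob) * d for s, d in zip(segments, durations))
    return weighted / total
```

Weighting by duration keeps a short, uncertain filler segment from dragging down the score of a long, clean transcript.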
2. TTS Latency (High TTFB) (Improved)
Status: ✅ FIXED
Root Cause: Buffering the edge-tts output for the entire text before sending any audio caused an 8.8s TTFB.
Fix: Implemented sentence-level splitting and streaming in EdgeTTSService.
Result: TTFB improved from 8.8s → 1.1s.
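A minimal sketch of sentence-level streaming, assuming the service synthesizes each sentence separately and yields its audio as soon as it arrives, so TTFB depends on the first sentence rather than the whole text. split_sentences, stream_tts, and the SynthesizeFn signature are hypothetical. edge_tts_synthesize uses the edge-tts package's Communicate(...).stream() interface and is not exercised offline; fake_synthesize stands in for it.

```python
import asyncio
import re
from typing import AsyncIterator, Callable, List

# Split after ".", "!" or "?" whenever whitespace follows (a deliberately simple rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Hypothetical signature: any async generator that turns one sentence into audio chunks.
SynthesizeFn = Callable[[str], AsyncIterator[bytes]]


def split_sentences(text: str) -> List[str]:
    """Split text into non-empty, stripped sentences."""
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]


async def stream_tts(text: str, synthesize: SynthesizeFn) -> AsyncIterator[bytes]:
    """Yield audio sentence by sentence; the first bytes go out after sentence one."""
    for sentence in split_sentences(text):
        async for chunk in synthesize(sentence):
            yield chunk


async def edge_tts_synthesize(sentence: str, voice: str = "en-US-AriaNeural") -> AsyncIterator[bytes]:
    """Real backend via edge-tts (imported lazily; requires network access)."""
    import edge_tts

    communicate = edge_tts.Communicate(sentence, voice)
    async for chunk in communicate.stream():
        if chunk["type"] == "audio":  # skip boundary metadata events
            yield chunk["data"]


async def fake_synthesize(sentence: str) -> AsyncIterator[bytes]:
    """Offline stand-in: returns the sentence's UTF-8 bytes as 'audio'."""
    yield sentence.encode("utf-8")


async def collect(text: str, synthesize: SynthesizeFn = fake_synthesize) -> List[bytes]:
    """Drain the stream into a list (handy for tests)."""
    return [chunk async for chunk in stream_tts(text, synthesize)]
```

The regex splits on abbreviations such as "Dr. Smith"; a production splitter would need an abbreviation list or a proper sentence tokenizer. Each sentence also costs one synthesis request, trading some total throughput for a much earlier first byte.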
3. Concurrent Requests Failure (Fixed)
Status: ✅ FIXED
Root Cause: Benchmark queried /api/v1/health (404) instead of /health.
Fix: Corrected benchmark URL.
Result: 5/5 concurrent requests succeed.
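A sketch of the corrected concurrency check, assuming the service listens on 127.0.0.1:8000 and exposes the health route at the root. run_concurrent, http_status, and success_count are hypothetical helpers; the fetch function is injectable so the logic can be checked without a running server.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

BASE_URL = "http://127.0.0.1:8000"  # assumed host and port
HEALTH_PATH = "/health"  # root-level route; /api/v1/health returns 404


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET url and return its HTTP status code (connection errors still raise)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_concurrent(n: int = 5, base_url: str = BASE_URL,
                   fetch: Callable[[str], int] = http_status) -> List[int]:
    """Fire n simultaneous GETs at the health endpoint and return their status codes."""
    url = base_url.rstrip("/") + HEALTH_PATH
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(fetch, [url] * n))


def success_count(statuses: List[int]) -> int:
    """Count responses with HTTP 200."""
    return sum(1 for status in statuses if status == 200)
```

Against a live server, `success_count(run_concurrent())` should report 5 when all five requests succeed.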
4. Voice List Caching (Fixed)
Status: ✅ FIXED
Root Cause: Re-fetching voice list from Microsoft API on every request (2s).
Fix: Implemented class-level caching in EdgeTTSService and added pre-warming.
Result: Voice list fetch 1200ms → 10ms.
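A minimal sketch of class-level caching with pre-warming, assuming the voice list can be fetched once per process and reused. VoiceCache and its methods are hypothetical; with edge-tts, the fetch argument could plausibly be edge_tts.list_voices, which is an async function returning a list of voice dicts. An asyncio.Lock ensures that concurrent first requests trigger only one fetch.

```python
import asyncio
from typing import Awaitable, Callable, ClassVar, Dict, List, Optional

Voice = Dict[str, str]
FetchVoices = Callable[[], Awaitable[List[Voice]]]


class VoiceCache:
    """Process-wide voice-list cache shared by all instances (class-level state)."""

    _voices: ClassVar[Optional[List[Voice]]] = None
    _lock: ClassVar[Optional[asyncio.Lock]] = None

    @classmethod
    async def get_voices(cls, fetch: FetchVoices) -> List[Voice]:
        if cls._voices is not None:
            return cls._voices  # fast path: no network call
        if cls._lock is None:
            cls._lock = asyncio.Lock()
        async with cls._lock:
            # Re-check: another coroutine may have filled the cache while we waited.
            if cls._voices is None:
                cls._voices = await fetch()
        return cls._voices

    @classmethod
    async def prewarm(cls, fetch: FetchVoices) -> None:
        """Call at startup so the first user request already hits the cache."""
        await cls.get_voices(fetch)
```

Keeping the cache on the class rather than the instance means every service object created per request sees the same list.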
5. DNS Resolution Lag on Windows (Fixed)
Status: ✅ FIXED
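Root Cause: Windows localhost resolution delay (2.0s), typically because localhost resolves to IPv6 ::1 first and the client falls back to IPv4 127.0.0.1 only after the IPv6 connection attempt fails.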
Fix: Switched to explicit 127.0.0.1 IPv4 loopback.
Result: Cold start 2.1s → 0.01s.
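A small sketch of the fix, assuming the client builds its base URL from a host constant and that port 8000 is used (both assumptions). address_families is a hypothetical helper that shows which address families the resolver returns: an IPv4 literal yields only AF_INET, so no IPv6 attempt can precede the connection.

```python
import socket
from typing import Set

# Hypothetical client configuration: the IPv4 loopback literal instead of "localhost".
API_HOST = "127.0.0.1"
API_PORT = 8000  # assumed port
BASE_URL = f"http://{API_HOST}:{API_PORT}"


def address_families(host: str, port: int) -> Set[str]:
    """Return the names of the address families the resolver offers for host:port."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return {family.name for family, *_ in infos}
```

Calling `address_families("localhost", 8000)` on many Windows machines returns both AF_INET6 and AF_INET, which is where the fallback delay comes from.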
6. STT Slow CPU Performance (Fixed)
Status: ✅ FIXED
Root Cause: whisper-small (float32) took 38s for 30s audio on CPU.
Fix:
- Enforced compute_type="int8" (30% speedup).
- Set beam_size=1 (greedy decoding) (15% speedup).
- Integrated Distil-Whisper hybrid routing (10x speedup).
Result: STT latency optimized 38.5s → 3.7s.
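A sketch of these settings with faster-whisper, under one loudly stated assumption: the report does not say how "hybrid routing" picks a model, so this version routes English audio to distil-small.en and everything else, including unknown language, to small (the two models named in the startup pre-load). choose_model, DecodeConfig, and transcribe are hypothetical; WhisperModel and its transcribe method come from faster-whisper, which is imported lazily and not needed to run the routing logic.

```python
from dataclasses import dataclass
from typing import Optional

ENGLISH_MODEL = "distil-small.en"  # Distil-Whisper, English-only, faster
MULTILINGUAL_MODEL = "small"       # standard Whisper, multilingual fallback


@dataclass(frozen=True)
class DecodeConfig:
    compute_type: str = "int8"  # quantized weights instead of float32
    beam_size: int = 1          # greedy decoding
    device: str = "cpu"


def choose_model(language: Optional[str]) -> str:
    """Assumed rule: English goes to Distil-Whisper, anything else to small."""
    if language is not None and language.lower().startswith("en"):
        return ENGLISH_MODEL
    return MULTILINGUAL_MODEL


def transcribe(audio_path: str, language: Optional[str] = None,
               config: DecodeConfig = DecodeConfig()) -> str:
    """Transcribe with faster-whisper (third-party; imported only when called)."""
    from faster_whisper import WhisperModel

    # A real service would load each model once and reuse it across requests.
    model = WhisperModel(choose_model(language), device=config.device,
                         compute_type=config.compute_type)
    segments, _info = model.transcribe(audio_path, beam_size=config.beam_size,
                                       language=language)
    return " ".join(seg.text.strip() for seg in segments)
```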
7. Cold Start "First Request" Lag (Fixed)
Status: ✅ FIXED
Root Cause: Model loading on first request caused 8-10s delay.
Fix: Implemented lifespan handler in main.py to pre-load specific models (distil-small.en, small) and cache voices on startup.
Result: First request is now instant (0.03s).
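A minimal sketch of a startup lifespan handler, assuming models are held in a module-level registry. load_model, warm_voice_cache, and MODEL_REGISTRY are hypothetical placeholders; only the two model names come from the report. The handler is a plain contextlib async context manager, which is the shape FastAPI accepts via FastAPI(lifespan=lifespan).

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Dict

PRELOAD_MODELS = ("distil-small.en", "small")  # models named in the report
MODEL_REGISTRY: Dict[str, Any] = {}  # hypothetical shared model store


def load_model(name: str) -> Any:
    """Placeholder loader; a real service might construct a WhisperModel here."""
    return f"<model {name}>"


async def warm_voice_cache() -> None:
    """Placeholder for fetching and caching the TTS voice list."""
    await asyncio.sleep(0)


@asynccontextmanager
async def lifespan(app: Any) -> AsyncIterator[None]:
    # Startup: run blocking model loads in a worker thread so the event loop is not stalled.
    for name in PRELOAD_MODELS:
        MODEL_REGISTRY[name] = await asyncio.to_thread(load_model, name)
    await warm_voice_cache()
    yield  # the application serves requests here
    # Shutdown: drop model references.
    MODEL_REGISTRY.clear()

# With FastAPI this would be wired as: app = FastAPI(lifespan=lifespan)
```

Because startup finishes before the server accepts connections, the load cost moves from the first request to boot time.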
8. Lack of Prosody Control (Fixed)
Status: ✅ FIXED
Root Cause: API only supported plain text.
Fix: Added /api/v1/tts/ssml endpoint and build_ssml builder method.
Result: Users can now control rate, pitch, and emphasis.
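The report names a build_ssml builder but not its signature. This sketch assumes one call wraps plain text in SSML prosody (rate, pitch) and optional emphasis markup; the parameter names, defaults, and the en-US-AriaNeural voice are illustrative assumptions. User text is XML-escaped so it cannot inject tags, and the emphasis levels are the four defined by the SSML specification.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
EMPHASIS_LEVELS = {"strong", "moderate", "reduced", "none"}


def build_ssml(text: str, voice: str = "en-US-AriaNeural", rate: str = "+0%",
               pitch: str = "+0Hz", emphasis: Optional[str] = None,
               lang: str = "en-US") -> str:
    """Wrap plain text in <speak>/<voice>/<prosody> and optional <emphasis> markup."""
    if emphasis is not None and emphasis not in EMPHASIS_LEVELS:
        raise ValueError(f"emphasis must be one of {sorted(EMPHASIS_LEVELS)}")
    body = escape(text)  # escape &, <, > in user text
    if emphasis is not None:
        body = f'<emphasis level={quoteattr(emphasis)}>{body}</emphasis>'
    return (
        f'<speak version="1.0" xmlns={quoteattr(SSML_NS)} xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{body}</prosody>'
        f'</voice></speak>'
    )
```

For example, `build_ssml("Tom & Jerry", rate="+10%", emphasis="strong")` produces a prosody element with rate "+10%" around `<emphasis level="strong">Tom &amp; Jerry</emphasis>`.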
Verification Date: 2026-01-17
All critical issues resolved.