--- title: Parakeet STT Progressive Transcription emoji: 🎤 colorFrom: blue colorTo: purple sdk: static pinned: false custom_headers: cross-origin-embedder-policy: credentialless cross-origin-opener-policy: same-origin cross-origin-resource-policy: cross-origin --- # Parakeet STT Progressive Transcription Demo Real-time speech recognition with smart progressive streaming, powered by **Parakeet TDT 0.6B v3** (ONNX) via [parakeet.js](https://github.com/ysdede/parakeet.js) and WebGPU acceleration. ## Features - **🎤 Parakeet TDT 0.6B v3**: NVIDIA's multilingual speech recognition model - 25 European languages supported - Word-level timestamps and confidence scores - WebGPU accelerated inference - **⚡ Smart Progressive Streaming**: Intelligent window management with sentence-aware boundaries - Growing window (0-15s) for accuracy - Sentence-aware sliding window (>15s) to maintain context - Real-time updates every 500ms - **🔒 Privacy-First**: All processing happens locally in your browser - no data sent to servers - **🎨 Visual Feedback**: - Yellow text: Fixed sentences (completed, won't change) - Cyan text: Active transcription (in-progress) - **📊 Developer Metrics**: Real-time performance monitoring - Latency and Real-time Factor (RTF) - Window state visualization - Memory usage tracking - Confidence scores ## Tech Stack - **Model**: [Parakeet TDT 0.6B v3 (ONNX)](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx) - **Inference**: [parakeet.js](https://www.npmjs.com/package/parakeet.js) + [ONNX Runtime Web](https://onnxruntime.ai/docs/tutorials/web/) - **Framework**: React 18 + Vite - **Styling**: Tailwind CSS ## Usage 1. **Load Model**: Click "Load Model" to download Parakeet (~2.5GB, one-time download) 2. **Start Recording**: Click "Start Recording" and grant microphone permissions 3. **Speak**: Watch real-time progressive transcriptions appear 4. **Stop Recording**: Click "Stop Recording" to finalize the transcription ## How It Works ### Progressive Streaming Algorithm This demo implements the smart progressive streaming algorithm from the [speech-to-speech repository](https://github.com/huggingface/speech-to-speech): 1. **Growing Window (0-15s)**: - Accumulates audio for better accuracy - Re-transcribes entire buffer every 500ms 2. **Sliding Window (>15s)**: - Locks completed sentences as "fixed" - Only re-transcribes active portion (last 2s) - Prevents memory growth and maintains accuracy ### Architecture ``` User Microphone ↓ Web Audio API (16kHz) ↓ Audio Processor (accumulate chunks) ↓ Progressive Streaming Handler (500ms updates) ↓ Web Worker → Parakeet ONNX Model (via parakeet.js + WebGPU) ↓ Transcription Display (yellow fixed + cyan active) ``` ## Model Information - **Model**: Parakeet TDT 0.6B v3 - **Format**: ONNX (optimized for web via parakeet.js) - **Size**: ~2.5GB - **Languages**: 25 European languages (EN, DE, FR, ES, IT, PT, NL, PL, RU, UK, CS, SK, HU, RO, BG, HR, SL, SR, DA, NO, SV, FI, ET, LV, LT) - **Sample Rate**: 16kHz - **Architecture**: Conformer encoder + RNN-Transducer decoder ## Browser Compatibility | Browser | WebGPU Support | Status | |---------|----------------|--------| | Chrome 113+ | ✅ Yes | Full support | | Edge 113+ | ✅ Yes | Full support | | Firefox | ⚠️ Limited | WASM fallback | | Safari | ⚠️ Limited | WASM fallback | ## Performance - **First result**: <500ms latency - **Progressive updates**: 500ms cadence - **RTF (Real-time Factor)**: ~0.3-0.5x with WebGPU - **Model loading**: 1-2 minutes (one-time, cached locally) **Note**: Browser-based inference is inherently slower than native implementations. For comparison, the Python MLX implementation achieves ~60x faster performance on Apple Silicon. This is a fundamental limitation of running large models in browsers. ## Credits - **Progressive Streaming Algorithm**: [speech-to-speech/STT/smart_progressive_streaming.py](https://github.com/huggingface/speech-to-speech/blob/main/STT/smart_progressive_streaming.py) - **Parakeet.js**: [ysdede/parakeet.js](https://github.com/ysdede/parakeet.js) - **ONNX Model**: [istupakov/parakeet-tdt-0.6b-v3-onnx](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx) - **Original Model**: NVIDIA Parakeet TDT 0.6B v3 ## License MIT ## References - [Parakeet.js Documentation](https://github.com/ysdede/parakeet.js) - [Parakeet.js Live Demo](https://huggingface.co/spaces/ysdede/parakeet.js-demo) - [Original Python Implementation](https://github.com/huggingface/speech-to-speech)