---
title: Parakeet STT Progressive Transcription
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
custom_headers:
  cross-origin-embedder-policy: credentialless
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
---
# Parakeet STT Progressive Transcription Demo
Real-time speech recognition with smart progressive streaming, powered by **Parakeet TDT 0.6B v3** (ONNX) via [parakeet.js](https://github.com/ysdede/parakeet.js) and WebGPU acceleration.
## Features
- **🎀 Parakeet TDT 0.6B v3**: NVIDIA's multilingual speech recognition model
  - 25 European languages supported
  - Word-level timestamps and confidence scores
  - WebGPU accelerated inference
- **⚑ Smart Progressive Streaming**: Intelligent window management with sentence-aware boundaries
  - Growing window (0-15s) for accuracy
  - Sentence-aware sliding window (>15s) to maintain context
  - Real-time updates every 500ms
- **🔒 Privacy-First**: All processing happens locally in your browser - no data sent to servers
- **🎨 Visual Feedback**:
  - Yellow text: Fixed sentences (completed, won't change)
  - Cyan text: Active transcription (in-progress)
- **📊 Developer Metrics**: Real-time performance monitoring
  - Latency and Real-time Factor (RTF)
  - Window state visualization
  - Memory usage tracking
  - Confidence scores
## Tech Stack
- **Model**: [Parakeet TDT 0.6B v3 (ONNX)](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
- **Inference**: [parakeet.js](https://www.npmjs.com/package/parakeet.js) + [ONNX Runtime Web](https://onnxruntime.ai/docs/tutorials/web/)
- **Framework**: React 18 + Vite
- **Styling**: Tailwind CSS
## Usage
1. **Load Model**: Click "Load Model" to download Parakeet (~2.5GB, one-time download)
2. **Start Recording**: Click "Start Recording" and grant microphone permissions
3. **Speak**: Watch real-time progressive transcriptions appear
4. **Stop Recording**: Click "Stop Recording" to finalize the transcription
## How It Works
### Progressive Streaming Algorithm
This demo implements the smart progressive streaming algorithm from the [speech-to-speech repository](https://github.com/huggingface/speech-to-speech):
1. **Growing Window (0-15s)**:
   - Accumulates audio for better accuracy
   - Re-transcribes entire buffer every 500ms
2. **Sliding Window (>15s)**:
   - Locks completed sentences as "fixed"
   - Only re-transcribes active portion (last 2s)
   - Prevents memory growth and maintains accuracy
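
The two phases can be sketched as a single planning function. This is an illustrative sketch, not the demo's source: `planWindow` and the `{ text, end }` sentence shape are hypothetical names, and it assumes the window slides to the end of the last completed sentence once it exceeds 15 seconds. It does not model the exact length of the active portion.

```javascript
// Illustrative sketch only; names and the sliding rule are assumptions.
const SAMPLE_RATE = 16000;   // 16kHz input, as stated in this README
const MAX_WINDOW_SEC = 15;   // growing-window limit, as stated in this README

/**
 * Decide what to transcribe on the next update.
 * totalSamples:   samples recorded so far
 * windowStartSec: where the current window starts, in seconds
 * sentences:      hypothetical [{ text, end }] from the last pass, with `end`
 *                 in seconds from the start of the recording; the final
 *                 entry is treated as still in progress
 */
function planWindow(totalSamples, windowStartSec, sentences) {
  const totalSec = totalSamples / SAMPLE_RATE;
  const windowSec = totalSec - windowStartSec;

  // Phase 1: the window still fits, so re-transcribe all of it.
  if (windowSec <= MAX_WINDOW_SEC) {
    return { mode: "growing", startSec: windowStartSec, fixed: [] };
  }

  // Phase 2: lock every completed sentence inside the window.
  const completed = sentences
    .slice(0, -1)
    .filter((s) => s.end > windowStartSec);

  // No sentence boundary yet: keep the window as it is for now.
  if (completed.length === 0) {
    return { mode: "sliding", startSec: windowStartSec, fixed: [] };
  }

  // Slide the window start to the end of the last completed sentence.
  const newStart = completed[completed.length - 1].end;
  return {
    mode: "sliding",
    startSec: newStart,
    fixed: completed.map((s) => s.text),
  };
}
```

A caller would append `fixed` to the locked (yellow) text and transcribe only the audio from `startSec` onward on the next update.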
### Architecture
```
User Microphone
↓
Web Audio API (16kHz)
↓
Audio Processor (accumulate chunks)
↓
Progressive Streaming Handler (500ms updates)
↓
Web Worker → Parakeet ONNX Model (via parakeet.js + WebGPU)
↓
Transcription Display (yellow fixed + cyan active)
```
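
The "accumulate chunks" and "500ms updates" stages could look roughly like the sketch below. `AudioAccumulator` is a hypothetical helper, not a class from the demo or from parakeet.js; it assumes mono `Float32Array` chunks that are already at 16kHz.

```javascript
// Hypothetical helper: buffers audio and signals when 500ms of new audio arrived.
const INPUT_SAMPLE_RATE = 16000;   // Hz, as stated in this README
const UPDATE_INTERVAL_MS = 500;    // update cadence, as stated in this README
const SAMPLES_PER_UPDATE = (INPUT_SAMPLE_RATE * UPDATE_INTERVAL_MS) / 1000; // 8000

class AudioAccumulator {
  constructor() {
    this.chunks = [];        // Float32Array chunks in arrival order
    this.length = 0;         // total samples buffered
    this.lastUpdateAt = 0;   // sample count at the last triggered update
  }

  // Add a chunk; returns true when a new transcription pass is due.
  push(chunk) {
    this.chunks.push(chunk);
    this.length += chunk.length;
    if (this.length - this.lastUpdateAt >= SAMPLES_PER_UPDATE) {
      this.lastUpdateAt = this.length;
      return true;
    }
    return false;
  }

  // Concatenate all buffered chunks into one contiguous array.
  toFloat32Array() {
    const out = new Float32Array(this.length);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}
```

Counting samples instead of using a wall-clock timer keeps the cadence tied to the audio actually received, so a stalled main thread does not trigger a pass on unchanged audio.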
## Model Information
- **Model**: Parakeet TDT 0.6B v3
- **Format**: ONNX (optimized for web via parakeet.js)
- **Size**: ~2.5GB
- **Languages**: 25 European languages (BG, HR, CS, DA, NL, EN, ET, FI, FR, DE, EL, HU, IT, LV, LT, MT, PL, PT, RO, SK, SL, ES, SV, RU, UK)
- **Sample Rate**: 16kHz
- **Architecture**: FastConformer encoder + TDT (Token-and-Duration Transducer) decoder
## Browser Compatibility
| Browser | WebGPU Support | Status |
|---------|----------------|--------|
| Chrome 113+ | ✅ Yes | Full support |
| Edge 113+ | ✅ Yes | Full support |
| Firefox | ⚠️ Limited | WASM fallback |
| Safari | ⚠️ Limited | WASM fallback |
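
The fallback in the table can be approximated with standard WebGPU feature detection. This is a sketch under assumptions: `pickBackend` is a hypothetical helper, and the returned strings are only labels; how the demo or parakeet.js actually selects a backend is not shown here.

```javascript
// Hypothetical helper: returns "webgpu" when an adapter is available, else "wasm".
// `nav` is a navigator-like object, passed in so the function is easy to test.
async function pickBackend(nav) {
  if (nav && nav.gpu && typeof nav.gpu.requestAdapter === "function") {
    try {
      // requestAdapter() resolves to null when no suitable GPU is found.
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return "webgpu";
    } catch (err) {
      // Treat any failure as "WebGPU unavailable" and fall through.
    }
  }
  return "wasm";
}
```

In a browser this would be called as `pickBackend(navigator)` before loading the model.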
## Performance
- **First result**: <500ms latency
- **Progressive updates**: 500ms cadence
- **RTF (Real-time Factor)**: ~0.3-0.5x with WebGPU
- **Model loading**: 1-2 minutes (one-time, cached locally)
**Note**: Browser-based inference is inherently slower than native implementations. For comparison, the Python MLX implementation achieves ~60x faster performance on Apple Silicon. This is a fundamental limitation of running large models in browsers.
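
For reference, the real-time factor is processing time divided by the duration of the audio processed, so values below 1 mean transcription runs faster than real time. A minimal sketch, with `realTimeFactor` as a hypothetical helper name:

```javascript
// RTF = time spent transcribing / duration of the audio transcribed.
function realTimeFactor(processingMs, audioMs) {
  if (!(audioMs > 0)) {
    throw new RangeError("audioMs must be a positive number");
  }
  return processingMs / audioMs;
}

// Example: 2 seconds of audio transcribed in 800ms gives an RTF of 0.4.
const exampleRtf = realTimeFactor(800, 2000);
```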
## Credits
- **Progressive Streaming Algorithm**: [speech-to-speech/STT/smart_progressive_streaming.py](https://github.com/huggingface/speech-to-speech/blob/main/STT/smart_progressive_streaming.py)
- **Parakeet.js**: [ysdede/parakeet.js](https://github.com/ysdede/parakeet.js)
- **ONNX Model**: [istupakov/parakeet-tdt-0.6b-v3-onnx](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
- **Original Model**: NVIDIA Parakeet TDT 0.6B v3
## License
MIT
## References
- [Parakeet.js Documentation](https://github.com/ysdede/parakeet.js)
- [Parakeet.js Live Demo](https://huggingface.co/spaces/ysdede/parakeet.js-demo)
- [Original Python Implementation](https://github.com/huggingface/speech-to-speech)