
	---
	title: Parakeet STT Progressive Transcription
	emoji: 🎤
	colorFrom: blue
	colorTo: purple
	sdk: static
	pinned: false
	custom_headers:
	  cross-origin-embedder-policy: credentialless
	  cross-origin-opener-policy: same-origin
	  cross-origin-resource-policy: cross-origin
	---

	# Parakeet STT Progressive Transcription Demo

	Real-time speech recognition with smart progressive streaming, powered by Parakeet TDT 0.6B v3 (ONNX) via [parakeet.js](https://github.com/ysdede/parakeet.js) and WebGPU acceleration.


	## Features

	- 🎤 Parakeet TDT 0.6B v3: NVIDIA's multilingual speech recognition model
	  - 25 European languages supported
	  - Word-level timestamps and confidence scores
	  - WebGPU accelerated inference

	- ⚡ Smart Progressive Streaming: Intelligent window management with sentence-aware boundaries
	  - Growing window (0-15s) for accuracy
	  - Sentence-aware sliding window (>15s) to maintain context
	  - Real-time updates every 500ms

	- 🔒 Privacy-First: All processing happens locally in your browser; no audio is sent to servers

	- 🎨 Visual Feedback:
	  - Yellow text: Fixed sentences (completed, won't change)
	  - Cyan text: Active transcription (in-progress)

	- 📊 Developer Metrics: Real-time performance monitoring
	  - Latency and Real-time Factor (RTF)
	  - Window state visualization
	  - Memory usage tracking
	  - Confidence scores

	## Tech Stack

	- Model: [Parakeet TDT 0.6B v3 (ONNX)](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
	- Inference: [parakeet.js](https://www.npmjs.com/package/parakeet.js) + [ONNX Runtime Web](https://onnxruntime.ai/docs/tutorials/web/)
	- Framework: React 18 + Vite
	- Styling: Tailwind CSS

	## Usage

	1. Load Model: Click "Load Model" to download Parakeet (~2.5GB, one-time download)
	2. Start Recording: Click "Start Recording" and grant microphone permissions
	3. Speak: Watch real-time progressive transcriptions appear
	4. Stop Recording: Click "Stop Recording" to finalize the transcription

	## How It Works

	### Progressive Streaming Algorithm

	This demo implements the smart progressive streaming algorithm from the [speech-to-speech repository](https://github.com/huggingface/speech-to-speech):

	1. Growing Window (0-15s):
	   - Accumulates audio for better accuracy
	   - Re-transcribes the entire buffer every 500ms

	2. Sliding Window (>15s):
	   - Locks completed sentences as "fixed"
	   - Only re-transcribes the active portion (last 2s)
	   - Prevents memory growth and maintains accuracy
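The window logic above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: `WindowState`, `TimedSentence`, and `advanceWindow` are hypothetical names, and the rule for choosing the cut point (lock every sentence except the last one, then start the window at the end of the last locked sentence) is an assumption made for the example.

```typescript
// Sketch only: names and the cut-point rule are assumptions, not the demo's code.
const SAMPLE_RATE = 16_000;    // Hz, as stated in this README
const MAX_WINDOW_SECONDS = 15; // growing-window limit, as stated in this README

interface WindowState {
  fixedText: string;         // sentences that will no longer change
  windowStartSample: number; // first sample that still gets re-transcribed
}

interface TimedSentence {
  text: string;
  endSeconds: number; // end time relative to the start of the current window
}

/**
 * Decide whether the window should slide after a transcription pass.
 * Up to 15 s of audio: nothing is locked and the whole buffer stays active.
 * Beyond 15 s: all sentences but the last (assumed still in progress) are
 * appended to fixedText, and the window starts after the last locked sentence.
 */
function advanceWindow(
  state: WindowState,
  totalSamples: number,
  sentences: TimedSentence[],
): WindowState {
  const windowSeconds = (totalSamples - state.windowStartSample) / SAMPLE_RATE;
  if (windowSeconds <= MAX_WINDOW_SECONDS) return state; // still growing

  const completed = sentences.slice(0, -1);
  const lastCompleted = completed[completed.length - 1];
  if (!lastCompleted) return state; // no sentence boundary to cut at yet

  const fixedText = [state.fixedText, ...completed.map((s) => s.text)]
    .filter((t) => t.length > 0)
    .join(" ");
  const windowStartSample =
    state.windowStartSample + Math.round(lastCompleted.endSeconds * SAMPLE_RATE);

  return { fixedText, windowStartSample };
}
```

Cutting at a sentence boundary rather than at a fixed offset keeps the active window starting on a clean sentence, which is one plausible way to preserve context for the next pass.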

	### Architecture

	```
	User Microphone
	↓
	Web Audio API (16kHz)
	↓
	Audio Processor (accumulate chunks)
	↓
	Progressive Streaming Handler (500ms updates)
	↓
	Web Worker → Parakeet ONNX Model (via parakeet.js + WebGPU)
	↓
	Transcription Display (yellow fixed + cyan active)
	```
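As a rough illustration of the "Audio Processor" and "500ms updates" stages in the diagram, the sketch below accumulates 16kHz chunks and reports when enough new audio has arrived for another pass. `AudioAccumulator` and its methods are hypothetical names, and measuring the interval in samples rather than wall-clock time is an assumption of this example.

```typescript
// Sketch only: class and method names are hypothetical, not the demo's API.
const SAMPLE_RATE = 16_000;     // Hz, as stated in this README
const UPDATE_INTERVAL_MS = 500; // update cadence, as stated in this README

class AudioAccumulator {
  private chunks: Float32Array[] = [];
  private totalSamples = 0;
  private samplesAtLastUpdate = 0;

  /** Store one chunk of mono PCM samples. */
  append(chunk: Float32Array): void {
    this.chunks.push(chunk);
    this.totalSamples += chunk.length;
  }

  /** True once at least 500 ms of new audio has arrived since the last window was taken. */
  updateDue(): boolean {
    const needed = (SAMPLE_RATE * UPDATE_INTERVAL_MS) / 1000;
    return this.totalSamples - this.samplesAtLastUpdate >= needed;
  }

  /** Copy all samples from startSample onward into one buffer and reset the update counter. */
  takeWindow(startSample = 0): Float32Array {
    const out = new Float32Array(Math.max(0, this.totalSamples - startSample));
    let offset = 0; // absolute index of the first sample in the current chunk
    for (const chunk of this.chunks) {
      const chunkEnd = offset + chunk.length;
      if (chunkEnd > startSample) {
        const from = Math.max(0, startSample - offset);
        out.set(chunk.subarray(from), offset + from - startSample);
      }
      offset = chunkEnd;
    }
    this.samplesAtLastUpdate = this.totalSamples;
    return out;
  }
}
```

The buffer returned by `takeWindow` is what would be posted to the Web Worker for transcription. A real implementation would also drop chunks that lie entirely before the window start to bound memory.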

	## Model Information

	- Model: Parakeet TDT 0.6B v3
	- Format: ONNX (optimized for web via parakeet.js)
	- Size: ~2.5GB
	- Languages: 25 European languages (EN, DE, FR, ES, IT, PT, NL, PL, RU, UK, CS, SK, HU, RO, BG, HR, SL, DA, SV, FI, ET, LV, LT, EL, MT)
	- Sample Rate: 16kHz
	- Architecture: FastConformer encoder + TDT (Token-and-Duration Transducer) decoder

	## Browser Compatibility

	| Browser | WebGPU Support | Status |
	|---------|----------------|--------|
	| Chrome 113+ | ✅ Yes | Full support |
	| Edge 113+ | ✅ Yes | Full support |
	| Firefox | ⚠️ Limited | WASM fallback |
	| Safari | ⚠️ Limited | WASM fallback |
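The fallback in the table can be decided at runtime by probing for WebGPU. The sketch below uses the standard `navigator.gpu.requestAdapter()` call. The `pickBackend` function, the `Backend` type, and the small `GpuCapableNavigator` interface are hypothetical helpers for this example, and how the demo itself selects a backend is not shown here.

```typescript
// Sketch only: helper names are hypothetical; navigator.gpu.requestAdapter()
// is the standard WebGPU entry point.
type Backend = "webgpu" | "wasm";

// Minimal shape of the parts of `navigator` this sketch touches, declared here
// so the snippet compiles without WebGPU type definitions installed.
interface GpuCapableNavigator {
  gpu?: { requestAdapter(): Promise<object | null> };
}

/** Return "webgpu" if an adapter is available, otherwise fall back to "wasm". */
async function pickBackend(nav: GpuCapableNavigator): Promise<Backend> {
  if (!nav.gpu) return "wasm"; // API not exposed by this browser
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no suitable GPU
  } catch {
    return "wasm"; // treat any probing failure as "not supported"
  }
}
```

In a browser this would be called as `pickBackend(navigator as GpuCapableNavigator)` before loading the model.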

	## Performance

	- First result: <500ms latency
	- Progressive updates: 500ms cadence
	- RTF (Real-time Factor): ~0.3-0.5 with WebGPU (processing time divided by audio duration; values below 1 are faster than real time)
	- Model loading: 1-2 minutes (one-time, cached locally)

	Note: Browser-based inference is inherently slower than native implementations. For comparison, the Python MLX implementation achieves ~60x faster performance on Apple Silicon. This is a fundamental limitation of running large models in browsers.
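For reference, the real-time factor reported above is simply processing time divided by audio duration. A minimal sketch of that arithmetic follows; `realTimeFactor` and `isFasterThanRealTime` are hypothetical names, and the numbers in the note after the block are illustrative, not measurements from this demo.

```typescript
// Sketch only: helper names are hypothetical.

/** RTF = time spent transcribing / duration of the audio transcribed. */
function realTimeFactor(processingMs: number, audioMs: number): number {
  if (audioMs <= 0) throw new RangeError("audioMs must be positive");
  return processingMs / audioMs;
}

/** An RTF below 1 means the audio is transcribed faster than it is spoken. */
function isFasterThanRealTime(processingMs: number, audioMs: number): boolean {
  return realTimeFactor(processingMs, audioMs) < 1;
}
```

For example, transcribing 2 s of audio in 800 ms corresponds to an RTF of 0.4, which falls inside the range quoted above.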

	## Credits

	- Progressive Streaming Algorithm: [speech-to-speech/STT/smart_progressive_streaming.py](https://github.com/huggingface/speech-to-speech/blob/main/STT/smart_progressive_streaming.py)
	- Parakeet.js: [ysdede/parakeet.js](https://github.com/ysdede/parakeet.js)
	- ONNX Model: [istupakov/parakeet-tdt-0.6b-v3-onnx](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
	- Original Model: NVIDIA Parakeet TDT 0.6B v3

	## License

	MIT

	## References

	- [Parakeet.js Documentation](https://github.com/ysdede/parakeet.js)
	- [Parakeet.js Live Demo](https://huggingface.co/spaces/ysdede/parakeet.js-demo)
	- [Original Python Implementation](https://github.com/huggingface/speech-to-speech)