metadata
title: Valtec Vietnamese TTS Web Demo
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
license: mit
Valtec Vietnamese TTS - Browser Demo
🌐 Vietnamese Text-to-Speech Running Entirely in Your Browser
This demo uses ONNX Runtime Web to run Vietnamese TTS entirely in your browser. Once the models are downloaded, no inference server is required.
Features
- ✅ 100% Browser-Based: All processing happens in your browser
- ✅ No Backend: Direct ONNX model inference using WebAssembly
- ✅ 5 Vietnamese Voices: NF, SF, NM1, SM, NM2 (Northern/Southern accents)
- ✅ Cached Models: The models (~165 MB) are downloaded once and cached for later visits
- ✅ Privacy-First: Your text never leaves your browser
How It Works
1. First Load: Downloads the ONNX models from HuggingFace Hub (~165 MB)
2. Text Input: Enter any Vietnamese text
3. Voice Selection: Choose one of the 5 regional voices
4. In-Browser Synthesis: ONNX Runtime Web generates the audio locally
5. Playback: Listen to the synthesized speech
Available Voices
| Voice | Region | Gender | Description |
|---|---|---|---|
| NF | Northern (Bắc) | Female | Clear, formal |
| SF | Southern (Nam) | Female | Warm, friendly |
| NM1 | Northern (Bắc) | Male | Professional |
| SM | Southern (Nam) | Male | Conversational |
| NM2 | Northern (Bắc) | Male | Authoritative |
Technical Details
ONNX Pipeline
- Text Encoder: Phoneme encoding
- Duration Predictor: Speech timing
- Flow Model: Latent transformation
- Decoder: Audio waveform generation (HiFi-GAN)
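Between the duration predictor and the later stages, VITS-style models typically stretch each phoneme-level frame to cover its predicted duration. The sketch below illustrates only that length-regulation step. The `Frame` type and the `expandByDuration` name are hypothetical and not taken from the Valtec code, and rounding durations up is an assumption borrowed from the usual VITS inference path.

```typescript
// Hypothetical shape: one latent frame per phoneme, as a plain number array.
type Frame = number[];

// Repeat each phoneme-level frame according to its predicted duration
// (measured in output frames). Durations are rounded up, and negative
// values are treated as zero, so a phoneme can be dropped but never
// produce a negative number of frames.
function expandByDuration(frames: Frame[], durations: number[]): Frame[] {
  if (frames.length !== durations.length) {
    throw new Error("frames and durations must have the same length");
  }
  const expanded: Frame[] = [];
  for (let i = 0; i < frames.length; i++) {
    const repeats = Math.max(0, Math.ceil(durations[i]));
    for (let r = 0; r < repeats; r++) {
      expanded.push(frames[i]);
    }
  }
  return expanded;
}
```

The expanded sequence is what a flow model and decoder would then consume frame by frame; how the actual ONNX graphs split this work is not described here.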
Vietnamese G2P
- Uses a JavaScript port of the viphoneme library
- Accurate tone and phoneme mapping
- 99.96% agreement with the Python reference implementation
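A figure like the one above can be read as an exact-match rate between the outputs of the JavaScript port and the Python reference on the same inputs. The sketch below shows one way such a rate could be computed. The `agreementPercent` name is hypothetical, and both the exact-match criterion and the unit of comparison (word, syllable, or sentence) are assumptions, since the evaluation set is not described here.

```typescript
// Percentage of positions where the candidate output equals the reference
// output exactly. Both arrays are expected to hold the phoneme strings
// produced for the same inputs, in the same order.
function agreementPercent(candidate: string[], reference: string[]): number {
  if (candidate.length !== reference.length) {
    throw new Error("candidate and reference must have the same length");
  }
  if (reference.length === 0) {
    throw new Error("no samples to compare");
  }
  let matches = 0;
  for (let i = 0; i < reference.length; i++) {
    if (candidate[i] === reference[i]) {
      matches++;
    }
  }
  return (matches / reference.length) * 100;
}
```

Under this reading, 99.96% would correspond to 4 mismatches per 10,000 compared items.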
Browser Requirements
- Chrome 90+, Firefox 90+, Edge 90+ (Full support)
- Safari 15+ (Limited support)
- WebAssembly and AudioContext API required
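The two required APIs can be checked before any model is downloaded. The following is a minimal sketch of such a check, not the demo's actual code: `detectSupport` and `SupportReport` are hypothetical names, and the global object is passed in as a parameter so the function can also be exercised outside a browser.

```typescript
interface SupportReport {
  webAssembly: boolean;
  audioContext: boolean;
  supported: boolean;
}

// Inspect a global-like object for the APIs the demo depends on.
// `webkitAudioContext` is checked as a fallback because older WebKit
// builds exposed the Web Audio API only under that prefixed name.
function detectSupport(scope: { [key: string]: unknown }): SupportReport {
  const wasm = scope["WebAssembly"];
  const webAssembly = typeof wasm === "object" && wasm !== null;
  const audioContext =
    typeof scope["AudioContext"] === "function" ||
    typeof scope["webkitAudioContext"] === "function";
  return {
    webAssembly: webAssembly,
    audioContext: audioContext,
    supported: webAssembly && audioContext,
  };
}
```

In a page, the browser's global object would be passed as `scope`, and an unsupported result could be used to show a message instead of starting the download.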
Model Info
- Architecture: VITS (Conditional VAE)
- Sample Rate: 24 kHz
- Model Size: 164.75 MB (ONNX)
- Speakers: 5 (Northern/Southern Vietnamese)
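The sample rate matters whenever the generated waveform is played back or saved: the consumer has to be told the audio is 24 kHz, or it will play at the wrong speed. As an illustration, the sketch below wraps float samples in a 16-bit PCM WAV container. It assumes mono output in the range [-1, 1], which is not stated above, and the function names are hypothetical.

```typescript
const SAMPLE_RATE_HZ = 24000; // from the Model Info list

// Write an ASCII tag such as "RIFF" into the buffer byte by byte.
function writeAscii(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

// Encode mono float samples as a 16-bit PCM WAV file (44-byte header + data).
function encodeWavPcm16(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataBytes = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  writeAscii(view, 0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // remaining file size
  writeAscii(view, 8, "WAVE");
  writeAscii(view, 12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count (mono, assumed)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(view, 36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return buffer;
}

// Length of a clip in seconds for a given number of samples.
function durationSeconds(sampleCount: number, sampleRate: number): number {
  return sampleCount / sampleRate;
}
```

At 24 kHz, one second of mono 16-bit audio occupies 48,000 bytes of sample data, so a few seconds of speech stays well under a megabyte.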
Performance
The first load may take 30-60 seconds while the models download. On subsequent visits the models are read from the browser cache, so the download step is skipped.
Synthesis speed depends on device:
- Desktop: ~5-8 seconds per sentence
- Mobile: ~10-15 seconds per sentence
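The figures above can be related to connection speed and audio length with simple arithmetic. The sketch below is illustrative only: it assumes 1 MB means 10^6 bytes and ignores caching, compression, and connection overhead, and the function names are hypothetical.

```typescript
const MODEL_SIZE_MB = 164.75; // from the Model Info list

// Seconds needed to download `sizeMB` megabytes over a link that
// sustains `linkMbps` megabits per second (8 bits per byte).
function downloadSeconds(sizeMB: number, linkMbps: number): number {
  if (linkMbps <= 0) {
    throw new Error("link speed must be positive");
  }
  return (sizeMB * 8) / linkMbps;
}

// Real-time factor: synthesis time divided by the duration of the audio
// produced. Values above 1 mean synthesis is slower than playback.
function realTimeFactor(synthesisSeconds: number, audioSeconds: number): number {
  if (audioSeconds <= 0) {
    throw new Error("audio duration must be positive");
  }
  return synthesisSeconds / audioSeconds;
}
```

Under these assumptions, a 30-60 second first load corresponds to a sustained throughput of roughly 22-44 Mbit/s. For a hypothetical sentence that yields 3 seconds of audio, a 6-second desktop synthesis would give a real-time factor of 2.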
Links
- 🎤 Gradio Demo - Full-featured demo
- 📦 ONNX Models - Pre-trained models
- 🏠 GitHub - Source code
- 📱 Android App - Mobile deployment
Privacy
All synthesis happens locally in your browser. Your text input and generated audio never leave your device; the only network traffic is the initial model download from HuggingFace Hub.
Powered by Valtec AI Team | ONNX Runtime Web | WebAssembly