title: Valtec Vietnamese TTS Web Demo
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
license: mit

Valtec Vietnamese TTS - Browser Demo

🌐 Vietnamese Text-to-Speech Running Entirely in Your Browser

This demo uses ONNX Runtime Web to run Vietnamese TTS entirely in your browser. No inference server is required: the models are downloaded once and all synthesis runs locally.

Features

✅ 100% Browser-Based: All processing happens in your browser
✅ No Backend: Direct ONNX model inference using WebAssembly
✅ 5 Vietnamese Voices: NF, SF, NM1, SM, NM2 (Northern/Southern accents)
✅ Fast Reloads: Models (~165 MB) are cached after the first download
✅ Privacy-First: Your text never leaves your browser

How It Works

First Load: Downloads ONNX models from HuggingFace Hub (~165MB)
Text Input: Enter any Vietnamese text
Voice Selection: Choose from 5 regional voices
In-Browser Synthesis: ONNX Runtime Web generates the audio in your browser (typically a few seconds per sentence, see Performance)
Instant Playback: Listen to synthesized speech

Available Voices

Voice	Region	Gender	Description
NF	Northern (Bắc)	Female	Clear, formal
SF	Southern (Nam)	Female	Warm, friendly
NM1	Northern (Bắc)	Male	Professional
SM	Southern (Nam)	Male	Conversational
NM2	Northern (Bắc)	Male	Authoritative
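
The voice table above can be mirrored in code as a small typed lookup. This is a minimal sketch, not the demo's actual source: the `VoiceInfo` type and the `findVoices` helper are hypothetical names introduced for illustration, and only the five rows from the table are assumed.

```typescript
// Hypothetical types for illustration; the values come from the table above.
type Region = "Northern" | "Southern";
type Gender = "Female" | "Male";

interface VoiceInfo {
  id: string;
  region: Region;
  gender: Gender;
  description: string;
}

const VOICES: readonly VoiceInfo[] = [
  { id: "NF", region: "Northern", gender: "Female", description: "Clear, formal" },
  { id: "SF", region: "Southern", gender: "Female", description: "Warm, friendly" },
  { id: "NM1", region: "Northern", gender: "Male", description: "Professional" },
  { id: "SM", region: "Southern", gender: "Male", description: "Conversational" },
  { id: "NM2", region: "Northern", gender: "Male", description: "Authoritative" },
];

// Returns the voices matching the given filters; an omitted filter matches everything.
function findVoices(region?: Region, gender?: Gender): VoiceInfo[] {
  return VOICES.filter(
    (voice) =>
      (region === undefined || voice.region === region) &&
      (gender === undefined || voice.gender === gender)
  );
}
```

For example, `findVoices("Northern", "Male")` returns the NM1 and NM2 entries, which could populate a voice selector.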

Technical Details

ONNX Pipeline

Text Encoder: Phoneme encoding
Duration Predictor: Speech timing
Flow Model: Latent transformation
Decoder: Audio waveform generation (HiFi-GAN)
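
Between the duration predictor and the flow model, a standard VITS inference path expands each phoneme-level frame to cover as many output frames as its predicted duration. The sketch below illustrates that step in isolation, assuming the exported ONNX models follow the standard VITS path; `expandByDurations` is a hypothetical helper, not a function from the demo's source.

```typescript
// Length regulation sketch (assumption: standard VITS inference, where predicted
// durations are rounded up to whole frames before expansion).
// frames[i] is the encoder output for phoneme i; durations[i] is its predicted length in frames.
function expandByDurations(frames: number[][], durations: number[]): number[][] {
  if (frames.length !== durations.length) {
    throw new Error("frames and durations must have the same length");
  }
  const expanded: number[][] = [];
  for (let i = 0; i < frames.length; i++) {
    // Round up, and treat negative predictions as zero frames.
    const repeat = Math.max(0, Math.ceil(durations[i]));
    for (let r = 0; r < repeat; r++) {
      expanded.push(frames[i].slice()); // copy so output rows are independent
    }
  }
  return expanded;
}
```

The expanded sequence has one row per output frame, which is the resolution at which the flow model and decoder operate.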

Vietnamese G2P

Uses a JavaScript port of the viphoneme library
Accurate tone and phoneme mapping
99.96% agreement with the Python reference implementation
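
To illustrate what tone mapping involves, the sketch below detects the tone of a single Vietnamese syllable from its diacritics using Unicode decomposition. It is a simplification for illustration only and is not the viphoneme port: the `Tone` labels, `detectTone`, and `stripTone` are hypothetical names, and a real G2P step also has to map the remaining letters to phonemes.

```typescript
// ASCII labels for the six tones: ngang (no mark), huyền, sắc, hỏi, ngã, nặng.
type Tone = "ngang" | "huyen" | "sac" | "hoi" | "nga" | "nang";

// Combining marks that carry tone. Circumflex, breve, and horn are deliberately
// absent: they change vowel quality, not tone.
const TONE_MARKS: { [mark: string]: Tone | undefined } = {
  "\u0300": "huyen", // grave accent
  "\u0301": "sac",   // acute accent
  "\u0309": "hoi",   // hook above
  "\u0303": "nga",   // tilde
  "\u0323": "nang",  // dot below
};

// Returns the tone of one syllable; a syllable without a tone mark is "ngang".
function detectTone(syllable: string): Tone {
  const decomposed = syllable.normalize("NFD");
  for (let i = 0; i < decomposed.length; i++) {
    const tone = TONE_MARKS[decomposed.charAt(i)];
    if (tone !== undefined) return tone;
  }
  return "ngang";
}

// Removes only the tone mark, keeping vowel-quality marks such as the circumflex.
function stripTone(syllable: string): string {
  return syllable
    .normalize("NFD")
    .split("")
    .filter((ch) => TONE_MARKS[ch] === undefined)
    .join("")
    .normalize("NFC");
}
```

Separating the tone from the base syllable in this way lets the tone and the phonemes be looked up independently.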

Browser Requirements

Chrome 90+, Firefox 90+, Edge 90+ (Full support)
Safari 15+ (Limited support)
WebAssembly and AudioContext API required
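
A page can check for the two required APIs before it starts downloading models. The following is a minimal sketch; `BrowserGlobals` and `missingFeatures` are hypothetical names, and the globals are passed in as a parameter so the check can also run outside a browser.

```typescript
// The subset of browser globals this check inspects.
interface BrowserGlobals {
  WebAssembly?: unknown;
  AudioContext?: unknown;
}

// Returns the names of required features that are not available.
function missingFeatures(globals: BrowserGlobals): string[] {
  const missing: string[] = [];
  // In supporting browsers, WebAssembly is a namespace object.
  if (typeof globals.WebAssembly !== "object" || globals.WebAssembly === null) {
    missing.push("WebAssembly");
  }
  // AudioContext is a constructor, so typeof reports "function".
  if (typeof globals.AudioContext !== "function") {
    missing.push("AudioContext");
  }
  return missing;
}
```

In a browser, the page's global object would be passed in, and a non-empty result could be shown as an "unsupported browser" message instead of starting the model download.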

Model Info

Architecture: VITS (Conditional VAE)
Sample Rate: 24kHz
Model Size: 164.75 MB (ONNX)
Speakers: 5 (Northern/Southern Vietnamese)
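
Given the 24 kHz sample rate, a synthesized waveform can be packaged as a WAV file for download. The sketch below assumes the decoder output is mono floating-point audio in the range [-1, 1], which this README does not state; `encodeWav` is a hypothetical helper, not part of the demo.

```typescript
// Encodes mono float samples as a 16-bit PCM WAV file.
// Assumption: samples are in [-1, 1]; values outside that range are clamped.
function encodeWav(samples: Float32Array, sampleRate: number = 24000): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 32767), true);
  }
  return buffer;
}
```

At 24 kHz, each second of audio is 24,000 samples, or 48,000 bytes of 16-bit PCM data after the 44-byte header.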

Performance

The first load may take 30-60 seconds while the models download. Subsequent visits load the models from the browser cache, so no re-download is needed.
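
One way to get this download-once behavior is a cache-first lookup. This README does not say which caching mechanism the demo uses, so the sketch below is an assumption: it is written against minimal structural interfaces (`ResponseLike`, `CacheLike`, and `Fetcher` are hypothetical names) whose shape matches the subset of the browser Cache API and `fetch` that it relies on.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface ResponseLike {
  ok: boolean;
  clone(): ResponseLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}

interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

type Fetcher = (url: string) => Promise<ResponseLike>;

// Returns the model bytes, downloading only when the cache has no entry for the URL.
async function loadModelBytes(
  url: string,
  cache: CacheLike,
  fetcher: Fetcher
): Promise<ArrayBuffer> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached.arrayBuffer(); // cache hit: no network request
  }
  const response = await fetcher(url);
  if (!response.ok) {
    throw new Error("Failed to download " + url);
  }
  // Store a clone, because a response body can only be read once.
  await cache.put(url, response.clone());
  return response.arrayBuffer();
}
```

In a browser, the cache argument could be the object returned by `caches.open(...)` and the fetcher could be `fetch`; the resulting bytes can then be handed to ONNX Runtime Web to create an inference session.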

Synthesis speed depends on device:

Desktop: ~5-8 seconds per sentence
Mobile: ~10-15 seconds per sentence
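
Per-sentence timings are easier to compare across devices when normalized by the length of the generated audio, a ratio commonly called the real-time factor. This is a small sketch under that definition; `realTimeFactor` is a hypothetical helper, and the numbers in the usage note are illustrative rather than measurements from this demo.

```typescript
// Real-time factor: synthesis time divided by audio duration.
// Values below 1 mean audio is generated faster than it plays back.
function realTimeFactor(
  synthesisSeconds: number,
  sampleCount: number,
  sampleRate: number = 24000
): number {
  const audioSeconds = sampleCount / sampleRate;
  if (audioSeconds <= 0) {
    throw new Error("audio must contain at least one sample");
  }
  return synthesisSeconds / audioSeconds;
}
```

For instance, if a sentence took 6 seconds to synthesize and produced 72,000 samples (3 seconds at 24 kHz), `realTimeFactor(6, 72000)` gives 2.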

Links

🎤 Gradio Demo - Full featured demo
📦 ONNX Models - Pre-trained models
🏠 GitHub - Source code
📱 Android App - Mobile deployment

Privacy

All processing happens locally in your browser. No data is sent to any server. Your text input and generated audio never leave your device.

Powered by Valtec AI Team | ONNX Runtime Web | WebAssembly