metadata
title: Valtec Vietnamese TTS Web Demo
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
license: mit

Valtec Vietnamese TTS - Browser Demo

🌐 Vietnamese Text-to-Speech Running Entirely in Your Browser

This demo uses ONNX Runtime Web to run Vietnamese TTS entirely in your browser. Once the models have downloaded, no inference server is involved.

Features

  • 100% Browser-Based: All processing happens in your browser
  • No Backend: Direct ONNX model inference using WebAssembly
  • 5 Vietnamese Voices: NF, SF, NM1, SM, NM2 (Northern/Southern accents)
  • Fast Loading: Models (~165 MB) are cached after the first load
  • Privacy-First: Your text never leaves your browser

How It Works

  1. First Load: Downloads the ONNX models from the Hugging Face Hub (~165 MB)
  2. Text Input: Enter any Vietnamese text
  3. Voice Selection: Choose from 5 regional voices
  4. Synthesis: ONNX Runtime Web generates the audio in the browser (typically a few seconds per sentence; see Performance)
  5. Instant Playback: Listen to synthesized speech
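
Step 1 and the caching behaviour can be sketched as a cache-first download. The README does not say which caching mechanism the demo uses; the sketch below assumes the browser Cache API, and `loadModelBytes`, `cacheStore`, and `fetchFn` are hypothetical names chosen for illustration.

```javascript
// Cache-first fetch of one model file (a sketch, not the demo's actual code).
// `cacheStore` is any object with the Cache API's match/put shape;
// `fetchFn` defaults to the global fetch and can be swapped out for testing.
async function loadModelBytes(url, cacheStore, fetchFn = fetch) {
  const cached = await cacheStore.match(url);
  if (cached) {
    // Cache hit: return the stored bytes without touching the network.
    return new Uint8Array(await cached.arrayBuffer());
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Model download failed: HTTP ${response.status}`);
  }
  // Store a clone, because a Response body can only be read once.
  await cacheStore.put(url, response.clone());
  return new Uint8Array(await response.arrayBuffer());
}
```

In a browser, `cacheStore` could come from `await caches.open("tts-models")` (the cache name is an assumption), and the returned bytes could then be handed to ONNX Runtime Web to create an inference session.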

Available Voices

Voice  Region          Gender  Description
NF     Northern (Bắc)  Female  Clear, formal
SF     Southern (Nam)  Female  Warm, friendly
NM1    Northern (Bắc)  Male    Professional
SM     Southern (Nam)  Male    Conversational
NM2    Northern (Bắc)  Male    Authoritative
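
For a voice picker, the table above could be held as a small lookup object. This is only a sketch: `VOICES` and `voicesByRegion` are hypothetical names, and the README does not state how voice codes map to speaker IDs inside the model, so no such mapping is shown.

```javascript
// Voice catalogue copied from the table above (codes, region, gender only).
const VOICES = {
  NF:  { region: "Northern", gender: "Female" },
  SF:  { region: "Southern", gender: "Female" },
  NM1: { region: "Northern", gender: "Male" },
  SM:  { region: "Southern", gender: "Male" },
  NM2: { region: "Northern", gender: "Male" },
};

// List the voice codes available for one region, in table order.
function voicesByRegion(region) {
  return Object.keys(VOICES).filter((code) => VOICES[code].region === region);
}
```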

Technical Details

ONNX Pipeline

  • Text Encoder: Phoneme encoding
  • Duration Predictor: Speech timing
  • Flow Model: Latent transformation
  • Decoder: Audio waveform generation (HiFi-GAN)
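
Between the duration predictor and the flow model, standard VITS inference expands phoneme-level features to frame level: each predicted log-duration becomes a frame count of ceil(exp(logDuration) × lengthScale), and the phoneme's features are repeated that many times. Whether this demo performs the step in JavaScript or inside the exported ONNX graphs is not stated, so the following is a minimal sketch of the technique with hypothetical function names.

```javascript
// Convert per-phoneme log-durations into integer frame counts.
function framesPerPhoneme(logDurations, lengthScale = 1.0) {
  return Array.from(logDurations, (logw) =>
    Math.ceil(Math.exp(logw) * lengthScale)
  );
}

// Repeat each phoneme's feature vector for its frame count, producing the
// frame-level sequence that the flow model and decoder would operate on.
function expandToFrames(phonemeFeatures, frameCounts) {
  const frames = [];
  phonemeFeatures.forEach((features, i) => {
    for (let k = 0; k < frameCounts[i]; k++) frames.push(features);
  });
  return frames;
}
```

A `lengthScale` above 1 lengthens every phoneme and slows the speech; a value below 1 speeds it up.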

Vietnamese G2P

  • Uses a JavaScript port of the viphoneme library
  • Accurate tone and phoneme mapping
  • 99.96% agreement with the Python reference implementation
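
As an illustration of what tone mapping involves, the tone of a Vietnamese syllable can be read from its diacritics after Unicode NFD decomposition. This is not the viphoneme algorithm, only a sketch of one step; `TONE_MARKS` and `toneOf` are hypothetical names.

```javascript
// Combining marks that carry the five marked Vietnamese tones.
const TONE_MARKS = new Map([
  ["\u0300", "huyền"], // combining grave accent
  ["\u0301", "sắc"],   // combining acute accent
  ["\u0303", "ngã"],   // combining tilde
  ["\u0309", "hỏi"],   // combining hook above
  ["\u0323", "nặng"],  // combining dot below
]);

// Return the tone name of a single syllable; unmarked syllables are "ngang".
function toneOf(syllable) {
  // NFD splits a letter such as "ệ" into its base letter plus combining marks.
  for (const ch of syllable.normalize("NFD")) {
    if (TONE_MARKS.has(ch)) return TONE_MARKS.get(ch);
  }
  return "ngang";
}
```

Vowel-quality marks (circumflex, breve, horn) are not in the map, so they are skipped and only the tone mark decides the result.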

Browser Requirements

  • Chrome 90+, Firefox 90+, Edge 90+ (Full support)
  • Safari 15+ (Limited support)
  • WebAssembly and AudioContext API required
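
The two required APIs can be checked with feature detection rather than by parsing browser version strings. A minimal sketch, assuming a hypothetical helper `checkBrowserSupport` that takes the global object as a parameter so it can be exercised outside a browser:

```javascript
// Report whether the environment offers WebAssembly and the Web Audio API.
function checkBrowserSupport(env = globalThis) {
  const hasWasm = typeof env.WebAssembly?.instantiate === "function";
  // Older Safari releases exposed the constructor as webkitAudioContext.
  const AudioCtx = env.AudioContext || env.webkitAudioContext;
  const hasAudio = typeof AudioCtx === "function";
  return { hasWasm, hasAudio, supported: hasWasm && hasAudio };
}
```

A page could call `checkBrowserSupport()` before downloading the models and show a message instead of starting the ~165 MB transfer when `supported` is false.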

Model Info

  • Architecture: VITS (Conditional VAE)
  • Sample Rate: 24 kHz
  • Model Size: 164.75 MB (ONNX)
  • Speakers: 5 (Northern/Southern Vietnamese)
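
The sample rate matters whenever the decoder output is packaged for playback or download. The sketch below wraps floating-point samples in a 16-bit PCM WAV container at 24 kHz. It assumes mono output with samples in the range -1 to 1, which the README does not state, and `encodeWav` is a hypothetical helper rather than part of the demo.

```javascript
const SAMPLE_RATE = 24000; // from the Model Info list above

// Encode mono float samples (assumed range -1..1) as a 16-bit PCM WAV file.
function encodeWav(samples, sampleRate = SAMPLE_RATE) {
  const dataBytes = samples.length * 2;
  const view = new DataView(new ArrayBuffer(44 + dataBytes));
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);  // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // format 1 = PCM
  view.setUint16(22, 1, true);              // channels: mono (assumption)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * block align
  view.setUint16(32, 2, true);              // block align: 2 bytes per frame
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp before scaling
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(view.buffer);
}
```

At this rate, clip length in seconds is simply the sample count divided by 24000.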

Performance

First load may take 30-60 seconds to download the models. Subsequent visits load them from the browser cache and skip the download.

Synthesis speed depends on device:

  • Desktop: ~5-8 seconds per sentence
  • Mobile: ~10-15 seconds per sentence
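
Because sentences vary in length, per-sentence timings are easier to compare across devices as a real-time factor: synthesis time divided by the duration of the audio produced. A minimal sketch, with `realTimeFactor` as a hypothetical helper and the 24 kHz default taken from the Model Info section:

```javascript
// Real-time factor: values above 1 mean synthesis is slower than playback.
function realTimeFactor(synthesisSeconds, sampleCount, sampleRate = 24000) {
  if (sampleCount <= 0) {
    throw new RangeError("sampleCount must be positive");
  }
  const audioSeconds = sampleCount / sampleRate;
  return synthesisSeconds / audioSeconds;
}
```

For example, a hypothetical sentence that yields 72,000 samples (3 seconds of audio) and takes 6 seconds to synthesize has a real-time factor of 2.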

Privacy

All processing happens locally in your browser. No data is sent to any server. Your text input and generated audio never leave your device.


Powered by Valtec AI Team | ONNX Runtime Web | WebAssembly