title: Soprano TTS - Single File
emoji: 🎧
colorFrom: yellow
colorTo: purple
sdk: static
short_description: Single self-contained HTML file for browser TTS
app_file: index.html
pinned: false
license: apache-2.0
custom_headers:
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin

Soprano TTS - Single Self-Contained File

Single-file version by Nekochu · Soprano by ekwek1 · ONNX port by KevinAHM · Apache-2.0


A single HTML file (~32KB) that provides real-time neural text-to-speech in the browser.

All code, styles, and logic are embedded in one file. Just open index.html and start using it.


✨ Features

  • 🚀 Single File: Everything in one HTML file, with no local dependencies and no build process
  • 🎯 Fast Models: Uses INT8 quantized ONNX models for optimal performance
  • 🔊 Real-Time Streaming: Web Worker + AudioWorklet for smooth playback
  • 📊 Live Metrics: Shows TTFB and Real-Time Factor
  • ☁️ CDN Models: Loads models directly from HuggingFace
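
As a rough illustration of the two live metrics, the sketch below computes TTFB (taken here to mean the delay until the first audio chunk arrives) and Real-Time Factor (synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time). The function names and the 32000 Hz sample rate are assumptions for the example, not values read from index.html.

```javascript
// Sketch of the two live metrics. Function names and the 32000 Hz sample
// rate are illustrative assumptions, not taken from index.html.

/** Delay between starting a request and receiving the first audio chunk, in ms. */
function ttfbMs(requestStartMs, firstChunkMs) {
  return firstChunkMs - requestStartMs;
}

/** Real-Time Factor: time spent synthesizing divided by duration of audio produced. */
function realTimeFactor(synthesisMs, sampleCount, sampleRate) {
  const audioMs = (sampleCount / sampleRate) * 1000;
  return synthesisMs / audioMs;
}

// 64000 samples at 32 kHz is 2 s of audio; producing it in 500 ms gives 0.25.
const rtf = realTimeFactor(500, 64000, 32000);
console.log(rtf.toFixed(2)); // "0.25"
```

In a browser, the timestamps would typically come from `performance.now()`.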

🚀 Usage

Option 1: Direct Browser

Simply open index.html in a modern browser (Chrome, Edge, Firefox).

Option 2: Local Server

python -m http.server 8000

Then visit http://localhost:8000
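
The Space metadata sets three cross-origin headers, which `python -m http.server` does not send. Browsers require the embedder and opener policies before they treat a page as cross-origin isolated (needed for `SharedArrayBuffer`). Whether index.html actually depends on isolation is an assumption here; if it does, a small Node.js server along these lines could mirror the headers locally. `startServer`, `contentTypeFor`, and the MIME table are hypothetical names for this sketch.

```javascript
// Minimal static file server that mirrors the Space's custom_headers.
// Assumes Node.js; all names here are illustrative.

const ISOLATION_HEADERS = {
  "Cross-Origin-Embedder-Policy": "require-corp",
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Resource-Policy": "cross-origin",
};

const MIME_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript",
  ".wasm": "application/wasm",
};

/** Pick a Content-Type from the file extension, defaulting to binary. */
function contentTypeFor(filePath) {
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot).toLowerCase();
  return MIME_TYPES[ext] || "application/octet-stream";
}

/** Serve files under rootDir on the given port, adding the isolation headers. */
async function startServer(rootDir, port) {
  const http = await import("node:http");
  const fs = await import("node:fs/promises");
  const path = await import("node:path");
  const root = path.resolve(rootDir);

  const server = http.createServer(async (req, res) => {
    try {
      const { pathname } = new URL(req.url, "http://localhost");
      const relative = pathname === "/" ? "index.html" : decodeURIComponent(pathname);
      const filePath = path.join(root, relative);
      // Refuse any path that resolves outside the served directory.
      if (!filePath.startsWith(root + path.sep)) {
        res.writeHead(403, ISOLATION_HEADERS);
        res.end("Forbidden");
        return;
      }
      const body = await fs.readFile(filePath);
      res.writeHead(200, { ...ISOLATION_HEADERS, "Content-Type": contentTypeFor(filePath) });
      res.end(body);
    } catch {
      // Missing files, directories, and malformed URLs all end up here.
      res.writeHead(404, ISOLATION_HEADERS);
      res.end("Not found");
    }
  });
  server.listen(port);
  return server;
}
```

Calling `startServer(".", 8000)` from the directory that contains index.html would serve it at the same address as the Python command.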


πŸ“ How It Works

  1. Models (~112MB total) are loaded from HuggingFace CDN on first use
  2. Text preprocessing and ONNX inference run in a Web Worker
  3. Audio chunks stream to AudioWorklet for smooth playback
  4. Browser caches models after first load
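
Step 3 implies some buffering between the two threads: the worker produces audio chunks of varying length, while an AudioWorklet pulls small fixed-size blocks. A minimal sketch of such a queue follows. `SampleQueue` and its methods are hypothetical names, and the actual implementation in index.html may differ.

```javascript
// Sketch of a FIFO between a chunk producer and a fixed-block consumer.
// The class name and API are hypothetical, not taken from index.html.
class SampleQueue {
  constructor() {
    this.chunks = []; // pending Float32Array chunks, oldest first
    this.offset = 0;  // read position inside chunks[0]
  }

  /** Append a chunk of PCM samples (for example, one posted by the worker). */
  push(chunk) {
    if (chunk.length > 0) this.chunks.push(chunk);
  }

  /** Fill `out` from the queue, padding with silence on underrun. Returns samples copied. */
  pull(out) {
    let written = 0;
    while (written < out.length && this.chunks.length > 0) {
      const head = this.chunks[0];
      const n = Math.min(head.length - this.offset, out.length - written);
      out.set(head.subarray(this.offset, this.offset + n), written);
      written += n;
      this.offset += n;
      if (this.offset === head.length) {
        // Current chunk is exhausted; move on to the next one.
        this.chunks.shift();
        this.offset = 0;
      }
    }
    out.fill(0, written); // silence for whatever could not be filled
    return written;
  }
}

const queue = new SampleQueue();
queue.push(new Float32Array([0.25, 0.5, 0.75]));
queue.push(new Float32Array([1, -1]));
const block = new Float32Array(4);
queue.pull(block); // block now holds 0.25, 0.5, 0.75, 1
```

Inside an `AudioWorkletProcessor`, `process(inputs, outputs)` could call `pull(outputs[0][0])` once per render block, with chunks arriving through the processor's `port.onmessage` handler.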

🎯 Models Used

  • Backbone: soprano_backbone_kv_int8.onnx (80.9 MB)
  • Decoder: soprano_decoder_int8.onnx (30.8 MB)
  • Tokenizer: Loaded from HuggingFace Transformers.js

All models load directly from HuggingFace CDN - no local files needed!
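
The file list above can be written as a small manifest, which also confirms that the two model files add up to the ~112MB quoted earlier. This is only a sketch: the README does not name the model repository, so the repository id is left as a placeholder, and `MODEL_FILES`, `totalSizeMB`, and `modelUrl` are illustrative names.

```javascript
// File names and sizes come from the list above. The repository id is a
// placeholder because the README does not state it.
const MODEL_FILES = [
  { role: "backbone", file: "soprano_backbone_kv_int8.onnx", sizeMB: 80.9 },
  { role: "decoder", file: "soprano_decoder_int8.onnx", sizeMB: 30.8 },
];

/** Sum the download size of all model files, in megabytes. */
function totalSizeMB(files) {
  return files.reduce((sum, f) => sum + f.sizeMB, 0);
}

/** Build a download URL using the Hub's /resolve/<revision>/ path layout. */
function modelUrl(repoId, file, revision = "main") {
  return `https://huggingface.co/${repoId}/resolve/${revision}/${file}`;
}

console.log(totalSizeMB(MODEL_FILES).toFixed(1)); // "111.7"
console.log(modelUrl("<owner>/<repo>", MODEL_FILES[0].file));
```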


💡 Credits


📜 License

Apache-2.0 (same as upstream Soprano)