metadata
title: Soprano TTS - Single File
emoji: 🎧
colorFrom: yellow
colorTo: purple
sdk: static
short_description: Single self-contained HTML file for browser TTS
app_file: index.html
pinned: false
license: apache-2.0
custom_headers:
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
Soprano TTS - Single Self-Contained File
Single-file version by Nekochu · Soprano by ekwek1 · ONNX port by KevinAHM · Apache-2.0
A single HTML file (~32 KB) that provides real-time neural text-to-speech in the browser.
All code, styles, and logic are embedded in one file. Just open index.html and start using it.
Features
- Single File: Everything in one HTML file - no dependencies, no build process
- Fast Models: Uses INT8-quantized ONNX models for optimal performance
- Real-Time Streaming: Web Worker + AudioWorklet for smooth playback
- Live Metrics: Shows TTFB and Real-Time Factor
- CDN Models: Loads models directly from HuggingFace
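The two live metrics can be illustrated with a minimal sketch. It assumes TTFB means the delay until the first audio chunk arrives and that Real-Time Factor is synthesis time divided by the duration of the audio produced. The function names and the 32000 Hz sample rate in the usage note are hypothetical, not taken from the Space's code.

```javascript
// Hypothetical helpers; the Space's own code may compute these differently.

// Real-Time Factor: time spent synthesizing divided by the duration of the
// audio produced. Values below 1 mean audio is generated faster than it plays.
function realTimeFactor(synthesisMs, sampleCount, sampleRate) {
  const audioMs = (sampleCount / sampleRate) * 1000;
  if (audioMs <= 0) return NaN; // no audio yet, so the ratio is undefined
  return synthesisMs / audioMs;
}

// TTFB (assumed here to mean time to the first audio chunk): delay between
// the start of the request and the arrival of the first chunk.
function timeToFirstByte(requestStartMs, firstChunkMs) {
  return firstChunkMs - requestStartMs;
}
```

For example, `realTimeFactor(500, 64000, 32000)` returns `0.25`: 500 ms of compute produced 2 s of audio at the assumed sample rate.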
Usage
Option 1: Direct Browser
Simply open index.html in a modern browser (Chrome, Edge, Firefox).
Option 2: Local Server
python -m http.server 8000
Then visit http://localhost:8000
How It Works
- Models (~112MB total) are loaded from HuggingFace CDN on first use
- Text preprocessing and ONNX inference run in a Web Worker
- Audio chunks stream to AudioWorklet for smooth playback
- Browser caches models after first load
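The hand-off from worker to AudioWorklet can be sketched as a small FIFO of audio chunks. This is an illustration under assumptions, not the Space's actual code: `ChunkQueue` and its method names are hypothetical, and samples are assumed to arrive as mono `Float32Array` chunks.

```javascript
// Hypothetical FIFO of Float32Array chunks, of the kind an
// AudioWorkletProcessor might keep between render callbacks.
class ChunkQueue {
  constructor() {
    this.chunks = []; // pending chunks, oldest first
    this.offset = 0;  // read position inside chunks[0]
  }

  // Called when a new chunk arrives (e.g. posted from the worker).
  push(chunk) {
    if (chunk.length > 0) this.chunks.push(chunk);
  }

  // Fills `out` completely and returns the number of real samples written.
  // Any shortfall is padded with zeros, so an underrun plays as silence.
  pull(out) {
    let written = 0;
    while (written < out.length && this.chunks.length > 0) {
      const head = this.chunks[0];
      const n = Math.min(head.length - this.offset, out.length - written);
      out.set(head.subarray(this.offset, this.offset + n), written);
      written += n;
      this.offset += n;
      if (this.offset === head.length) {
        this.chunks.shift(); // chunk fully consumed
        this.offset = 0;
      }
    }
    out.fill(0, written);
    return written;
  }
}
```

Inside a processor's `process(inputs, outputs)` callback, something like `queue.pull(outputs[0][0])` would fill each render block (typically 128 frames). Padding with silence instead of blocking keeps the audio thread from stalling when inference falls behind.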
Models Used
- Backbone: soprano_backbone_kv_int8.onnx (80.9 MB)
- Decoder: soprano_decoder_int8.onnx (30.8 MB)
- Tokenizer: Loaded from HuggingFace Transformers.js
All models load directly from HuggingFace CDN - no local files needed!
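As a rough sketch of what "loading from the CDN" involves, the snippet below builds download URLs using Hugging Face's `/resolve/<revision>/<path>` pattern for model repos and checks the listed sizes against the ~112 MB total. The repo id is a placeholder and the assumption that both files sit at the repo root is unverified. The real location may differ.

```javascript
// Placeholder repo id (hypothetical); substitute the repo that hosts the files.
const MODEL_REPO = "example-owner/example-repo";

// File names and sizes as listed above.
const MODELS = [
  { role: "backbone", file: "soprano_backbone_kv_int8.onnx", sizeMB: 80.9 },
  { role: "decoder", file: "soprano_decoder_int8.onnx", sizeMB: 30.8 },
];

// Hugging Face serves raw files from model repos under /resolve/<revision>/<path>.
// Assumes the file lives at the repo root.
function modelUrl(repo, file, revision = "main") {
  return `https://huggingface.co/${repo}/resolve/${revision}/${file}`;
}

// Sum of the listed model sizes in MB.
function totalSizeMB(models) {
  return models.reduce((sum, m) => sum + m.sizeMB, 0);
}
```

`Math.round(totalSizeMB(MODELS))` gives 112, matching the total quoted under How It Works.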
Credits
- Original Soprano: ekwek1/soprano
- ONNX Port: KevinAHM/soprano-web-onnx
- Combined into one HTML: Nekochu (this space)
License
Apache-2.0 (same as upstream Soprano)