---
library_name: transformers
license: apache-2.0
pipeline_tag: text-to-speech
---

# Soprano: Instant, Ultra‑Realistic Text‑to‑Speech

## Note: this model is now outdated. Use [Soprano-1.1-80M](https://huggingface.co/ekwek/Soprano-1.1-80M) instead!

[![GitHub repository](https://img.shields.io/badge/Github-Repo-black?logo=github)](https://github.com/ekwek1/soprano) [![Hugging Face demo](https://img.shields.io/badge/HuggingFace-Demo-yellow?logo=huggingface)](https://huggingface.co/spaces/ekwek/Soprano-TTS)

### 📰 News

**2026.01.14 - [Soprano-1.1-80M](https://huggingface.co/ekwek/Soprano-1.1-80M) released! 95% fewer hallucinations and a 63% preference rate over Soprano-80M.**

2026.01.13 - [Soprano-Factory](https://github.com/ekwek1/soprano-factory) released! You can now train/fine-tune your own Soprano models.

2025.12.22 - Soprano-80M released! [Code](https://github.com/ekwek1/soprano) | [Demo](https://huggingface.co/spaces/ekwek/Soprano-TTS)

---

## Overview

**Soprano** is an ultra‑lightweight, on-device text‑to‑speech (TTS) model designed for expressive, high‑fidelity speech synthesis at unprecedented speed. Soprano was designed with the following features:

- Up to **2000x** real-time generation on GPU and **20x** real-time on CPU
- **Lossless streaming** with **<15 ms** latency on GPU, **<250 ms** on CPU
- **<1 GB** memory usage with a compact 80M parameter architecture
- **Infinite generation length** with automatic text splitting
- Highly expressive, crystal clear audio generation at **32kHz**
- Widespread support for CUDA, CPU, and MPS devices on Windows, Linux, and Mac
- Supports WebUI, CLI, and OpenAI-compatible endpoint for easy and production-ready inference

---

## Installation

### Install with wheel (CUDA-only for now)

```bash
pip install soprano-tts
```

To get the latest features, you can install from source instead.

### Install from source (CUDA)

```bash
git clone https://github.com/ekwek1/soprano.git
cd soprano
pip install -e .[lmdeploy]
```

### Install from source (CPU/MPS)

```bash
git clone https://github.com/ekwek1/soprano.git
cd soprano
pip install -e .
```

> ### ⚠️ Warning: Windows CUDA users
>
> On Windows with CUDA, `pip` will install a CPU-only PyTorch build.
> To ensure CUDA support works as expected, reinstall PyTorch explicitly with the correct CUDA wheel **after** installing Soprano:
>
> ```bash
> pip uninstall -y torch
> pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
> ```

---

## Usage

### WebUI

Start WebUI:

```bash
soprano-webui # hosted on http://127.0.0.1:7860 by default
```

> **Tip:** You can increase cache size and decoder batch size to increase inference speed at the cost of higher memory usage. For example:
> ```bash
> soprano-webui --cache-size 1000 --decoder-batch-size 4
> ```

### CLI

```
soprano "Soprano is an extremely lightweight text to speech model."

optional arguments:
  --output, -o               Output audio file path (non-streaming only). Defaults to 'output.wav'
  --model-path, -m           Path to local model directory (optional)
  --device, -d               Device to use for inference. Supported: auto, cuda, cpu, mps. Defaults to 'auto'
  --backend, -b              Backend to use for inference. Supported: auto, transformers, lmdeploy. Defaults to 'auto'
  --cache-size, -c           Cache size in MB (for lmdeploy backend). Defaults to 100
  --decoder-batch-size, -bs  Decoder batch size. Defaults to 1
  --streaming, -s            Enable streaming playback to speakers
```

> **Tip:** You can increase cache size and decoder batch size to increase inference speed at the cost of higher memory usage.

> **Note:** The CLI will reload the model every time it is called. As a result, inference speed will be slower than other methods.

### OpenAI-compatible endpoint

Start server:

```bash
uvicorn soprano.server:app --host 0.0.0.0 --port 8000
```

Use the endpoint like this:

```bash
curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Soprano is an extremely lightweight text to speech model."
  }' \
  --output speech.wav
```

> **Note:** Currently, this endpoint only supports nonstreaming output.
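
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server is running at `http://localhost:8000` as started above, and that the response body is the audio file itself, as the `--output speech.wav` flag in the curl example suggests. The helper names `build_speech_request` and `synthesize_to_file` are illustrative and not part of Soprano.

```python
import json
import urllib.request

# Assumption: the server was started with the uvicorn command shown above.
SPEECH_URL = "http://localhost:8000/v1/audio/speech"


def build_speech_request(text: str, url: str = SPEECH_URL) -> urllib.request.Request:
    """Build the same JSON POST request that the curl example sends."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, out_path: str = "speech.wav", url: str = SPEECH_URL) -> str:
    """Send the request and write the raw response body to out_path."""
    request = build_speech_request(text, url)
    # Assumption: the response body is the complete (non-streaming) audio file.
    with urllib.request.urlopen(request) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    return out_path
```

With the server running, calling `synthesize_to_file("Soprano is an extremely lightweight text to speech model.")` should write the returned audio to `speech.wav`, mirroring the curl command.
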
### Python script

```python
from soprano import SopranoTTS

model = SopranoTTS(backend='auto', device='auto', cache_size_mb=100, decoder_batch_size=1)
```

> **Tip:** You can increase cache_size_mb and decoder_batch_size to increase inference speed at the cost of higher memory usage.

```python
# Basic inference
out = model.infer("Soprano is an extremely lightweight text to speech model.") # can achieve 2000x real-time with sufficiently long input!

# Save output to a file
out = model.infer("Soprano is an extremely lightweight text to speech model.", "out.wav")

# Custom sampling parameters
out = model.infer(
    "Soprano is an extremely lightweight text to speech model.",
    temperature=0.3,
    top_p=0.95,
    repetition_penalty=1.2,
)

# Batched inference
out = model.infer_batch(["Soprano is an extremely lightweight text to speech model."] * 10) # can achieve 2000x real-time with sufficiently large input size!

# Save batch outputs to a directory
out = model.infer_batch(["Soprano is an extremely lightweight text to speech model."] * 10, "/dir")

# Streaming inference
from soprano.utils.streaming import play_stream
stream = model.infer_stream("Soprano is an extremely lightweight text to speech model.", chunk_size=1)
play_stream(stream) # plays audio with <15 ms latency!
```

## Usage tips:

* Soprano works best when each sentence is between 2 and 15 seconds long.
* Although Soprano recognizes numbers and some special characters, it occasionally mispronounces them. Best results can be achieved by converting these into their phonetic form (for example, `1+1` -> "one plus one").
* If Soprano produces unsatisfactory results, you can regenerate the audio for a new, potentially better result. You may also change the sampling settings for more varied results.
* Avoid unnatural text, such as omitting contractions or using multiple consecutive spaces.

---

## Limitations

Soprano is currently English-only and does not support voice cloning.
In addition, Soprano was trained on only 1,000 hours of audio (~100x less than other TTS models), so mispronunciation of uncommon words may occur. This is expected to diminish as Soprano is trained on more data.

---

## License

This project is licensed under the **Apache-2.0** license. See `LICENSE` for details.