File size: 1,715 Bytes
e3875d9
 
 
2a5aeee
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e3875d9
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
---
license: apache-2.0
pipeline_tag: feature-extraction
---

# Soprano: Instant, Ultra‑Realistic Text‑to‑Speech

<div align="center">
  
  <img width="640" height="320" alt="soprano-github" src="https://github.com/user-attachments/assets/4d612eac-23b8-44e6-8c59-d7ac14ebafd1" />

  [![Alt Text](https://img.shields.io/badge/Github-Repo-black?logo=github)](https://github.com/ekwek1/soprano)
  [![Alt Text](https://img.shields.io/badge/HuggingFace-Demo-yellow?logo=huggingface)](https://huggingface.co/spaces/ekwek/Soprano-TTS)
</div>

### 📰 News
**2026.01.13 - [Soprano-Factory](https://github.com/ekwek1/soprano-factory) released! You can now train/fine-tune your own Soprano models.**  
2025.12.22 - Soprano-80M released! [Code](https://github.com/ekwek1/soprano) | [Demo](https://huggingface.co/spaces/ekwek/Soprano-TTS)

---

This repository contains **Soprano-Encoder**, which converts raw audio into audio tokens that the LLM backbone can recognize.

## Overview

**Soprano** is an ultra‑lightweight, on-device text‑to‑speech (TTS) model designed for expressive, high‑fidelity speech synthesis at unprecedented speed. Soprano was designed with the following features:
- Up to **2000x** real-time generation on GPU and **20x** real-time on CPU
- **Lossless streaming** with **<15 ms** latency on GPU, **<250 ms** on CPU
- **<1 GB** memory usage with a compact 80M parameter architecture
- **Infinite generation length** with automatic text splitting
- Highly expressive, crystal clear audio generation at **32kHz**
- Widespread support for CUDA, CPU, and MPS devices on Windows, Linux, and Mac
- Supports WebUI, CLI, and OpenAI-compatible endpoint for easy and production-ready inference

---