---
language:
- he
tags:
- text-to-speech
- tts
- hebrew
- audio
- fast-inference
license: mit
datasets:
- notmax123/RanLevi40h
---

# LightBlue TTS 🇮🇱

## Model Description

LightBlue is a fast Text-to-Speech (TTS) model built from scratch for Hebrew, with English support. It is designed to produce native Israeli-sounding speech, handling *Nikud* (vowel marks) and complex homographs correctly, without compromising on inference speed.

On an NVIDIA RTX 3090 it runs at about 1260x real-time, fast enough to generate an entire 1-hour audiobook in roughly **3 seconds**.

- **Developer:** LightBlue TTS
- **Language(s):** Hebrew (Primary), English
- **Model Type:** Text-to-Speech (TTS)
- **Demo & Website:** [https://lightbluetts.com/](https://lightbluetts.com/)
- **GitHub Repository:** [https://github.com/maxmelichov/Light-BlueTTS](https://github.com/maxmelichov/Light-BlueTTS)

## Key Features

- **Blazing Fast Inference:**
  - **1260x real-time** on an NVIDIA RTX 3090 (21 minutes of audio generated per second).
  - **35x real-time** on standard CPUs.
  - **20x real-time** on Apple M1 chips.
- **Native Hebrew Quality:** A real Israeli accent, correct stress placement, and native-level flow.
- **Advanced Contextual Understanding:** Passes the "Homograph Test" (e.g., correctly distinguishing *צפה* as "watched" vs. "floated", or *תרד* as "spinach" vs. "go down").
- **Multiple Voices:** Includes high-quality voices such as *Yonatan* (Hebrew only) and *Rotem*.

## Uses

### Direct Use
- Generating high-quality Hebrew audio from text.
- Real-time TTS applications running on standard CPUs or edge devices.
- Audiobooks, accessibility tools, virtual assistants, and automated broadcasting.

## Speed Benchmarks

LightBlue is optimized for extreme speed without sacrificing naturalness:

| Hardware | Speed | Time for 1 Hour of Audio |
| :--- | :--- | :--- |
| **NVIDIA RTX 3090** | 1260x real-time | ~3 seconds |
| **Standard CPU** | 35x real-time | ~1.7 minutes |
| **Apple M1** | 20x real-time | ~3 minutes |

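
The "Time for 1 Hour of Audio" figures follow directly from the real-time factors: generation time is the audio duration divided by the speed multiplier. The short sketch below reproduces that arithmetic. The function name `generation_time_seconds` and the `BENCHMARKS` dictionary are illustrative only and are not part of the LightBlue codebase; the factors are copied from the benchmark figures above.

```python
def generation_time_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of speech.

    Hypothetical helper for illustration; not a LightBlue API.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Real-time factors as reported in the benchmark figures.
BENCHMARKS = {
    "NVIDIA RTX 3090": 1260,
    "Standard CPU": 35,
    "Apple M1": 20,
}

ONE_HOUR = 3600  # seconds of audio

for hardware, factor in BENCHMARKS.items():
    seconds = generation_time_seconds(ONE_HOUR, factor)
    print(f"{hardware}: {seconds:.1f} s ({seconds / 60:.2f} min)")
```

This gives about 2.9 seconds for the RTX 3090, 102.9 seconds (about 1.7 minutes) for a standard CPU, and 180 seconds (3 minutes) for the Apple M1, matching the rounded values reported above.
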
## How to Get Started

To use this model, you can clone the official GitHub repository and install the requirements:
