thelamapi/neuvoice


Lamapi committed 18 days ago

Commit e05980b (verified) · 1 parent: 555672f

Update README.md


README.md +143 -3


@@ -1,3 +1,143 @@
----
-license: apache-2.0
----

+# Neuvoice — Fast, On-Device Neural TTS by Lamapi
+**Neuvoice** is a lightweight, on-device text-to-speech library that runs entirely via ONNX Runtime — no cloud calls, no latency surprises. Built by [Lamapi](https://huggingface.co/thelamapi), it ships with a curated voice library and supports 31 languages out of the box.
+## Quick Start
+```bash
+pip install neuvoice
+```
+```python
+from neuvoice import TTS
+tts = TTS(auto_download=True)
+style = tts.get_voice_style("Alina")
+text = "Hello! Welcome to Neuvoice — fast, private, on-device speech synthesis."
+wav, duration = tts.synthesize(text, voice_style=style, lang="en")
+tts.save_audio(wav, "output.wav")
+print(f"Generated {duration:.2f}s of audio")
+```
+On first run, model assets are downloaded and cached automatically under `~/.cache/neuvoice`.
+## Highlights
+**31 supported languages.** Covers a wide range of scripts and regions, from European languages to Arabic, Hindi, Japanese, Korean, Vietnamese, and more.
+**Runs entirely on-device.** ONNX Runtime powers inference — no API keys, no network dependency after the initial model download. CPU is sufficient; GPU acceleration is supported when available.
+**Rich voice library.** Ships with voices including Alina, Cem, Cole, Giray, Leon, Lina, Linda, Mustafa, Sarp, Selin, Sema, and Soras. Each voice is a compact style embedding — load any of them by name in a single call.
+**Inline expression tags.** Embed `<happy>`, `<laugh>`, `<breath>`, `<sad>`, and other tags directly in your text to shape the delivery without any extra parameters.
+**Long-form synthesis.** Inputs are automatically chunked, synthesized, and rejoined with configurable silence — no manual splitting required.
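To make the inline tag syntax concrete, here is a minimal sketch that lists and strips tags from a string before it is handed to the synthesizer. It is not Neuvoice's own parser: the pattern assumes tags are lowercase words in angle brackets (as in `<happy>`), and `list_tags` and `strip_tags` are hypothetical helper names.

```python
import re

# Assumption: a tag is a lowercase word in angle brackets, e.g. <happy>.
# The real tag grammar used by the library may differ.
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def list_tags(text: str) -> list[str]:
    """Return the tag names found in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def strip_tags(text: str) -> str:
    """Remove all tags and collapse the whitespace they leave behind."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_tags).strip()


sample = "Good news! <happy> We just shipped the feature. <laugh> Don't tell anyone yet."
print(list_tags(sample))
print(strip_tags(sample))
```

A helper like this can be useful for logging which tags a script uses, or for producing a plain-text transcript alongside the audio.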
+## Supported Languages
+| Code | Language | Code | Language | Code | Language | Code | Language |
+|------|----------|------|----------|------|----------|------|----------|
+| `en` | English | `ko` | Korean | `ja` | Japanese | `ar` | Arabic |
+| `bg` | Bulgarian | `cs` | Czech | `da` | Danish | `de` | German |
+| `el` | Greek | `es` | Spanish | `et` | Estonian | `fi` | Finnish |
+| `fr` | French | `hi` | Hindi | `hr` | Croatian | `hu` | Hungarian |
+| `id` | Indonesian | `it` | Italian | `lt` | Lithuanian | `lv` | Latvian |
+| `nl` | Dutch | `pl` | Polish | `pt` | Portuguese | `ro` | Romanian |
+| `ru` | Russian | `sk` | Slovak | `sl` | Slovenian | `sv` | Swedish |
+| `tr` | Turkish | `uk` | Ukrainian | `vi` | Vietnamese | `na` | (fallback) |
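The table lists `na` as a fallback code. One way an application might use it is to normalize a user-supplied code and fall back to `na` when the code is not in the table. The sketch below assumes that behavior is wanted on the caller's side; whether the library itself falls back or raises on unknown codes is not stated here, and `resolve_lang` is a hypothetical helper.

```python
# Language codes copied from the table above (31 languages).
SUPPORTED_LANGS = frozenset({
    "en", "ko", "ja", "ar", "bg", "cs", "da", "de",
    "el", "es", "et", "fi", "fr", "hi", "hr", "hu",
    "id", "it", "lt", "lv", "nl", "pl", "pt", "ro",
    "ru", "sk", "sl", "sv", "tr", "uk", "vi",
})
FALLBACK_LANG = "na"


def resolve_lang(code: str) -> str:
    """Return a supported code, or the fallback code for anything else."""
    normalized = code.strip().lower()
    return normalized if normalized in SUPPORTED_LANGS else FALLBACK_LANG


print(resolve_lang("TR"))
print(resolve_lang("zh"))
```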
+## Available Voices
+```python
+from neuvoice import TTS
+tts = TTS()
+print(tts.voice_style_names)
+# ['Alina', 'Cem', 'Cole', 'Giray', 'Leon', 'Lina', 'Linda',
+#  'Mustafa', 'Sarp', 'Selin', 'Sema', 'Soras']
+```
+Load any voice by name:
+```python
+style = tts.get_voice_style("Selin")
+```
+Or from a custom style file:
+```python
+style = tts.get_voice_style_from_path("/path/to/my_voice.json")
+```
+## `synthesize()` Parameters
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `text` | `str` | — | Input text. Supports inline tags like `<happy>`. |
+| `voice_style` | `VoiceStyle` | — | Voice loaded via `get_voice_style()`. |
+| `total_steps` | `int` | `5` | Number of flow-matching denoising steps. More steps generally improve quality at the cost of slower inference. Range: 1–100. |
+| `speed` | `float` | `1.05` | Playback speed multiplier. Range: 0.7–2.0. |
+| `lang` | `str` | `"en"` | ISO 639-1 language code, or `"na"` for unknown languages. |
+| `max_chunk_length` | `int` | `300` | Max characters per synthesis chunk (120 for Korean). |
+| `silence_duration` | `float` | `0.3` | Seconds of silence inserted between chunks. |
+| `verbose` | `bool` | `False` | Print per-chunk progress to stdout. |
+**Returns:** `(waveform, duration)` — waveform as a `(1, samples)` NumPy array, duration in seconds.
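The `max_chunk_length` and `silence_duration` parameters describe a chunk-and-rejoin process. The sketch below illustrates one plausible version of it and is not the library's actual implementation: it packs whole sentences greedily into chunks, then joins per-chunk waveforms with zero-valued samples in between. The sentence-splitting rule, the helper names, and the sample rate passed in are all assumptions, and plain Python lists stand in for the `(1, samples)` NumPy arrays that `synthesize()` returns.

```python
import re


def split_into_chunks(text: str, max_chunk_length: int = 300) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chunk_length characters.

    A single sentence longer than the limit is kept as one oversized chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chunk_length or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def join_with_silence(waveforms, sample_rate: int, silence_duration: float = 0.3):
    """Concatenate waveforms, inserting silence_duration seconds of zeros between them."""
    gap = [0.0] * int(round(sample_rate * silence_duration))
    joined = []
    for index, waveform in enumerate(waveforms):
        if index > 0:
            joined.extend(gap)
        joined.extend(waveform)
    return joined


print(split_into_chunks("One. Two. Three.", max_chunk_length=9))
print(len(join_with_silence([[1.0, 1.0], [2.0]], sample_rate=10)))
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each chunk a complete phrase, which tends to matter for prosody.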
+## Examples
+**Multilingual synthesis:**
+```python
+from neuvoice import TTS
+tts = TTS()
+style = tts.get_voice_style("Leon")
+pairs = [
+    ("Merhaba! Bugün hava çok güzel.", "tr"),
+    ("Bonjour! Il fait beau aujourd'hui.", "fr"),
+    ("こんにちは！今日はいい天気ですね。", "ja"),
+]
+for text, lang in pairs:
+    wav, dur = tts.synthesize(text, voice_style=style, lang=lang)
+    tts.save_audio(wav, f"output_{lang}.wav")
+```
+**Expression tags:**
+```python
+style = tts.get_voice_style("Lina")
+text = "Good news! <happy> We just shipped the feature. <laugh> Don't tell anyone yet."
+wav, dur = tts.synthesize(text, voice_style=style, lang="en")
+```
+**Higher quality with more steps:**
+```python
+wav, dur = tts.synthesize(
+    text="A slow, deliberate reading for maximum clarity.",
+    voice_style=style,
+    total_steps=30,
+    speed=0.9,
+    lang="en",
+)
+```
+## Configuration
+The model cache location, model source (repository and revision), and ONNX Runtime thread counts can be controlled via environment variables:
+| Variable | Description |
+|----------|-------------|
+| `NEUVOICE_CACHE_DIR` | Override the default cache directory (`~/.cache/neuvoice`). |
+| `NEUVOICE_MODEL_REPO` | Override the Hugging Face model repository. |
+| `NEUVOICE_REVISION` | Model revision/branch to use (default: `main`). |
+| `NEUVOICE_INTRA_THREADS` | ONNX intra-op thread count (default: auto). |
+| `NEUVOICE_INTER_THREADS` | ONNX inter-op thread count (default: auto). |
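As an illustration of how these variables and their documented defaults might resolve, here is a minimal sketch that reads them into a settings object. The `Settings` class and `load_settings` function are hypothetical and not part of the Neuvoice API; `None` stands for "let the library decide" (the default repository, or automatic thread counts).

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass
class Settings:
    cache_dir: Path
    model_repo: Optional[str]     # None: use the library's default repository
    revision: str
    intra_threads: Optional[int]  # None: automatic
    inter_threads: Optional[int]  # None: automatic


def _optional_int(value: Optional[str]) -> Optional[int]:
    """Parse an integer, treating an unset or empty variable as 'auto'."""
    return int(value) if value else None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from environment variables, using the documented defaults."""
    return Settings(
        cache_dir=Path(env.get("NEUVOICE_CACHE_DIR", "~/.cache/neuvoice")).expanduser(),
        model_repo=env.get("NEUVOICE_MODEL_REPO"),
        revision=env.get("NEUVOICE_REVISION", "main"),
        intra_threads=_optional_int(env.get("NEUVOICE_INTRA_THREADS")),
        inter_threads=_optional_int(env.get("NEUVOICE_INTER_THREADS")),
    )


settings = load_settings({
    "NEUVOICE_CACHE_DIR": "/tmp/neuvoice-cache",
    "NEUVOICE_INTRA_THREADS": "4",
})
print(settings)
```

When overriding values in practice, setting the variables before the library is imported is the safer order, since the source does not say at which point they are read.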
+## License
+Neuvoice is released under the MIT License. See [LICENSE](LICENSE) for details.
+The bundled ONNX model is released under the OpenRAIL-M License.
+Copyright © 2026 Lamapi