---
license: apache-2.0
tags:
- text-to-speech
- tts
- voice-cloning
- omnivoice
- safetensors
- maestraea
language:
- multilingual
pipeline_tag: text-to-speech
base_model: k2-fsa/OmniVoice
---

# OmniVoice (Mæstræa Mirror)

**Multilingual TTS & Voice Cloning: 600+ Languages**

[Original Model](https://huggingface.co/k2-fsa/OmniVoice) by [k2-fsa (Next-gen Kaldi)](https://github.com/k2-fsa) · Apache 2.0

> This is a mirror of the OmniVoice model weights for use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea). All credit goes to the original authors.

## What's in This Repo

| Path | Description | Size |
|------|-------------|------|
| `model.safetensors` | Main OmniVoice model | ~3 GB |
| `audio_tokenizer/model.safetensors` | Audio tokenizer | ~260 MB |
| `tokenizer.json` | Text tokenizer | ~17 MB |
| `config.json` | Model configuration | < 1 KB |
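
After a manual download it can be useful to confirm that all four files from the table are in place. The sketch below uses only the standard library; `EXPECTED_FILES` and `missing_files` are illustrative names rather than part of OmniVoice or Mæstræa, and the check covers presence only, not file size or integrity.

```python
from pathlib import Path

# Relative paths copied from the table above.
EXPECTED_FILES = [
    "model.safetensors",
    "audio_tokenizer/model.safetensors",
    "tokenizer.json",
    "config.json",
]


def missing_files(repo_dir):
    """Return the expected files that are absent under repo_dir (hypothetical helper)."""
    root = Path(repo_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_files("path/to/download")` returns an empty list when the local copy is complete.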

## What OmniVoice Does

OmniVoice is a multilingual TTS and voice cloning model that supports 600+ languages. Its reported real-time factor (RTF) is about 0.025, which means speech is generated roughly 40 times faster than it plays back. It supports three modes:

- **Auto Voice**: generate speech from text with a default voice
- **Voice Cloning**: clone a voice from a 3–15 s reference audio sample
- **Voice Design**: describe the desired voice characteristics in text
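
For Voice Cloning, the 3–15 s window can be checked before a reference clip is handed to the model. This is a minimal sketch using Python's standard `wave` module, so it only reads uncompressed PCM WAV files; the function names are hypothetical, and how OmniVoice itself validates or trims reference audio is not described in this card.

```python
import wave

MIN_SECONDS = 3.0   # lower bound quoted for reference audio
MAX_SECONDS = 15.0  # upper bound quoted for reference audio


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path):
    """True if the clip length falls inside the 3-15 s window (hypothetical check)."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

Other formats such as MP3 or FLAC would need a third-party audio library to measure.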

### Key Features

- 600+ language support
- Near real-time inference (RTF ~0.025)
- Automatic chunking of long-form text, which keeps VRAM usage constant
- ~3–8 GB VRAM, depending on mode
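
The idea behind auto-chunking can be illustrated with a small sketch: if every chunk sent to the model has a bounded length, peak memory no longer grows with the total length of the text. The code below is not OmniVoice's actual chunking logic, which this card does not describe. `chunk_text` and `max_chars` are hypothetical, and the sentence splitter only recognizes `.`, `!` and `?`, so many of the supported languages would need other delimiters.

```python
import re


def chunk_text(text, max_chars=200):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at fixed offsets.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized on its own and the resulting audio segments concatenated in order.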
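
## Usage with Mæstræa

The Mæstræa AI Workstation backend downloads these models automatically. They can also be loaded manually. If the `Auto*` classes do not resolve the OmniVoice architecture in your `transformers` version, use the loading code from the original repo instead.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the main model weights (model.safetensors) and config from the mirror.
model = AutoModelForCausalLM.from_pretrained("AEmotionStudio/omnivoice-models")
# Load the text tokenizer (tokenizer.json) from the same repo.
tokenizer = AutoTokenizer.from_pretrained("AEmotionStudio/omnivoice-models")
```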
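
To fetch the files without loading anything into memory, one option is `snapshot_download` from the `huggingface_hub` package. This is a sketch under the assumption that `huggingface_hub` is installed; `download_mirror` and its default `local_dir` are illustrative names, and it is not necessarily how the Mæstræa backend performs its own download.

```python
from pathlib import Path

REPO_ID = "AEmotionStudio/omnivoice-models"  # repo id used in the snippet above


def download_mirror(local_dir="omnivoice-models"):
    """Download every file in the mirror and return the local directory (hypothetical helper)."""
    # Imported lazily so the module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

The returned directory can be passed to `from_pretrained` in place of the repo id.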

## License

Apache 2.0, the same as the original OmniVoice release.

## Credits

- **Model**: [k2-fsa/OmniVoice](https://github.com/k2-fsa/OmniVoice)
- **Paper**: See the original repo for citation details
- **Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)