license: apache-2.0
tags:
  - text-to-speech
  - tts
  - voice-cloning
  - omnivoice
  - safetensors
  - maestraea
language:
  - multilingual
pipeline_tag: text-to-speech
base_model: k2-fsa/OmniVoice

OmniVoice (Mæstræa Mirror)

Multi-Lingual TTS & Voice Cloning — 600+ Languages

Original Model by k2-fsa (Next-gen Kaldi) · Apache 2.0

This is a mirror of the OmniVoice model weights for use with Mæstræa AI Workstation. All credits go to the original authors.

What's in This Repo

| Path | Description | Size |
|---|---|---|
| `model.safetensors` | Main OmniVoice model | ~3 GB |
| `audio_tokenizer/model.safetensors` | Audio tokenizer | ~260 MB |
| `tokenizer.json` | Text tokenizer | ~17 MB |
| `config.json` | Model configuration | < 1 KB |
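
If you copy or download these files by hand, a quick completeness check can catch a partial download before the model is loaded. The sketch below is a minimal illustration using only the standard library; the helper name `missing_files` and the example directory are assumptions, not part of OmniVoice or Mæstræa.

```python
from pathlib import Path

# Relative paths listed in the table above.
EXPECTED_FILES = [
    "model.safetensors",
    "audio_tokenizer/model.safetensors",
    "tokenizer.json",
    "config.json",
]


def missing_files(root):
    """Return the expected files that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files("models/omnivoice")` (an example path) returns an empty list when all four files are present.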

What OmniVoice Does

OmniVoice is a multi-lingual TTS and voice cloning model supporting 600+ languages. Its reported real-time factor (RTF) is about 0.025, meaning synthesis takes roughly 1/40 of the duration of the generated audio. It supports three modes:

Auto Voice — Generate speech from text with a default voice
Voice Cloning — Clone any voice from a 3–15s reference audio sample
Voice Design — Describe the desired voice characteristics in text
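
One way to picture the three modes is as a choice driven by which optional inputs the caller supplies. The following is a rough sketch only: `Mode` and `select_mode` are hypothetical names, and the precedence (a reference sample wins over a text description) is an assumption rather than documented OmniVoice behavior. The 3 to 15 second bounds come from the list above.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    AUTO_VOICE = "auto_voice"
    VOICE_CLONING = "voice_cloning"
    VOICE_DESIGN = "voice_design"


def select_mode(ref_audio_seconds: Optional[float] = None,
                voice_description: Optional[str] = None) -> Mode:
    """Pick a mode from the optional inputs (hypothetical helper)."""
    if ref_audio_seconds is not None:
        # The README asks for a 3-15 s reference sample for cloning.
        if not 3.0 <= ref_audio_seconds <= 15.0:
            raise ValueError("reference audio should be 3-15 seconds long")
        return Mode.VOICE_CLONING
    if voice_description and voice_description.strip():
        # A non-empty text description selects voice design.
        return Mode.VOICE_DESIGN
    # No reference audio and no description: fall back to the default voice.
    return Mode.AUTO_VOICE
```

Rejecting out-of-range reference clips up front is a design choice in this sketch; a real pipeline might trim or pad the clip instead.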

Key Features

600+ language support
Faster-than-real-time inference (RTF ~0.025)
Long-form text auto-chunking for constant VRAM usage
~3–8 GB VRAM depending on mode
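
To make the chunking and speed claims concrete, here is a minimal sketch of both ideas. It is not OmniVoice's actual chunker: `chunk_text` is a hypothetical greedy sentence packer, the `max_chars` limit is an arbitrary illustrative value, and the punctuation-based split only suits scripts that end sentences with `.`, `!` or `?`. The RTF helper applies the usual definition, RTF = synthesis time / audio duration.

```python
import re


def chunk_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most `max_chars`."""
    # Split after ., ! or ? followed by whitespace. A single sentence longer
    # than max_chars is kept whole rather than cut mid-sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def estimated_synthesis_seconds(audio_seconds, rtf=0.025):
    """RTF = synthesis time / audio duration, so time = RTF * duration."""
    return rtf * audio_seconds
```

Because each chunk is bounded in length, the memory needed per synthesis call stays roughly constant however long the full text is. At the quoted RTF, ten seconds of audio would take about a quarter of a second to generate, though actual speed depends on hardware.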

Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend. They can also be loaded manually:

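from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo ID of this mirror on the Hugging Face Hub
repo_id = "AEmotionStudio/omnivoice-models"

# Download the weights and text tokenizer (or reuse the local cache)
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)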
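
If you only want the files on disk, for example to place them where another tool expects them, fetching them with `huggingface_hub` is an alternative. This is a sketch under assumptions: `download_mirror` is a hypothetical wrapper, and it presumes the `huggingface_hub` package is installed. The repo ID is the one used in the snippet above and the file patterns are those listed under "What's in This Repo".

```python
REPO_ID = "AEmotionStudio/omnivoice-models"  # repo ID from the loading snippet

# Only the files listed under "What's in This Repo".
ALLOW_PATTERNS = [
    "model.safetensors",
    "audio_tokenizer/model.safetensors",
    "tokenizer.json",
    "config.json",
]


def download_mirror(local_dir=None):
    """Download the mirror's files and return the local folder path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=ALLOW_PATTERNS,
        local_dir=local_dir,
    )
```

With `local_dir=None` the files go to the default Hugging Face cache; passing a directory places them there instead.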

License

Apache 2.0 — same as the original OmniVoice release.

Credits

Model: k2-fsa/OmniVoice
Paper: See original repo for citation
Mirror by: AEmotionStudio