license: apache-2.0
tags:
  - text-to-speech
  - tts
  - voice-cloning
  - omnivoice
  - safetensors
  - maestraea
language:
  - multilingual
pipeline_tag: text-to-speech
base_model: k2-fsa/OmniVoice

OmniVoice (Mæstræa Mirror)

Multi-Lingual TTS & Voice Cloning — 600+ Languages

Original Model by k2-fsa (Next-gen Kaldi) · Apache 2.0

This is a mirror of the OmniVoice model weights for use with Mæstræa AI Workstation. All credit goes to the original authors.

What's in This Repo

| Path | Description | Size |
|---|---|---|
| model.safetensors | Main OmniVoice model | ~3 GB |
| audio_tokenizer/model.safetensors | Audio tokenizer | ~260 MB |
| tokenizer.json | Text tokenizer | ~17 MB |
| config.json | Model configuration | < 1 KB |
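
After a manual download, the layout listed above can be sanity-checked with a few lines of Python. This is a minimal sketch: the paths in `EXPECTED_FILES` are copied from the listing above, while the helper name `missing_files` and the local directory are hypothetical and not part of OmniVoice or Mæstræa.

```python
from pathlib import Path

# Relative paths copied from the repo contents listed above.
EXPECTED_FILES = [
    "model.safetensors",
    "audio_tokenizer/model.safetensors",
    "tokenizer.json",
    "config.json",
]


def missing_files(local_dir):
    """Return the expected files that are absent from local_dir (hypothetical helper)."""
    root = Path(local_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_files("path/to/omnivoice-models")` returns an empty list when every expected file is present. It checks only that the files exist, not their size or integrity.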

What OmniVoice Does

OmniVoice is a multilingual TTS and voice cloning model supporting 600+ languages with faster-than-real-time inference (real-time factor, RTF, of ~0.025, i.e. roughly 0.025 s of compute per second of generated audio). It supports three modes:

  • Auto Voice — Generate speech from text with a default voice
  • Voice Cloning — Clone any voice from a 3–15s reference audio sample
  • Voice Design — Describe the desired voice characteristics in text
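
To make the RTF figure quoted above concrete: the real-time factor is conventionally the ratio of compute time to the duration of the audio produced, so values below 1 mean faster than real time. The sketch below applies that definition to the ~0.025 figure. The function name `synthesis_seconds` is illustrative, and actual timings will depend on hardware and mode.

```python
def synthesis_seconds(audio_seconds, rtf=0.025):
    """Estimate compute time from the conventional definition
    RTF = processing_time / audio_duration."""
    return audio_seconds * rtf


# One minute of speech at RTF ~0.025 takes about 1.5 seconds of compute.
print(synthesis_seconds(60.0))

# The speed-up relative to real time is the reciprocal of the RTF (about 40x).
print(1 / 0.025)
```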

Key Features

  • 600+ language support
  • Faster-than-real-time inference (RTF ~0.025)
  • Long-form text auto-chunking for constant VRAM usage
  • ~3–8 GB VRAM depending on mode
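
The auto-chunking idea can be illustrated with a simple sentence-based splitter. This is only a sketch of the general technique, not OmniVoice's actual chunking logic, which this README does not describe. The function name `chunk_text`, the 200-character limit, and the reliance on `.`, `!` and `?` as sentence boundaries are all assumptions made for illustration, and the punctuation rule in particular would not suit many of the supported languages. Bounding the length of each chunk is what keeps the memory needed per synthesis call roughly constant, regardless of total text length.

```python
import re


def chunk_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept as its own chunk
    rather than being split mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be synthesized separately and the resulting audio segments concatenated.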

Usage with Mæstræa

These models are automatically downloaded by the Mæstræa AI Workstation backend. They can also be loaded manually:

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AEmotionStudio/omnivoice-models"  # this mirror on the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(repo_id)  # main OmniVoice weights
tokenizer = AutoTokenizer.from_pretrained(repo_id)  # text tokenizer (tokenizer.json)
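
To fetch the files without loading the model, the `snapshot_download` function from the `huggingface_hub` package can download just the files listed earlier in this README. This is a sketch that assumes `huggingface_hub` is installed. The helper name `download_mirror` and the default target directory are hypothetical, and the `audio_tokenizer/*` pattern is a guess that the folder may hold more than the one file listed.

```python
# File patterns based on the repo contents listed earlier in this README.
# "audio_tokenizer/*" is an assumption: it also picks up any extra files
# in that folder beyond the listed model.safetensors.
ALLOW_PATTERNS = [
    "model.safetensors",
    "audio_tokenizer/*",
    "tokenizer.json",
    "config.json",
]


def download_mirror(local_dir="omnivoice-models"):
    """Download the mirror's files into local_dir and return the local path."""
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="AEmotionStudio/omnivoice-models",
        allow_patterns=ALLOW_PATTERNS,
        local_dir=local_dir,
    )
```

The full download is a little over 3 GB, going by the sizes listed above.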

License

Apache 2.0 — same as the original OmniVoice release.

Credits