VieNeu-TTS — Asante Twi

A fine-tuned Twi text-to-speech model based on VieNeu-TTS-0.3B, trained on the Asante Twi Bible Speech dataset using LoRA.

Demo

Try it live on the HF Space.

Model files

File                          Description
model.safetensors             Merged model weights (PyTorch, GPU/CPU)
VieNeu-TTS-Twi-Q4_K_M.gguf    Quantized GGUF for fast CPU inference
voices.json                   Pre-encoded Twi voice presets

Usage

With the VieNeu SDK

Install from the fork that includes Twi language support:

pip install git+https://github.com/michsethowusu/VieNeu-TTS.git
pip install phonemizer soundfile huggingface_hub
# system dependency for phonemizer (Debian/Ubuntu)
sudo apt-get install espeak-ng
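
A missing espeak-ng install usually only surfaces once phonemization starts. The sketch below is a hypothetical helper (not part of the VieNeu SDK or phonemizer) that uses only the standard library to check for the espeak-ng command-line tool and its shared library before you load the model. It assumes the standard package layout, where the library is named libespeak-ng.

```python
import ctypes.util
import shutil


def espeak_status() -> dict:
    """Report whether espeak-ng appears to be installed (hypothetical helper)."""
    return {
        # Command-line tool installed by the espeak-ng package
        "executable": shutil.which("espeak-ng") is not None,
        # Shared library (libespeak-ng); recent phonemizer versions load it via ctypes
        "library": ctypes.util.find_library("espeak-ng") is not None,
    }


if __name__ == "__main__":
    status = espeak_status()
    if all(status.values()):
        print("espeak-ng found")
    else:
        print("espeak-ng not fully detected:", status)
```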

GPU inference:

from vieneu import Vieneu
import json, soundfile as sf
from huggingface_hub import hf_hub_download

# Load voice presets
voices_path = hf_hub_download("michsethowusu/VieNeu-TTS-Twi", "voices.json")
with open(voices_path) as f:
    voices = json.load(f)

tts = Vieneu(
    mode="standard",
    backbone_repo="michsethowusu/VieNeu-TTS-Twi",
    backbone_device="cuda",
    codec_repo="neuphonic/neucodec-onnx-decoder-int8",
    lang="twi",
    emotion=None,
)

audio = tts.infer(
    "Nanso Petro san hyɛɛ Kristofo nkuran sɛ monni nnipa nyinaa ni.",
    voice=voices["presets"]["twi_voice_0"],
)
sf.write("output.wav", audio, 24000)
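
If soundfile is unavailable, the standard-library wave module can write the same 24 kHz file. This sketch assumes tts.infer returns a 1-D mono float waveform in [-1.0, 1.0], which the sf.write(..., 24000) call suggests but the card does not state; write_wav_16bit is a hypothetical helper, not part of the VieNeu SDK.

```python
import struct
import wave


def write_wav_16bit(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip so the int16 conversion cannot overflow
        frames += struct.pack("<h", int(round(x * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
```

With the GPU example above, this would be called as write_wav_16bit("output.wav", audio).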

CPU inference (quantized GGUF, fast):

pip install llama-cpp-python

from vieneu import Vieneu

tts = Vieneu(
    mode="standard",
    backbone_repo="michsethowusu/VieNeu-TTS-Twi",
    backbone_device="cpu",
    gguf_filename="VieNeu-TTS-Twi-Q4_K_M.gguf",
    codec_repo="neuphonic/neucodec-onnx-decoder-int8",
    lang="twi",
    emotion=None,
)
# Synthesize with tts.infer(...) and save with sf.write(...) exactly as in the GPU example.
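
The GPU and CPU configurations above differ only in backbone_device and the GGUF filename. A hypothetical helper like the one below (not part of the SDK) can build either set of keyword arguments from a single flag, so one script can run on both kinds of machine.

```python
def vieneu_backbone_kwargs(use_gpu: bool) -> dict:
    """Hypothetical helper: build the Vieneu(...) keyword arguments shown above."""
    kwargs = {
        "mode": "standard",
        "backbone_repo": "michsethowusu/VieNeu-TTS-Twi",
        "codec_repo": "neuphonic/neucodec-onnx-decoder-int8",
        "lang": "twi",
        "emotion": None,
    }
    if use_gpu:
        # GPU path from the example above (PyTorch backbone)
        kwargs["backbone_device"] = "cuda"
    else:
        # CPU path from the example above: quantized GGUF via llama-cpp-python
        kwargs["backbone_device"] = "cpu"
        kwargs["gguf_filename"] = "VieNeu-TTS-Twi-Q4_K_M.gguf"
    return kwargs
```

For example, tts = Vieneu(**vieneu_backbone_kwargs(torch.cuda.is_available())), assuming PyTorch is installed alongside the SDK.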

Production API server (GPU):

pip install "vieneu[gpu]"
vieneu-serve \
  --model michsethowusu/VieNeu-TTS-Twi \
  --model-name michsethowusu/VieNeu-TTS-Twi \
  --port 23333

# Then connect from your app:
from vieneu import Vieneu

tts = Vieneu(
    mode="remote",
    api_base="http://your-server:23333/v1",
    model_name="michsethowusu/VieNeu-TTS-Twi",
    lang="twi",
    emotion=None,
)
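
Before creating the remote client, you may want to confirm that the server is listening. This hypothetical helper only opens a TCP connection to the host and port in api_base (port 23333 in the example above). It does not check that the model is loaded or that the endpoint speaks the expected API.

```python
import socket
from urllib.parse import urlparse


def server_reachable(api_base: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to api_base's host and port succeeds."""
    parsed = urlparse(api_base)
    if parsed.hostname is None:
        return False  # not a URL with a host component
    try:
        # Fall back to the scheme's default port when none is given
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except (OSError, ValueError):  # refused, timed out, unresolvable, or bad port
        return False
```

For example, server_reachable("http://your-server:23333/v1") before constructing Vieneu(mode="remote", ...).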

Voice presets

The model ships with 5 voice presets (twi_voice_0 through twi_voice_4), all sampled from the training speaker. Load them from voices.json as shown above.
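
The GPU example indexes voices["presets"]["twi_voice_0"], which implies that voices.json holds a top-level "presets" object keyed by preset name. Assuming that layout, and treating the preset values as opaque, a small hypothetical helper can list the presets in numeric order, for example to audition all five:

```python
def list_presets(voices: dict) -> list:
    """Return preset names from a loaded voices.json dict, in numeric order."""
    names = list(voices.get("presets", {}))

    def sort_key(name):
        # Order by the trailing number (so twi_voice_10 follows twi_voice_9);
        # names without a numeric suffix go last, alphabetically
        tail = name.rsplit("_", 1)[-1]
        return (0, int(tail), name) if tail.isdigit() else (1, 0, name)

    return sorted(names, key=sort_key)
```

Each returned name can then be passed to tts.infer as voice=voices["presets"][name].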

Training details

Setting            Value
Base model         pnnbao-ump/VieNeu-TTS-0.3B
Method             LoRA (r=16, α=32)
Dataset            ghananlpcommunity/asante-twi-bible-speech-text
Training samples   7,000
Training steps     5,000
Hardware           NVIDIA A100 40GB
Phonemizer         espeak-ng backend, lfn language code
Format             In-context voice cloning
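
In the standard LoRA formulation, a frozen weight matrix W of shape d_out × d_in is adapted as W + (α/r)·B·A, where B is d_out × r and A is r × d_in. With the settings above, the update is scaled by α/r = 32/16 = 2, and each adapted matrix gains r·(d_out + d_in) trainable parameters. The card does not say which layers were adapted or how large they are, so the 1024 dimension below is purely illustrative.

```python
def lora_scale(alpha: float, r: int) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # B has shape (d_out, r) and A has shape (r, d_in)
    return r * (d_out + d_in)


if __name__ == "__main__":
    r, alpha = 16, 32                 # values from the training table
    print(lora_scale(alpha, r))       # 2.0
    d = 1024                          # hypothetical size, not taken from the model card
    added = lora_params(d, d, r)
    print(f"{added} LoRA params vs {d * d} in the full matrix")
```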

Limitations

  • Optimised for Asante Twi (Akan). Other dialects may work but are untested.
  • Voice cloning works best with the included voices.json presets (same speaker as training data). External reference audio may produce lower quality.
  • Not intended for commercial use (CC BY-NC 4.0).

License

CC BY-NC 4.0: non-commercial use only. Please credit michsethowusu / GhanaNLP when using this model.
