EveryVoice Open Bible — Arabic Standard

A multispeaker text-to-speech model for Arabic Standard, trained from scratch on the Open Bible corpus using the EveryVoice TTS toolkit (FastSpeech2 acoustic model + HiFi-GAN vocoder, 22,050 Hz output).

The model is conditioned on speaker embeddings learned during training. A speaker name from the training set must be supplied at inference time.
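Because speaker identity comes from an embedding lookup table, only names seen in training can be used. The sketch below shows one way to validate a requested name before inference. It assumes a name-to-id mapping like the `speaker2id` dictionary the inference example reads from the model. The helper `resolve_speaker` and the speaker names are hypothetical placeholders, not part of EveryVoice.

```python
from typing import Mapping


def resolve_speaker(requested: str, speaker2id: Mapping[str, int]) -> int:
    """Return the embedding index for a training-set speaker name.

    Hypothetical helper: raises a ValueError that lists the valid names
    instead of letting an unknown speaker fail later during inference.
    """
    if requested not in speaker2id:
        known = ", ".join(sorted(speaker2id))
        raise ValueError(f"Unknown speaker {requested!r}; choose one of: {known}")
    return speaker2id[requested]


# Illustrative mapping only; real names come from the trained model.
speaker2id = {"speaker_a": 0, "speaker_b": 1}
print(resolve_speaker("speaker_b", speaker2id))  # prints 1
```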

Files

File	Purpose
`feature_prediction.ckpt`	Trained FastSpeech2 feature-prediction weights.
`vocoder.ckpt`	HiFi-GAN vocoder checkpoint (optional — can be replaced with a universal vocoder).
`config/`	EveryVoice YAML config files (shared data, text, feature-prediction, spec-to-wav).
`filelist.psv`	Pipe-separated training filelist (`basename
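The filelist can be inspected with Python's standard `csv` module. This is a minimal sketch that assumes the file has a header row and one utterance per line. Only the `basename` column is confirmed above. The other column names (`characters`, `language`, `speaker`) and the sample rows are assumptions for illustration.

```python
import csv
import io
from collections import Counter


def read_psv(text: str) -> list[dict[str, str]]:
    """Parse a pipe-separated filelist whose first row is a header."""
    # QUOTE_NONE keeps quotation marks inside transcripts as literal characters.
    reader = csv.DictReader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical sample: only `basename` is a confirmed column name.
sample = (
    "basename|characters|language|speaker\n"
    "utt-0001|...|ar|spk1\n"
    "utt-0002|...|ar|spk2\n"
    "utt-0003|...|ar|spk1\n"
)

rows = read_psv(sample)
per_speaker = Counter(row["speaker"] for row in rows)
print(len(rows), dict(per_speaker))  # prints 3 {'spk1': 2, 'spk2': 1}
```

To read the downloaded file instead, pass `Path("filelist.psv").read_text(encoding="utf-8")` (with the path pointing at your local copy) to `read_psv`.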

Intended use

Multispeaker TTS for Arabic Standard using one of the training-set speaker voices.
Research on multilingual TTS, low-resource TTS evaluation, and listening studies on Open Bible–style read-speech.

How to use

Install EveryVoice:

```bash
pip install everyvoice
```

Download the checkpoint and run inference:

```python
import torch
from pathlib import Path
from huggingface_hub import snapshot_download

from everyvoice.config.type_definitions import DatasetTextRepresentation
from everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli.synthesize import (
    get_global_step,
    synthesize_helper,
)
from everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.model import FastSpeech2
from everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.type_definitions import (
    SynthesizeOutputFormats,
)
from everyvoice.model.vocoder.HiFiGAN_iSTFT_lightning.hfgl.utils import (
    load_hifigan_from_checkpoint,
)
from everyvoice.utils.heavy import get_device_from_accelerator

repo_id  = "multilingual-tts/EveryVoice-OpenBible-Arabic-Standard"
local    = Path(snapshot_download(repo_id))

ckpt_path    = local / "feature_prediction.ckpt"
vocoder_path = local / "vocoder.ckpt"

accelerator = "gpu" if torch.cuda.is_available() else "cpu"
device = get_device_from_accelerator(accelerator)

model = FastSpeech2.load_from_checkpoint(str(ckpt_path)).to(device)
model.eval()
global_step = get_global_step(ckpt_path)

vocoder_ckpt = torch.load(str(vocoder_path), map_location=device, weights_only=True)
vocoder_model, vocoder_config = load_hifigan_from_checkpoint(vocoder_ckpt, device)
vocoder_global_step = get_global_step(vocoder_path)

# Use the first speaker and language the model knows; print the list to pick another speaker
speaker = next(iter(model.speaker2id.keys()))
language = next(iter(model.lang2id.keys()))
print(f"Available speakers: {list(model.speaker2id.keys())}")

filelist_data = [
    {
        "basename":         "sample-0",
        "characters":       "...",   # text to synthesise in Arabic Standard
        "language":         language,
        "speaker":          speaker,
        "duration_control": 1.0,
    }
]

output_dir = Path("everyvoice_output")
output_dir.mkdir(exist_ok=True)

synthesize_helper(
    model=model,
    texts=None,
    style_reference=None,
    language=None,
    speaker=None,
    duration_control=1.0,
    global_step=global_step,
    output_type=[SynthesizeOutputFormats.wav],
    text_representation=DatasetTextRepresentation.characters,
    accelerator=accelerator,
    devices="auto",
    device=device,
    batch_size=1,
    num_workers=1,
    filelist=None,
    filelist_data=filelist_data,
    output_dir=output_dir,
    teacher_forcing_directory=None,
    vocoder_model=vocoder_model,
    vocoder_config=vocoder_config,
    vocoder_global_step=vocoder_global_step,
)
# Generated WAVs land in output_dir/wav/
```
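To confirm that synthesis produced audio at the expected 22,050 Hz rate, the output files can be checked with the standard-library `wave` module. This is a minimal sketch that assumes the WAVs are integer PCM (the `wave` module cannot open floating-point WAV files) and that they sit directly under `output_dir/wav/`. The demo writes its own half-second silent file so it runs without a trained model.

```python
import struct
import tempfile
import wave
from pathlib import Path


def describe_wavs(wav_dir: Path) -> list[tuple[str, int, float]]:
    """Return (filename, sample_rate, duration_in_seconds) for each WAV in wav_dir."""
    results = []
    for path in sorted(wav_dir.glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            seconds = wav.getnframes() / rate
        results.append((path.name, rate, seconds))
    return results


# Self-contained demo: write 0.5 s of 16-bit mono silence at 22,050 Hz into a
# temporary directory, then inspect it. For real output, pass
# Path("everyvoice_output") / "wav" instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    with wave.open(str(demo_dir / "sample-0.wav"), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(22050)
        out.writeframes(struct.pack("<h", 0) * 11025)  # 11,025 frames = 0.5 s
    summary = describe_wavs(demo_dir)

for name, rate, seconds in summary:
    print(name, rate, seconds)  # prints: sample-0.wav 22050 0.5
```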

Training data

Source: davidguzmanr/open-bible-resources, config Arabic Standard
Size: approximately 25,262 utterances
Speakers: multispeaker; speaker identity is fixed to one of the training-set voices and selected by name at inference time
Sample rate: 22,050 Hz

Training procedure

Acoustic model: FastSpeech2 (non-autoregressive, duration-prediction based).
Vocoder: HiFi-GAN (iSTFT variant).
Character-level tokenizer built from the training transcripts.
Trained with the EveryVoice toolkit.
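A toy sketch can illustrate two of the ideas above: a character vocabulary derived from transcripts, and FastSpeech2's duration-based, non-autoregressive expansion. In FastSpeech2, a length regulator repeats each encoded input token for its predicted number of spectrogram frames, so all frames can then be decoded in parallel. This is not EveryVoice's implementation. The padding id, the rounding rule, and the way `duration_control` scales durations are assumptions made for illustration.

```python
def build_char_vocab(transcripts: list[str]) -> dict[str, int]:
    """Assign an integer id to every distinct character in the transcripts.

    Id 0 is left free for padding (an assumption of this sketch).
    """
    chars = sorted({ch for text in transcripts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def length_regulate(
    tokens: list[int], durations: list[int], duration_control: float = 1.0
) -> list[int]:
    """Repeat each token for its predicted duration (in frames), scaled by duration_control."""
    frames: list[int] = []
    for token, dur in zip(tokens, durations):
        frames.extend([token] * round(dur * duration_control))
    return frames


vocab = build_char_vocab(["سلام"])  # one Arabic word, 4 distinct letters
ids = [vocab[ch] for ch in "سلام"]
frames = length_regulate([1, 2, 3], [2, 1, 3], duration_control=2.0)
print(len(vocab), len(ids), frames)  # prints 4 4 [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3]
```

In this sketch, a `duration_control` above 1.0 lengthens every token, which corresponds to slower speech. Python's `round` uses round-half-to-even, so very small scaled durations can collapse to zero frames.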

Audio preprocessing and training are reproducible via the upstream open-bible-models repo.

Evaluation

Evaluated alongside other Open-Bible TTS systems on character/word error rate (via Meta's Omnilingual ASR) and UTMOSv2 naturalness scores. See the open-bible-models repository for the evaluation pipeline and the open-bible-surveys repository for the human-listening survey methodology.
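Character and word error rates are both edit distances normalised by the reference length. The sketch below uses the standard Levenshtein definition. It is not the open-bible-models evaluation code, and any text normalisation that pipeline applies to the ASR transcripts before scoring is not reproduced here.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance where substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits per reference character."""
    return edit_distance(list(reference), list(hypothesis)) / max(1, len(reference))


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits per reference word (whitespace tokenisation)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(1, len(ref_words))


print(wer("a b c d", "a x c d"), round(cer("a b c d", "a x c d"), 3))  # prints 0.25 0.143
```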

Open Bible EveryVoice collection

This model belongs to the Open Bible EveryVoice collection of EveryVoice TTS models trained on the OpenBible corpus. Please cite OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages.