F5-TTS Open Bible — Arabic Standard
A zero-shot text-to-speech model for Arabic Standard, trained from scratch on the Open Bible corpus using the F5-TTS architecture (diffusion transformer with a Vocos vocoder, 24 kHz output).
The model takes a short reference audio clip (5–10 seconds) and a target text, and synthesises the target text in the voice of the reference speaker. No fine-tuning per voice is required.
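A quick pre-flight check can catch reference clips that fall outside the 5–10 second range before inference is run. The sketch below uses only Python's standard `wave` module and assumes an uncompressed PCM WAV file; the helper names `wav_duration_seconds` and `is_usable_reference` are illustrative and are not part of F5-TTS.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_usable_reference(path: str, min_s: float = 5.0, max_s: float = 10.0) -> bool:
    """True if the clip length falls inside the recommended range."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

The default bounds mirror the 5–10 second recommendation; they are a guideline rather than a hard limit enforced by the model.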
Files
| File | Purpose |
|---|---|
| model_last.pt | Trained model weights. |
| vocab.txt | Character vocabulary built from the training transcripts. |
| F5-TTS_OpenBible_Arabic-Standard.yaml | Hydra training/inference config (architecture, mel spec settings, tokenizer). |
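Because the vocabulary is character-level and built from the training transcripts, it can help to check whether a target text contains characters the model has never seen. The following is a minimal sketch, assuming `vocab.txt` holds one token per line with the line index as the token id (the layout F5-TTS custom vocabularies appear to use); `load_vocab` and `out_of_vocab` are hypothetical helper names.

```python
def load_vocab(path: str) -> dict[str, int]:
    """Map each token to its line index (assumes one token per line)."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): i for i, line in enumerate(f)}


def out_of_vocab(text: str, vocab: dict[str, int]) -> list[str]:
    """Return the distinct characters of `text` missing from the vocabulary, sorted."""
    return sorted({ch for ch in text if ch not in vocab})
```

An empty result from `out_of_vocab(gen_text, load_vocab(vocab_path))` suggests every character of the target text is covered; any characters it returns may be worth normalising or removing before synthesis.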
Intended use
- Zero-shot TTS for Arabic Standard, controlled by a user-supplied reference clip.
- Research on multilingual TTS, low-resource TTS evaluation, and listening studies on Open Bible–style read-speech.
How to use
Install F5-TTS:
pip install git+https://github.com/SWivid/F5-TTS.git
Download the checkpoint and run inference:
import torch
from huggingface_hub import hf_hub_download
from hydra.utils import get_class
from omegaconf import OmegaConf
from f5_tts.infer.utils_infer import infer_process, load_model, load_vocoder, preprocess_ref_audio_text
repo_id = "multilingual-tts/F5-TTS-OpenBible-Arabic-Standard"
ckpt = hf_hub_download(repo_id, "model_last.pt")
vocab = hf_hub_download(repo_id, "vocab.txt")
config = hf_hub_download(repo_id, "F5-TTS_OpenBible_Arabic-Standard.yaml")
device = "cuda" if torch.cuda.is_available() else "cpu"
model_cfg = OmegaConf.load(config)
model_cls = get_class(f"f5_tts.model.{model_cfg.model.backbone}")
vocoder = load_vocoder(vocoder_name="vocos", is_local=False, device=device)
model = load_model(
    model_cls, model_cfg.model.arch, ckpt,
    mel_spec_type="vocos", vocab_file=vocab, use_ema=True, device=device,
)
# Supply your own clean reference clip (5–10 s, single speaker) and its exact transcription.
ref_audio = "/path/to/your-arabic-standard-clip.wav"
ref_text = "Exact transcription of the clip"
gen_text = "..." # text to synthesise in Arabic Standard
ref_audio_proc, ref_text_proc = preprocess_ref_audio_text(ref_audio, ref_text)
wav, sr, _ = infer_process(
    ref_audio_proc, ref_text_proc, gen_text, model, vocoder,
    mel_spec_type="vocos", device=device,
)
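The snippet above leaves the synthesised audio in `wav` and its sample rate in `sr` without writing it to disk. One way to save it, using only the standard library, is sketched below. It assumes `wav` is a one-dimensional sequence of float samples in the range [-1, 1] (an assumption about the return value, not something the model card states); `save_wav` is a hypothetical helper name.

```python
import array
import wave


def save_wav(path, samples, sample_rate):
    """Write mono float samples (assumed to lie in [-1, 1]) as 16-bit PCM."""
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = array.array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


# Example (uses wav and sr from the inference call above):
# save_wav("output.wav", wav, sr)
```

Libraries such as `soundfile` or `torchaudio` do the same job with less code if they are already installed.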
Training data
- Source: davidguzmanr/open-bible-resources, config Arabic Standard
- Size: approximately 25,262 utterances
- Speakers: multispeaker; speaker identity is supplied at inference time via the reference clip, not by a fixed speaker id
- Sample rate: 24 kHz
- Maximum utterance duration during training: 15 s
Training procedure
- Base architecture: F5-TTS v1 Base (DiT, 1024 dim, 22 layers, 16 heads, text dim 512, 4 convolutional layers).
- Tokenizer: custom character-level, built from the training transcripts.
- Vocoder: vocos.
- Mel spectrogram: 100 channels, hop 256, win 1024, n_fft 1024.
- Optimizer: AdamW, learning rate 7.5e-5, 20,000 warmup updates.
- Training budget: 500,000 optimizer updates on 4 GPUs with mixed precision (bf16), global batch ≈ 112,000 frames.
Audio preprocessing, vocab generation, and config sizing are reproducible via the upstream open-bible-models repo.
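To put the frame-based figures above into more familiar units, the short calculation below converts them to seconds of audio. All inputs are taken from this card; the even split of the global batch across the 4 GPUs is an assumption made for illustration.

```python
SAMPLE_RATE = 24_000           # Hz
HOP_LENGTH = 256               # audio samples per mel frame
GLOBAL_BATCH_FRAMES = 112_000  # approximate global batch size in frames
NUM_GPUS = 4
MAX_UTTERANCE_S = 15           # maximum training utterance length in seconds

frames_per_second = SAMPLE_RATE / HOP_LENGTH
max_utterance_frames = MAX_UTTERANCE_S * SAMPLE_RATE // HOP_LENGTH
batch_audio_seconds = GLOBAL_BATCH_FRAMES / frames_per_second
frames_per_gpu = GLOBAL_BATCH_FRAMES // NUM_GPUS  # assumes an even split

print(frames_per_second)                   # 93.75 mel frames per second
print(max_utterance_frames)                # 1406 frames for a 15 s utterance
print(round(batch_audio_seconds / 60, 1))  # 19.9 minutes of audio per update
print(frames_per_gpu)                      # 28000 frames per GPU
```

So each optimizer update sees roughly 20 minutes of audio, and a maximum-length utterance occupies about 1,400 mel frames.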
Evaluation
The model was evaluated alongside other Open-Bible TTS systems on character and word error rate (computed from transcripts produced by Meta's Omnilingual ASR) and on UTMOSv2 naturalness scores. See the open-bible-models repository for the evaluation pipeline and the open-bible-surveys repository for the human-listening survey methodology.
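For readers unfamiliar with the error-rate metrics, the sketch below shows how character and word error rate are commonly defined: the Levenshtein edit distance between the ASR transcript and the reference text, divided by the reference length. This is a generic illustration, not the open-bible-models pipeline, which may normalise text (for example diacritics or punctuation) differently; the function names are illustrative.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),    # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

In a TTS evaluation, `reference` would be the text given to the synthesiser and `hypothesis` the ASR transcript of the generated audio, so lower values indicate more intelligible speech.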