Habibi-TTS ALG Production Optimization

Production-ready optimization scripts for the Habibi-TTS Algerian Arabic (ALG) specialized model.

Research Validation Summary

| Claim | Status | Evidence |
|---|---|---|
| EPSS / get_epss_timesteps() | TRUE | Built into F5-TTS v1.1.20+ since May 2025. Auto-applies when steps ∈ {5,6,7,10,12,16} |
| Sway Sampling | TRUE | Native to F5-TTS, default sway_sampling_coef=-1.0 |
| F5R-TTS (29.5% WER) | UNCONFIRMED | Paper not publicly indexed. GRPO for TTS validated by DMOSpeech 2 (~10% WER improvement) |
| Triton/TensorRT in F5-TTS | FALSE | No off-the-shelf support. Triton runtime exists but requires manual setup |
| SGLang/vLLM for TTS | FALSE | Architecturally incompatible. F5-TTS is DiT + flow matching, not an autoregressive LLM |
| TGI maintenance mode | TRUE | Official HF docs confirm maintenance mode and recommend vLLM/SGLang for LLMs |
| FP8 on A10G | FALSE | A10G (Ampere, SM86) does not support FP8. Use BF16 + INT8 instead |
| Arabic diacritization | TRUE | Sadeed (Misraj/Sadeed) is SOTA for MSA. Algerian dialect needs dialect-aware preprocessing |
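
The sway sampling row refers to a remapping of the flow-matching timesteps. As described in the F5-TTS paper, a uniform step t in [0, 1] is mapped to t + s·(cos(πt/2) − 1 + t), and a negative coefficient s concentrates steps near the start of the trajectory. The sketch below reproduces that formula in plain Python; the function names `sway` and `sway_schedule` are ours, not F5-TTS identifiers.

```python
import math


def sway(t, coef=-1.0):
    """Map a uniform timestep t in [0, 1] with the sway sampling formula."""
    return t + coef * (math.cos(math.pi / 2 * t) - 1 + t)


def sway_schedule(steps, coef=-1.0):
    """Return steps + 1 remapped timesteps, starting at 0 and ending near 1."""
    return [sway(i / steps, coef) for i in range(steps + 1)]
```

With coef=-1.0 the mapping reduces to 1 − cos(πt/2), so the midpoint t=0.5 moves to about 0.29 and more of the step budget is spent early.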

Scripts

01_epss_optimization.py

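EPSS (Empirically Pruned Step Sampling): roughly 4x faster sampling with minimal quality loss.

  • NFE=7 (seven function evaluations, i.e. sampling steps) is the sweet spot (from arxiv:2505.19931)
  • Built into F5-TTS; applied automatically when use_epss=True and the step count is one of 5, 6, 7, 10, 12, 16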
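
The idea behind a pruned schedule can be sketched as a lookup table of non-uniform timesteps that are dense near t=0 and sparse near t=1, with a uniform fallback for step counts that have no entry. The table below is illustrative only: the grid indices and the names `ILLUSTRATIVE_GRIDS` and `pruned_timesteps` are assumptions for this example and are not copied from the F5-TTS source.

```python
# Hypothetical lookup table: indices on a 32-point grid, denser near t = 0.
# Illustrative values only; check get_epss_timesteps() for the real ones.
ILLUSTRATIVE_GRIDS = {
    7: [0, 2, 4, 6, 8, 16, 24, 32],
}


def pruned_timesteps(nfe, grid=32):
    """Return nfe + 1 timestep boundaries in [0, 1] for nfe solver steps."""
    indices = ILLUSTRATIVE_GRIDS.get(nfe)
    if indices is None:
        # No pruned schedule for this step count: fall back to uniform spacing.
        return [i / nfe for i in range(nfe + 1)]
    return [i / grid for i in indices]
```

Each pair of adjacent boundaries is one function evaluation, so the NFE=7 entry has eight boundaries.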

02_bf16_compile_optimization.py

BF16 inference + torch.compile for A10G.

  • BF16: ~2x faster than FP32, zero quality loss
  • torch.compile: ~20-30% additional speedup
  • Combined: real-time factor (RTF) ~0.016-0.018 on A10G
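
RTF is synthesis time divided by the duration of the audio produced, so an RTF of 0.017 means 10 seconds of speech take about 0.17 seconds to generate. A minimal way to measure it is sketched below; `synthesize` is a hypothetical callable that returns a sequence of audio samples and stands in for whatever inference entry point the script uses.

```python
import time


def real_time_factor(synth_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synth_seconds / audio_seconds


def measure_rtf(synthesize, text, sample_rate):
    """Time one call to a (hypothetical) synthesize(text) -> samples callable."""
    start = time.perf_counter()
    samples = synthesize(text)
    # On a GPU, synchronize the device before reading the clock,
    # otherwise the timer only covers kernel launch.
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

When comparing FP32, BF16, and compiled runs, it helps to discard the first few calls, since torch.compile pays its compilation cost on the first invocation.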

03_arabic_preprocessing.py

Algerian Arabic text preprocessing pipeline.

  • Numeral normalization (Eastern Arabic and Western digits → spelled-out Arabic words)
  • French/Arabic code-switching handling
  • Diacritization via Sadeed model
  • Text caching for repeated phrases
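
The first, second, and last steps above can be sketched with the standard library alone. This example only unifies Eastern Arabic digits to ASCII (spelling numbers out as words is left to the real pipeline), tags each token as Arabic or Latin script so French spans can be routed separately, and caches results for repeated phrases. All function names here are hypothetical and chosen for illustration.

```python
import re
from functools import lru_cache

# Map Eastern Arabic digits (U+0660..U+0669) to ASCII 0-9.
EASTERN_TO_WESTERN = {0x0660 + i: str(i) for i in range(10)}

# A token is a run of Arabic-script characters or a run of Latin letters
# (including the accented letters used in French).
_SCRIPT_RUN = re.compile(r"[\u0600-\u06FF]+|[A-Za-zÀ-ÖØ-öø-ÿ]+")


def unify_digits(text):
    """Rewrite Eastern Arabic digits as Western digits."""
    return text.translate(EASTERN_TO_WESTERN)


def tag_script_runs(text):
    """Return (script, token) pairs, where script is 'ar' or 'latin'."""
    runs = []
    for match in _SCRIPT_RUN.finditer(text):
        token = match.group()
        script = "ar" if "\u0600" <= token[0] <= "\u06FF" else "latin"
        runs.append((script, token))
    return runs


@lru_cache(maxsize=4096)
def preprocess(text):
    """Cached entry point: repeated phrases skip the normalization work."""
    return unify_digits(text).strip()
```

Caching pays off most once the expensive diacritization step sits behind the same entry point, since that is the stage that calls a model.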

04_streaming_server.py

FastAPI streaming TTS server.

  • Sentence-level chunking
  • Opus-encoded streaming output
  • Sub-500ms time-to-first-audio
  • Health and info endpoints
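
Sentence-level chunking is what makes a low time-to-first-audio possible: the first sentence can be synthesized and sent while the rest of the text is still waiting. A minimal sketch follows, assuming sentences end in ".", "!", "?", the Arabic question mark, or the Arabic semicolon; `synthesize` is a hypothetical callable that returns encoded audio for one sentence.

```python
import re

# Split on whitespace that follows sentence-final punctuation, including
# the Arabic question mark (U+061F) and Arabic semicolon (U+061B).
_SENTENCE_BREAK = re.compile(r"(?<=[.!?\u061F\u061B])\s+")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    return [part.strip() for part in _SENTENCE_BREAK.split(text) if part.strip()]


def stream_audio(text, synthesize):
    """Yield one audio chunk per sentence as soon as it is ready."""
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```

A generator like `stream_audio` can be handed to a streaming HTTP response so that each chunk is flushed to the client as it is produced.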

05_quantization.py

INT8 weight-only quantization for A10G.

  • INT8 W8A16 (weights quantized, activations BF16)
  • ~50% memory reduction vs BF16
  • ~1.3-1.5x speedup, since inference is memory-bandwidth bound
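
Weight-only INT8 stores each weight in one byte instead of the two that BF16 needs, which is where the roughly 50% memory saving comes from; activations stay in BF16 and the weights are rescaled on the fly. The sketch below shows symmetric per-row quantization in plain Python as one common way to do this. It is an illustration of the technique, not the scheme the script necessarily uses, and the function names are hypothetical.

```python
def quantize_row(weights):
    """Symmetric INT8 quantization of one weight row.

    Returns (int8 values, scale) such that weight ≈ value * scale.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_row(quantized, scale):
    """Recover approximate weights; done on the fly before the BF16 matmul."""
    return [value * scale for value in quantized]
```

The rounding error per weight is at most half the scale, so rows with a small dynamic range lose very little precision.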

Recommended Priority Stack

| Priority | Action | Expected RTF on A10G (cumulative) |
|---|---|---|
| 1 | EPSS NFE=7 | 0.030 |
| 2 | BF16 | 0.022 |
| 3 | torch.compile | 0.016-0.018 |
| 4 | Sentence streaming | Sub-500ms TTFA |
| 5 | INT8 quantization | 0.012-0.014 |

Hardware: A10G (24GB VRAM)

  • Single A10G can run 2 model replicas with BF16
  • ~30-50 concurrent short utterances
  • Always use BF16 (native support, no quality loss)
  • Do NOT use FP8 (A10G lacks FP8 tensor cores)
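
The precision advice above follows from the GPU's CUDA compute capability: BF16 arrived with Ampere (8.0), while FP8 tensor cores first appear in Ada (8.9) and Hopper (9.0). The A10G reports 8.6, so it falls in the BF16 tier. A minimal sketch of that check is below; the function name is hypothetical, and in PyTorch the capability tuple can be read with torch.cuda.get_device_capability().

```python
FP8_MIN_CAPABILITY = (8, 9)   # Ada Lovelace; Hopper reports (9, 0)
BF16_MIN_CAPABILITY = (8, 0)  # Ampere


def fastest_supported_dtype(capability):
    """Pick the lowest-precision dtype with hardware support.

    capability is a (major, minor) tuple, e.g. (8, 6) for the A10G.
    """
    if capability >= FP8_MIN_CAPABILITY:
        return "fp8"
    if capability >= BF16_MIN_CAPABILITY:
        return "bf16"
    return "fp16"
```

This only reports what the hardware can execute; whether FP8 preserves quality for this model on newer GPUs is a separate question that this card does not evaluate.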

Model Info

  • Base: SWivid/F5-TTS (F5TTS_v1_Base)
  • Fine-tuned: SWivid/Habibi-TTS/Specialized/ALG
  • Checkpoint: model_100000.safetensors
  • Vocab: Specialized/ALG/vocab.txt
  • License: Apache 2.0 (ALG dialect)