Moss-Saudi-3
This repository contains a Saudi Arabic LoRA fine-tune of
OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5.
Artifacts:
- Root files: merged full model weights for direct from_pretrained and SGLang-Omni serving.
- lora_adapter/: the original PEFT LoRA adapter, with portable Hub metadata.
- training_summary.json: sanitized training and checkpoint metadata.
The model uses OpenMOSS-Team/MOSS-Audio-Tokenizer-v2 for 48 kHz stereo audio decoding.
SGLang-Omni
SGLang-Omni supports MossTTSLocalModel through the OpenAI-compatible
/v1/audio/speech endpoint.
sgl-omni serve \
--model-path Rabe3/Moss-Saudi-3 \
--allowed-media-domain huggingface.co \
--allowed-media-domain cas-bridge.xethub.hf.co \
--port 8000
Then request speech:
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Marhaba, this is a short Saudi Arabic TTS test."}' \
--output moss_saudi.wav
The included serve_sglang_omni.sh wrapper runs the same server command:
bash serve_sglang_omni.sh
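The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost:8000 and that the endpoint returns the audio bytes directly in the response body, as the curl --output usage implies. The helper names build_speech_request and synthesize are illustrative and not part of this repository.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:8000"):
    """Build the same POST request that the curl example sends."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="moss_saudi.wav",
               base_url="http://localhost:8000", timeout=300):
    """Send the request and write the returned audio bytes to out_path.

    Assumption: the response body is the raw audio file.
    """
    request = build_speech_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio_bytes)
    return out_path
```

With the server running, calling synthesize("Marhaba, this is a short Saudi Arabic TTS test.") should write moss_saudi.wav to the working directory.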
Transformers
import torch
import torchaudio
from transformers import AutoModel, AutoProcessor

model_id = "Rabe3/Moss-Saudi-3"

# Use bfloat16 on GPU; fall back to float32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# The processor carries the audio tokenizer, which must sit on the same device.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
processor.audio_tokenizer = processor.audio_tokenizer.to(device)

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=dtype,
    attn_implementation="sdpa" if device == "cuda" else "eager",
).to(device)
model.eval()

# A batch containing one conversation with a single user message.
conversation = [[processor.build_user_message(
    text="Marhaba, this is a short Saudi Arabic TTS test.",
    language="Arabic",
)]]
batch = processor(conversation, mode="generation")

with torch.inference_mode():
    outputs = model.generate(
        input_ids=batch["input_ids"].to(device),
        attention_mask=batch["attention_mask"].to(device),
        max_new_tokens=4096,
        do_sample=True,
        audio_temperature=1.7,
        audio_top_p=0.8,
        audio_top_k=25,
    )

# Decode the generated tokens and save the first message's audio.
message = processor.decode(outputs)[0]
audio = message.audio_codes_list[0].detach().cpu().to(torch.float32)
torchaudio.save("moss_saudi.wav", audio, processor.model_config.sampling_rate)
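The generate call above passes audio_temperature, audio_top_p, and audio_top_k. How the model's custom code applies them is not shown here, so the following is only a generic sketch of temperature, top-k, and top-p (nucleus) filtering on a small list of logits, written in plain Python. The function names and the order of the filters are assumptions for illustration.

```python
import math
import random


def filtered_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature scaling: values above 1 flatten the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose renormalized cumulative
    # probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample(dist, rng=random):
    """Draw one token index from a filtered distribution."""
    indices = list(dist)
    weights = [dist[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

In this sketch, a temperature of 1.7 spreads probability across more candidates, while top_k=25 and top_p=0.8 then cut off the unlikely tail before sampling.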
LoRA Adapter
The adapter remains available if you want to apply it manually:
import torch
from peft import PeftModel
from transformers import AutoModel

base = AutoModel.from_pretrained(
    "OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5",
    trust_remote_code=True,
    dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "Rabe3/Moss-Saudi-3", subfolder="lora_adapter")
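The artifacts list distinguishes merged full model weights from the LoRA adapter. As a rough illustration of how the two relate, the sketch below applies the standard LoRA merge rule, W_merged = W + (alpha / rank) * B @ A, to toy matrices in plain Python and checks that the merged weight gives the same output as the base weight plus the adapter path. The matrix values, alpha, and rank are made up for the example; the actual adapter configuration is not stated here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold the low-rank update into the base weight."""
    delta = scale(matmul(lora_b, lora_a), alpha / rank)
    return add(weight, delta)


# Toy shapes: W is 2x3 (out x in), A is 1x3, B is 2x1, so rank = 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [-1.0]]
alpha, rank = 2.0, 1

x = [[1.0], [1.0], [1.0]]  # column-vector input

merged = merge_lora(W, A, B, alpha, rank)
y_merged = matmul(merged, x)
# Adapter path: base output plus the scaled low-rank correction.
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), alpha / rank))
```

For a LoRA model loaded through PEFT as above, the analogous operation is model.merge_and_unload(), which returns the base model with the adapter folded in; whether the root files here were produced exactly that way is an assumption.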