Moss-Saudi-3

This repository contains a Saudi Arabic LoRA fine-tune of OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5.

Artifacts:

  • Root files: merged full-model weights, loadable directly with from_pretrained or served with SGLang-Omni.
  • lora_adapter/: the original PEFT LoRA adapter, with portable Hub metadata.
  • training_summary.json: sanitized training and checkpoint metadata.

The model uses OpenMOSS-Team/MOSS-Audio-Tokenizer-v2 for 48 kHz stereo audio decoding.
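You can check any saved output against the 48 kHz stereo format. The sketch below reads the RIFF header with the standard library's struct module instead of wave, because torchaudio may write 32-bit float WAV files, which wave cannot open. It assumes the output is a standard RIFF/WAVE file. read_wav_format is a hypothetical helper and is not part of this repository.

```python
import struct


def read_wav_format(path):
    """Return (channels, sample_rate, bits_per_sample, duration_s) for a RIFF/WAVE file."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError(f"{path} is too short to be a WAV file")
        riff, _riff_size, wave_id = struct.unpack("<4sI4s", header)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        fmt, data_size = None, None
        # Walk the chunk list until the "data" chunk; remember the "fmt " chunk on the way.
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                break
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"data":
                data_size = chunk_size
                break
            # Chunk bodies are padded to an even number of bytes.
            body = f.read(chunk_size + (chunk_size % 2))
            if chunk_id == b"fmt ":
                # format tag, channels, sample rate, byte rate, block align, bits per sample
                fmt = struct.unpack("<HHIIHH", body[:16])
    if fmt is None or data_size is None:
        raise ValueError(f"{path} is missing a fmt or data chunk")
    _format_tag, channels, sample_rate, _byte_rate, block_align, bits = fmt
    duration_s = data_size / (block_align * sample_rate)
    return channels, sample_rate, bits, duration_s
```

For example, read_wav_format("moss_saudi.wav") should report 2 channels at 48000 Hz if the decoded audio is saved without resampling.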

SGLang-Omni

SGLang-Omni supports MossTTSLocalModel through the OpenAI-compatible /v1/audio/speech endpoint.

sgl-omni serve \
  --model-path Rabe3/Moss-Saudi-3 \
  --allowed-media-domain huggingface.co \
  --allowed-media-domain cas-bridge.xethub.hf.co \
  --port 8000

Then request speech:

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Marhaba, this is a short Saudi Arabic TTS test."}' \
  --output moss_saudi.wav
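The same request can be sent from Python with only the standard library. This sketch mirrors the curl call and sends only the "input" field. Other OpenAI speech fields, such as model or voice, are not shown above and are left out. build_speech_request and synthesize are hypothetical helpers, and the 300-second timeout is an arbitrary assumption.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /v1/audio/speech endpoint, like the curl call."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, base_url="http://localhost:8000", timeout=300):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)
```

With the server running, synthesize("Marhaba, this is a short Saudi Arabic TTS test.", "moss_saudi.wav") writes the response body to disk and returns its size in bytes.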

The included serve_sglang_omni.sh wrapper runs the same server command:

bash serve_sglang_omni.sh

Transformers

import torch
import torchaudio
from transformers import AutoModel, AutoProcessor

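model_id = "Rabe3/Moss-Saudi-3"
# Prefer a GPU with BF16 weights; fall back to CPU with FP32.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# Load the processor and move its audio tokenizer to the same device as the model.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
processor.audio_tokenizer = processor.audio_tokenizer.to(device)

# Load the merged weights from the repository root; SDPA attention is used only on GPU.
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=dtype,
    attn_implementation="sdpa" if device == "cuda" else "eager",
).to(device)
model.eval()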

# Build a batch containing one conversation with a single user message.
conversation = [[processor.build_user_message(
    text="Marhaba, this is a short Saudi Arabic TTS test.",
    language="Arabic",
)]]
batch = processor(conversation, mode="generation")

# Generate audio tokens; the audio_* arguments control how they are sampled.
with torch.inference_mode():
    outputs = model.generate(
        input_ids=batch["input_ids"].to(device),
        attention_mask=batch["attention_mask"].to(device),
        max_new_tokens=4096,
        do_sample=True,
        audio_temperature=1.7,
        audio_top_p=0.8,
        audio_top_k=25,
    )

# Decode the generated tokens into a waveform tensor and save it at the model's sampling rate.
message = processor.decode(outputs)[0]
audio = message.audio_codes_list[0].detach().cpu().to(torch.float32)
torchaudio.save("moss_saudi.wav", audio, processor.model_config.sampling_rate)
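The generate call above samples audio tokens with audio_temperature, audio_top_p, and audio_top_k. The remote code that consumes these arguments is not shown here. The NumPy sketch below therefore only illustrates the standard technique those names usually refer to: divide the logits by the temperature, keep the top-k tokens, then keep the smallest nucleus whose cumulative probability reaches top-p. The filter order is an assumption for this model, and sample_token is a hypothetical helper.

```python
import numpy as np


def sample_token(logits, temperature=1.7, top_k=25, top_p=0.8, rng=None):
    """Temperature -> top-k -> top-p (nucleus) sampling over one logit vector."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Top-k: mask everything below the k-th highest logit (ties are kept).
    k = min(top_k, logits.size)
    kth_best = np.sort(logits)[-k]
    logits = np.where(logits >= kth_best, logits, -np.inf)
    # Softmax over the surviving tokens; masked tokens get probability 0.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p: keep the smallest set of tokens whose cumulative probability reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    filtered /= filtered.sum()
    return int(rng.choice(logits.size, p=filtered))
```

With the values used above (temperature 1.7, top-k 25, top-p 0.8), the high temperature flattens the distribution, while the two filters limit sampling to the most likely tokens.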

LoRA Adapter

The adapter remains available if you want to apply it manually:

import torch
from peft import PeftModel
from transformers import AutoModel

# Load the original base model, then attach the adapter from the lora_adapter/ subfolder.
base = AutoModel.from_pretrained(
    "OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5",
    trust_remote_code=True,
    dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "Rabe3/Moss-Saudi-3", subfolder="lora_adapter")
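The root weights are described as a merge of this adapter into the base model. For a standard LoRA layer, merging folds the low-rank update into the dense weight as W' = W + (alpha / r) · B · A, so the merged layer produces the same output as the base layer plus the adapter branch. The NumPy sketch below checks that identity on random matrices. The sizes, rank, and alpha are hypothetical and do not come from this adapter's configuration.

```python
import numpy as np


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into a dense weight: W' = W + (alpha / rank) * B @ A."""
    scaling = alpha / rank
    return weight + scaling * (lora_b @ lora_a)


rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 8, 16, 4, 8  # hypothetical sizes, not this adapter's config
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(rank, d_in))   # LoRA "down" projection
B = rng.normal(size=(d_out, rank))  # LoRA "up" projection
x = rng.normal(size=(d_in,))

# Unmerged forward pass: base output plus the scaled low-rank branch.
unmerged = W @ x + (alpha / rank) * (B @ (A @ x))
merged = merge_lora(W, A, B, alpha, rank) @ x
print(np.allclose(unmerged, merged))  # True
```

In PEFT, calling merge_and_unload() on the PeftModel above applies this merge to every adapted layer and returns the underlying model. PEFT's default scaling is lora_alpha / r; variants such as rsLoRA use lora_alpha / sqrt(r) instead.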
Model Details

Safetensors weights: 4B parameters, BF16 tensor type.