---
license: apache-2.0
library_name: transformers
pipeline_tag: text-to-speech
base_model:
- OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5
tags:
- text-to-speech
- voice-cloning
- custom_code
- sglang-omni
- moss-tts
- moss-tts-local
- lora
- saudi-arabic
language:
- ar
---

# Moss-Saudi

This repository contains a Saudi Arabic LoRA fine-tune of
`OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5`.

Artifacts:

- Root files: the full model with the LoRA weights merged in, for direct `from_pretrained` loading and SGLang-Omni serving.
- `lora_adapter/`: the original PEFT LoRA adapter, with portable Hub metadata.
- `training_summary.json`: sanitized training and checkpoint metadata.

The model uses `OpenMOSS-Team/MOSS-Audio-Tokenizer-v2` for 48 kHz stereo audio decoding.

## SGLang-Omni

SGLang-Omni supports `MossTTSLocalModel` through the OpenAI-compatible
`/v1/audio/speech` endpoint.

```bash
sgl-omni serve \
  --model-path Rabe3/Moss-Saudi \
  --allowed-media-domain huggingface.co \
  --allowed-media-domain cas-bridge.xethub.hf.co \
  --port 8000
```

Then request speech:

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Marhaba, this is a short Saudi Arabic TTS test."}' \
  --output moss_saudi.wav
```
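The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. It assumes the server is reachable at `http://localhost:8000` and that the response body is the audio file itself. The helper names `build_speech_request` and `synthesize` are hypothetical and not part of this repository.

```python
import json
import urllib.request

# Assumed default: the server started above, listening on localhost:8000.
DEFAULT_BASE_URL = "http://localhost:8000"


def build_speech_request(text, base_url=DEFAULT_BASE_URL):
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="moss_saudi.wav", base_url=DEFAULT_BASE_URL, timeout=300):
    """Send the request and write the raw response body to out_path."""
    request = build_speech_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio_bytes)
    return out_path


# Example call (requires a running server):
# synthesize("Marhaba, this is a short Saudi Arabic TTS test.")
```

Splitting request construction from the network call keeps the payload easy to inspect without a running server.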

The included `serve_sglang_omni.sh` wrapper runs the same server command:

```bash
bash serve_sglang_omni.sh
```

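## Transformers

The merged weights at the repository root load directly with `transformers`. The model ships custom code, so `trust_remote_code=True` is required.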

```python
import torch
import torchaudio
from transformers import AutoModel, AutoProcessor

model_id = "Rabe3/Moss-Saudi"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# Load the processor and move its audio tokenizer to the target device.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
processor.audio_tokenizer = processor.audio_tokenizer.to(device)

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=dtype,
    attn_implementation="sdpa" if device == "cuda" else "eager",
).to(device)
model.eval()

# Build a batch with one conversation holding one user message: the text to speak.
conversation = [[processor.build_user_message(
    text="Marhaba, this is a short Saudi Arabic TTS test.",
    language="Arabic",
)]]
batch = processor(conversation, mode="generation")

with torch.inference_mode():
    outputs = model.generate(
        input_ids=batch["input_ids"].to(device),
        attention_mask=batch["attention_mask"].to(device),
        max_new_tokens=4096,
        do_sample=True,
        audio_temperature=1.7,
        audio_top_p=0.8,
        audio_top_k=25,
    )

# Decode the generated tokens and save the first audio item of the first message.
message = processor.decode(outputs)[0]
audio = message.audio_codes_list[0].detach().cpu().to(torch.float32)
torchaudio.save("moss_saudi.wav", audio, processor.model_config.sampling_rate)
```

## LoRA Adapter

To apply the adapter to the base model yourself, load it from the `lora_adapter/` subfolder with PEFT:

```python
import torch
from peft import PeftModel
from transformers import AutoModel

base = AutoModel.from_pretrained(
    "OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5",
    trust_remote_code=True,
    dtype=torch.bfloat16,
)
# Attach the LoRA adapter stored in this repository's lora_adapter/ subfolder.
model = PeftModel.from_pretrained(base, "Rabe3/Moss-Saudi", subfolder="lora_adapter")
```
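For intuition about how the adapter relates to the merged weights at the repository root, here is a minimal sketch of the conventional LoRA merge, `W' = W + (alpha / r) * B @ A`, in plain Python. It illustrates the standard formulation only. It is not the script used to produce this repository's weights, and the matrices and `alpha` below are toy values rather than this adapter's actual configuration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * (B @ A), the conventional LoRA merge.

    weight: d_out x d_in, lora_a: r x d_in, lora_b: d_out x r.
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values, chosen only for illustration (not taken from this model).
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, 2 x 2
A = [[1.0, 2.0]]              # rank-1 down-projection, 1 x 2
B = [[0.5], [-0.5]]           # rank-1 up-projection, 2 x 1
merged = merge_lora(W, A, B, alpha=2.0)
```

After merging, a single matrix multiply with `merged` gives the same output as running the base weight and the adapter path separately, which is why the merged weights can be served without PEFT.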