---
license: apache-2.0
library_name: transformers
pipeline_tag: text-to-speech
base_model:
- OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5
tags:
- text-to-speech
- voice-cloning
- custom_code
- sglang-omni
- moss-tts
- moss-tts-local
- lora
- saudi-arabic
language:
- ar
---
# Moss-Saudi

This repository contains a Saudi Arabic LoRA fine-tune of
`OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5`.

Artifacts:

- Root files: merged full model weights for direct `from_pretrained` and SGLang-Omni serving.
- `lora_adapter/`: the original PEFT LoRA adapter, with portable Hub metadata.
- `training_summary.json`: sanitized training and checkpoint metadata.

The model uses `OpenMOSS-Team/MOSS-Audio-Tokenizer-v2` for 48 kHz stereo audio decoding.

## SGLang-Omni

SGLang-Omni supports `MossTTSLocalModel` through the OpenAI-compatible
`/v1/audio/speech` endpoint.

```bash
sgl-omni serve \
  --model-path Rabe3/Moss-Saudi \
  --allowed-media-domain huggingface.co \
  --allowed-media-domain cas-bridge.xethub.hf.co \
  --port 8000
```

Then request speech:

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Marhaba, this is a short Saudi Arabic TTS test."}' \
  --output moss_saudi.wav
```

The included `serve_sglang_omni.sh` wrapper runs the same server command:

```bash
bash serve_sglang_omni.sh
```
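
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is reachable at `http://localhost:8000` and that the endpoint accepts the same JSON body as the curl example. The helper names `build_speech_request` and `synthesize`, and the 300-second timeout, are illustrative choices and not part of this repository.

```python
import json
import urllib.request

# Assumption: the sgl-omni server from the command above listens on port 8000.
SPEECH_URL = "http://localhost:8000/v1/audio/speech"


def build_speech_request(text: str, url: str = SPEECH_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example (JSON body with "input")."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "moss_saudi.wav", timeout: float = 300.0) -> str:
    """Send the request and write the response body to out_path unchanged."""
    request = build_speech_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio_bytes)
    return out_path
```

With the server running, `synthesize("Marhaba, this is a short Saudi Arabic TTS test.")` should write the response to `moss_saudi.wav`, mirroring the `--output` flag of the curl call.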

## Transformers

```python
import torch
import torchaudio
from transformers import AutoModel, AutoProcessor

model_id = "Rabe3/Moss-Saudi"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# The processor is loaded from the repository's custom code.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
# Keep the audio tokenizer on the same device as the model.
processor.audio_tokenizer = processor.audio_tokenizer.to(device)

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=dtype,
    attn_implementation="sdpa" if device == "cuda" else "eager",
).to(device)
model.eval()

# A batch of one conversation containing a single user message.
conversation = [[processor.build_user_message(
    text="Marhaba, this is a short Saudi Arabic TTS test.",
    language="Arabic",
)]]
batch = processor(conversation, mode="generation")

# Sample audio tokens without tracking gradients.
with torch.inference_mode():
    outputs = model.generate(
        input_ids=batch["input_ids"].to(device),
        attention_mask=batch["attention_mask"].to(device),
        max_new_tokens=4096,
        do_sample=True,
        audio_temperature=1.7,
        audio_top_p=0.8,
        audio_top_k=25,
    )

# Decode the first result and save its audio as a WAV file.
message = processor.decode(outputs)[0]
audio = message.audio_codes_list[0].detach().cpu().to(torch.float32)
torchaudio.save("moss_saudi.wav", audio, processor.model_config.sampling_rate)
```
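
To confirm that a generated file matches the 48 kHz stereo format mentioned above, the WAV header can be inspected directly. This is a small sketch under the assumption that `moss_saudi.wav` is a plain RIFF/WAVE file; `read_wav_format` is a hypothetical helper, not something shipped with the model. It parses the header by hand because Python's built-in `wave` module only reads integer PCM, while a float32 tensor may be saved as floating-point samples.

```python
import struct


def read_wav_format(path: str) -> dict:
    """Return the format tag, channel count, and sample rate of a RIFF/WAVE file."""
    with open(path, "rb") as handle:
        riff, _size, wave_id = struct.unpack("<4sI4s", handle.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        while True:
            header = handle.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = handle.read(chunk_size)
                # First 8 bytes of the chunk: format tag, channels, sample rate.
                tag, channels, rate = struct.unpack("<HHI", fmt[:8])
                return {"format_tag": tag, "channels": channels, "sample_rate": rate}
            # Chunks are word-aligned: skip the payload plus a pad byte if odd.
            handle.seek(chunk_size + (chunk_size & 1), 1)
```

A format tag of 1 indicates integer PCM and 3 indicates IEEE floating point. For a file produced by the snippet above, one would expect `channels` to be 2 and `sample_rate` to be 48000 if the output follows the tokenizer's stated format.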

## LoRA Adapter

The adapter remains available if you want to apply it manually:

```python
import torch
from peft import PeftModel
from transformers import AutoModel

base = AutoModel.from_pretrained(
    "OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5",
    trust_remote_code=True,
    dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "Rabe3/Moss-Saudi", subfolder="lora_adapter")
```
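
If you need standalone weights after applying the adapter this way, PEFT's `merge_and_unload()` folds the LoRA deltas into the base model. The sketch below shows one way to do that; `merge_lora_adapter` and the output directory are hypothetical names, and nothing here claims to reproduce exactly how the merged root files in this repository were built. Because the base model relies on custom code, the saved directory may also need the model's Python files copied alongside the weights.

```python
def merge_lora_adapter(peft_model, output_dir: str) -> str:
    """Fold LoRA weights into the base model and save the merged result.

    peft_model is expected to be the PeftModel built in the snippet above.
    """
    # merge_and_unload() returns the underlying model with the adapter
    # deltas added into its weights and the LoRA wrappers removed.
    merged = peft_model.merge_and_unload()
    # Write the merged weights and config to the (hypothetical) output_dir.
    merged.save_pretrained(output_dir)
    return output_dir
```

For example, `merge_lora_adapter(model, "moss_saudi_merged")` would write the merged checkpoint to a local `moss_saudi_merged` directory.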