	---
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-to-speech
	base_model:
	- OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5
	tags:
	- text-to-speech
	- voice-cloning
	- custom_code
	- sglang-omni
	- moss-tts
	- moss-tts-local
	- lora
	- saudi-arabic
	language:
	- ar
	---

	# Moss-Saudi

	This repository contains a Saudi Arabic LoRA fine-tune of
	`OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5`.

	Artifacts:

	- Root files: merged full model weights for direct `from_pretrained` and SGLang-Omni serving.
	- `lora_adapter/`: the original PEFT LoRA adapter, with portable Hub metadata.
	- `training_summary.json`: sanitized training and checkpoint metadata.

	The model uses `OpenMOSS-Team/MOSS-Audio-Tokenizer-v2` for 48 kHz stereo audio decoding.

	## SGLang-Omni

	SGLang-Omni supports `MossTTSLocalModel` through the OpenAI-compatible
	`/v1/audio/speech` endpoint.

	```bash
	sgl-omni serve \
	  --model-path Rabe3/Moss-Saudi \
	  --allowed-media-domain huggingface.co \
	  --allowed-media-domain cas-bridge.xethub.hf.co \
	  --port 8000
	```

	Then request speech:

	```bash
	curl -X POST http://localhost:8000/v1/audio/speech \
	  -H "Content-Type: application/json" \
	  -d '{"input": "Marhaba, this is a short Saudi Arabic TTS test."}' \
	  --output moss_saudi.wav
	```
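The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server from the previous step is listening on `localhost:8000` and that the response body is the raw audio file, as the `curl --output` call implies. The helper names `build_speech_request` and `synthesize` are illustrative and not part of this repository.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /v1/audio/speech endpoint."""
    # Same JSON body as the curl example: a single "input" field.
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="moss_saudi.wav",
               base_url="http://localhost:8000", timeout=300):
    """Send the request and write the returned bytes to out_path."""
    request = build_speech_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    return out_path
```

With the server running, `synthesize("Marhaba, this is a short Saudi Arabic TTS test.")` should write `moss_saudi.wav` to the current directory. The 300-second timeout is an arbitrary choice for long utterances.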

	The included `serve_sglang_omni.sh` wrapper runs the same server command:

	```bash
	bash serve_sglang_omni.sh
	```

	## Transformers

	```python
	import torch
	import torchaudio
	from transformers import AutoModel, AutoProcessor

	model_id = "Rabe3/Moss-Saudi"
	device = "cuda" if torch.cuda.is_available() else "cpu"
	dtype = torch.bfloat16 if device == "cuda" else torch.float32

	processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
	processor.audio_tokenizer = processor.audio_tokenizer.to(device)

	model = AutoModel.from_pretrained(
	    model_id,
	    trust_remote_code=True,
	    dtype=dtype,
	    attn_implementation="sdpa" if device == "cuda" else "eager",
	).to(device)
	model.eval()

	conversation = [[processor.build_user_message(
	    text="Marhaba, this is a short Saudi Arabic TTS test.",
	    language="Arabic",
	)]]
	batch = processor(conversation, mode="generation")

	with torch.inference_mode():
	    outputs = model.generate(
	        input_ids=batch["input_ids"].to(device),
	        attention_mask=batch["attention_mask"].to(device),
	        max_new_tokens=4096,
	        do_sample=True,
	        audio_temperature=1.7,
	        audio_top_p=0.8,
	        audio_top_k=25,
	    )

	message = processor.decode(outputs)[0]
	audio = message.audio_codes_list[0].detach().cpu().to(torch.float32)
	torchaudio.save("moss_saudi.wav", audio, processor.model_config.sampling_rate)
	```
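For intuition about the `audio_temperature`, `audio_top_k`, and `audio_top_p` arguments, here is a generic sketch of temperature, top-k, and top-p (nucleus) sampling over a single logit vector. It illustrates the standard technique only; the order of the filters and the internals of the MOSS-TTS remote code are assumptions and may differ. The function name `filter_and_sample` is hypothetical.

```python
import math
import random


def filter_and_sample(logits, temperature=1.7, top_k=25, top_p=0.8, rng=random):
    """Sample one index from logits using temperature, top-k, then top-p."""
    # Temperature scaling: values above 1 flatten the distribution.
    scaled = [x / temperature for x in logits]

    # Top-k: keep the k highest-scoring indices, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for idx, p in zip(ranked, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # random.choices renormalizes the remaining weights before drawing.
    indices, kept_probs = zip(*kept)
    return rng.choices(indices, weights=kept_probs, k=1)[0]
```

In this sketch, raising the temperature or `top_p` widens the set of candidate tokens and makes output more varied, while lowering `top_k` narrows it.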

	## LoRA Adapter

	The adapter remains available if you want to apply it manually:

	```python
	import torch
	from peft import PeftModel
	from transformers import AutoModel

	base = AutoModel.from_pretrained(
	    "OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5",
	    trust_remote_code=True,
	    dtype=torch.bfloat16,
	)
	model = PeftModel.from_pretrained(base, "Rabe3/Moss-Saudi", subfolder="lora_adapter")
	```
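To show how an adapter relates to merged weights in general, the sketch below applies the standard LoRA merge formula, `W' = W + (alpha / r) * B @ A`, to a toy 2x2 layer in plain Python. All values, including the rank and `alpha`, are illustrative and are not taken from this adapter's configuration; whether the root weights in this repository were produced exactly this way is an assumption.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the standard LoRA merge."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter (all values are illustrative).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]       # shape (rank, in_features)
B = [[0.5], [-0.5]]    # shape (out_features, rank)
merged = merge_lora(W, A, B, alpha=2.0, rank=1)
```

In PEFT, calling `merge_and_unload()` on a LoRA `PeftModel` performs this kind of merge across all adapted layers and returns the underlying model. It has not been verified here against this model's custom code.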