# MOSS-TTS Norwegian LoRA
A LoRA adapter by ToSee that fine-tunes OpenMOSS-Team/MOSS-TTS (MossTTSDelay 8B) for improved Norwegian speech synthesis.
## Motivation

MOSS-TTS supports 20 languages, but Norwegian is not one of them. This LoRA adapter extends the 8B-parameter foundation model to Norwegian through parameter-efficient fine-tuning, adding just 167 MB of weights (~2% of the base model). As part of ToSee's commitment to open-sourcing our speech technology work, we are releasing this adapter as a stable, citable artifact for the research community. We plan to publish additional fine-tunes, LoRA adapters, and research in the future. For production use of this and other internal models, we will provide an API; visit tosee.no for updates.
> **Note:** This is a community fine-tune, not an official MOSS-TTS language release.
## Model Details

| Property | Value |
|---|---|
| Base model | OpenMOSS-Team/MOSS-TTS (revision 0c8df99) |
| Architecture | MossTTSDelay (8B params) |
| Adapter type | LoRA (via PEFT) |
| Adapter size | ~167 MB |
| Language | Norwegian (no) |
| License | CC-BY-NC-4.0 |
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trained modules | MLP only (gate_proj, up_proj, down_proj); LoRA layers exist on all 7 target modules, but only the MLP projections had gradients enabled during training |
| Task type | CAUSAL_LM |
| Bias | none |
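For readers reproducing this setup, the table above translates into PEFT `LoraConfig` keyword arguments and a gradient mask along the following lines. This is a sketch, shown with plain Python data structures so it can be compared against the repo's adapter_config.json; the `trainable` helper is hypothetical and not part of the released training script:

```python
import json

# LoRA hyperparameters from the table above, keyed by PEFT's
# LoraConfig field names (pass these as kwargs to peft.LoraConfig).
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
    "bias": "none",
}
print(json.dumps(lora_kwargs, indent=2))

# "Trained modules: MLP only" as a parameter-name predicate: a LoRA
# parameter keeps requires_grad=True only if it sits on an MLP projection.
MLP_MODULES = ("gate_proj", "up_proj", "down_proj")

def trainable(param_name: str) -> bool:
    return "lora_" in param_name and any(m in param_name for m in MLP_MODULES)
```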
## Training Details
| Parameter | Value |
|---|---|
| Learning rate | 2e-6 |
| Max training steps | 30,000 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Max gradient norm | 0.5 |
| Save/eval interval | 500 steps |
| Optimizer | AdamW |
| LR scheduler | Cosine decay |
| Precision | bf16 |
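The schedule rows above (2e-6 peak learning rate, 100 linear warmup steps, cosine decay over 30,000 steps) can be sketched as a plain function. This illustrates the shape of the schedule, not the exact implementation in scripts/train_lora.py:

```python
import math

BASE_LR = 2e-6
WARMUP_STEPS = 100
MAX_STEPS = 30_000

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup to the
    peak, then cosine decay to zero over the remaining steps."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```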
## Usage
### Environment Setup

```bash
conda create -n moss-tts python=3.12 -y
conda activate moss-tts
git clone https://github.com/OpenMOSS/MOSS-TTS.git
cd MOSS-TTS
pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e .
pip install peft
```
### Inference with LoRA Adapter

```python
import importlib.util

import torch
import torchaudio
from peft import PeftModel
from transformers import AutoModel, AutoProcessor

# Disable the broken cuDNN SDPA backend; keep the other backends enabled
torch.backends.cuda.enable_cudnn_sdp(False)
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)
torch.backends.cuda.enable_math_sdp(True)

base_model_id = "OpenMOSS-Team/MOSS-TTS"
adapter_id = "ToSee-Norway/MOSS-TTS-Norwegian-LoRA"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

def resolve_attn_implementation() -> str:
    """Pick the fastest attention backend available on this machine."""
    if (
        device == "cuda"
        and importlib.util.find_spec("flash_attn") is not None
        and dtype in {torch.float16, torch.bfloat16}
    ):
        major, _ = torch.cuda.get_device_capability()
        if major >= 8:  # FlashAttention 2 requires Ampere or newer
            return "flash_attention_2"
    if device == "cuda":
        return "sdpa"
    return "eager"

attn_implementation = resolve_attn_implementation()

# Load processor (includes the audio tokenizer)
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)
processor.audio_tokenizer = processor.audio_tokenizer.to(device)

# Load base model + LoRA adapter
model = AutoModel.from_pretrained(
    base_model_id,
    trust_remote_code=True,
    attn_implementation=attn_implementation,
    torch_dtype=dtype,
)
model = PeftModel.from_pretrained(model, adapter_id)
model = model.to(device)
model.eval()

# Generate Norwegian speech
text = "Hei og velkommen. Dette er en test av den norske stemmen."
conversations = [[processor.build_user_message(text=text)]]

with torch.no_grad():
    batch = processor(conversations, mode="generation")
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=4096,
    )

for message in processor.decode(outputs):
    audio = message.audio_codes_list[0]
    torchaudio.save("output.wav", audio.unsqueeze(0), processor.model_config.sampling_rate)
```
### Voice Cloning with LoRA

You can combine the LoRA adapter with reference audio for Norwegian voice cloning:

```python
ref_audio = "path/to/norwegian_reference.wav"
conversations = [
    [processor.build_user_message(text=text, reference=[ref_audio])]
]
# ... same generation code as above
```
## Base vs. Fine-tuned Comparison

The samples/ directory contains comparison audio between the base model and this fine-tuned adapter, both generated with identical settings (seed=42, max_new_tokens=512). Listen for improvements in Norwegian phoneme accuracy, prosody, and the handling of characters like æ, ø, and å.
| Variant | Duration | File |
|---|---|---|
| Base (no LoRA) | 22.64s | samples/test_intro_seed42_base_balanced512.wav |
| Fine-tuned (this LoRA) | 21.36s | samples/test_intro_seed42_finetune_balanced512.wav |
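The durations in the table can be double-checked with Python's standard-library wave module (a convenience sketch, assuming the samples are plain PCM WAV files, which `wave` can read):

```python
import wave

def wav_duration_seconds(path: str) -> float:
    # Frame count divided by sample rate gives the clip length in seconds.
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

# Usage, with a path from the table above:
# wav_duration_seconds("samples/test_intro_seed42_base_balanced512.wav")
```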
Test text (Norwegian):

> "Hei og velkommen til denne testen av den norske stemmen. I dag skal vi se hvordan modellen håndterer norsk tale etter finjustering på norske data. Det er spennende å se om modellen har lært å uttale norske ord riktig, inkludert vanskelige lyder som æ, ø og å. Takk for at du lytter, og vi håper du liker resultatet."
## Files

```
├── adapter_config.json           # PEFT/LoRA configuration
├── adapter_model.safetensors     # LoRA weights (167 MB)
├── README.md                     # This model card
├── samples/                      # Audio comparison samples
│   ├── test_intro_seed42_base_balanced512.wav
│   ├── test_intro_seed42_base_balanced512_metadata.json
│   ├── test_intro_seed42_finetune_balanced512.wav
│   └── test_intro_seed42_finetune_balanced512_metadata.json
├── comparison_summary.json       # Base vs fine-tune comparison
└── scripts/                      # Training code for reproducibility
    ├── train_lora.py
    └── scratch_long_C_mlp_r16_launch_command.sh
```
## Limitations

- Norwegian support comes from LoRA fine-tuning on a limited dataset. Robustness may vary across the Norwegian written standards (Bokmål vs. Nynorsk), spoken dialects, domains, and speaker conditions.
- This adapter has not been extensively evaluated across all Norwegian phonemes and prosodic patterns.
- Validate quality for your specific use case before production deployment.
- This model is provided as-is with no official support. For supported Norwegian TTS, see our upcoming API at tosee.no.
## Citation

If you use this adapter, please cite:

```bibtex
@misc{moss-tts-norwegian-lora,
  title  = {MOSS-TTS Norwegian LoRA},
  author = {ToSee},
  year   = {2026},
  url    = {https://huggingface.co/ToSee-Norway/MOSS-TTS-Norwegian-LoRA},
}
```
And the base MOSS-TTS model:

```bibtex
@article{moss-tts-2025,
  title         = {MOSS-TTS Technical Report},
  author        = {OpenMOSS Team},
  year          = {2025},
  eprint        = {2603.18090},
  archivePrefix = {arXiv},
}
```
## Acknowledgments
Built on top of MOSS-TTS by the OpenMOSS team. Training used the PEFT library.
Published by ToSee.