MOSS-TTS Norwegian LoRA

A LoRA adapter by ToSee that fine-tunes OpenMOSS-Team/MOSS-TTS (MossTTSDelay 8B) for improved Norwegian speech synthesis.

Motivation

MOSS-TTS supports 20 languages, but Norwegian is not one of them. This LoRA adapter extends the 8B-parameter foundation model to Norwegian through parameter-efficient fine-tuning, adding just 167 MB of weights (~2% of the base model). As part of ToSee's commitment to open-sourcing our speech technology work, we are releasing this adapter as a stable, citable artifact for the research community. We plan to publish additional fine-tunes, LoRA adapters, and research in the future. For production use of this and other internal models, we will provide an API; visit tosee.no for updates.

This is a community fine-tune, not an official MOSS-TTS language release.

Model Details

Base model OpenMOSS-Team/MOSS-TTS (revision 0c8df99)
Architecture MossTTSDelay (8B params)
Adapter type LoRA (via PEFT)
Adapter size ~167 MB
Language Norwegian (no)
License CC-BY-NC-4.0

LoRA Configuration

Parameter Value
Rank (r) 16
Alpha 32
Dropout 0.05
Target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trained modules MLP only (gate_proj, up_proj, down_proj): LoRA layers exist on all 7 target modules, but only the MLP projections had gradients enabled during training
Task type CAUSAL_LM
Bias none
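The "MLP only" setup above can be reproduced by freezing the attention-side LoRA parameters after wrapping the model. A minimal sketch of the name-based filter (pure Python over parameter names; the `lora_` prefix follows PEFT's usual parameter naming, and the helper name is ours, not part of the training script):

```python
# Modules whose LoRA weights were actually trained (MLP projections only).
TRAINABLE_MODULES = ("gate_proj", "up_proj", "down_proj")


def should_train(param_name: str) -> bool:
    """Return True if a LoRA parameter should keep requires_grad=True.

    Only LoRA A/B matrices attached to the MLP projections are trained;
    the LoRA layers on q/k/v/o_proj exist but stay frozen.
    """
    if "lora_" not in param_name:
        return False  # base-model weights are always frozen
    return any(module in param_name for module in TRAINABLE_MODULES)


# With a PEFT-wrapped model you would then apply it roughly like this
# (assumes standard PEFT parameter naming):
# for name, param in model.named_parameters():
#     param.requires_grad = should_train(name)
```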

Training Details

Parameter Value
Learning rate 2e-6
Max training steps 30,000
Warmup steps 100
Weight decay 0.01
Max gradient norm 0.5
Save/eval interval 500 steps
Optimizer AdamW
LR scheduler Cosine decay
Precision bf16
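The learning-rate schedule above (linear warmup for 100 steps, then cosine decay over the remaining 30,000 steps) has the standard warmup-plus-cosine shape, which can be sketched in plain Python. This mirrors the schedule's shape, not the exact trainer implementation:

```python
import math

MAX_STEPS = 30_000
WARMUP_STEPS = 100
BASE_LR = 2e-6


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```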

Usage

Environment Setup

conda create -n moss-tts python=3.12 -y
conda activate moss-tts

git clone https://github.com/OpenMOSS/MOSS-TTS.git
cd MOSS-TTS
pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e .
pip install peft

Inference with LoRA Adapter

import importlib.util
from pathlib import Path

import torch
import torchaudio
from peft import PeftModel
from transformers import AutoModel, AutoProcessor

# Disable broken cuDNN SDPA backend
torch.backends.cuda.enable_cudnn_sdp(False)
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)
torch.backends.cuda.enable_math_sdp(True)

base_model_id = "OpenMOSS-Team/MOSS-TTS"
adapter_id = "ToSee-Norway/MOSS-TTS-Norwegian-LoRA"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32


def resolve_attn_implementation() -> str:
    if (
        device == "cuda"
        and importlib.util.find_spec("flash_attn") is not None
        and dtype in {torch.float16, torch.bfloat16}
    ):
        major, _ = torch.cuda.get_device_capability()
        if major >= 8:
            return "flash_attention_2"
    if device == "cuda":
        return "sdpa"
    return "eager"


attn_implementation = resolve_attn_implementation()

# Load processor
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)
processor.audio_tokenizer = processor.audio_tokenizer.to(device)

# Load base model + LoRA adapter
model = AutoModel.from_pretrained(
    base_model_id,
    trust_remote_code=True,
    attn_implementation=attn_implementation,
    torch_dtype=dtype,
)
model = PeftModel.from_pretrained(model, adapter_id)
model = model.to(device)
model.eval()

# Generate Norwegian speech
text = "Hei og velkommen. Dette er en test av den norske stemmen."

conversations = [[processor.build_user_message(text=text)]]

with torch.no_grad():
    batch = processor(conversations, mode="generation")
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)

    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=4096,
    )

    for message in processor.decode(outputs):
        audio = message.audio_codes_list[0]
        torchaudio.save("output.wav", audio.unsqueeze(0), processor.model_config.sampling_rate)

Voice Cloning with LoRA

You can combine the LoRA adapter with reference audio for Norwegian voice cloning:

ref_audio = "path/to/norwegian_reference.wav"

conversations = [
    [processor.build_user_message(text=text, reference=[ref_audio])]
]
# ... same generation code as above

Base vs. Fine-tuned Comparison

The samples/ directory contains comparison audio between the base model and this fine-tuned adapter, both generated with identical settings (seed=42, max_new_tokens=512). Listen for improvements in Norwegian phoneme accuracy, prosody, and the handling of characters like æ, ø, and å.

Variant Duration File
Base (no LoRA) 22.64s samples/test_intro_seed42_base_balanced512.wav
Fine-tuned (this LoRA) 21.36s samples/test_intro_seed42_finetune_balanced512.wav

Test text (Norwegian):

"Hei og velkommen til denne testen av den norske stemmen. I dag skal vi se hvordan modellen håndterer norsk tale etter finjustering på norske data. Det er spennende å se om modellen har lært å uttale norske ord riktig, inkludert vanskelige lyder som æ, ø og å. Takk for at du lytter, og vi håper du liker resultatet."
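The durations listed in the table can be verified locally with Python's standard-library wave module. A small helper (our own, not part of this repo's scripts):

```python
import wave


def wav_duration(path: str) -> float:
    """Duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# Example, assuming the samples/ directory from this repo:
# wav_duration("samples/test_intro_seed42_base_balanced512.wav")
```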

Files

β”œβ”€β”€ adapter_config.json          # PEFT/LoRA configuration
β”œβ”€β”€ adapter_model.safetensors    # LoRA weights (167 MB)
β”œβ”€β”€ README.md                    # This model card
β”œβ”€β”€ samples/                     # Audio comparison samples
β”‚   β”œβ”€β”€ test_intro_seed42_base_balanced512.wav
β”‚   β”œβ”€β”€ test_intro_seed42_base_balanced512_metadata.json
β”‚   β”œβ”€β”€ test_intro_seed42_finetune_balanced512.wav
β”‚   └── test_intro_seed42_finetune_balanced512_metadata.json
β”œβ”€β”€ comparison_summary.json      # Base vs fine-tune comparison
└── scripts/                     # Training code for reproducibility
    β”œβ”€β”€ train_lora.py
    └── scratch_long_C_mlp_r16_launch_command.sh

Limitations

  • Norwegian support comes from LoRA fine-tuning on a limited dataset. Robustness may vary across Norwegian dialects (BokmΓ₯l vs. Nynorsk), domains, and speaker conditions.
  • This adapter has not been extensively evaluated across all Norwegian phonemes and prosodic patterns.
  • Validate quality for your specific use case before production deployment.
  • This model is provided as-is with no official support. For supported Norwegian TTS, see our upcoming API at tosee.no.

Citation

If you use this adapter, please cite:

@misc{moss-tts-norwegian-lora,
  title  = {MOSS-TTS Norwegian LoRA},
  author = {ToSee},
  year   = {2026},
  url    = {https://huggingface.co/ToSee-Norway/MOSS-TTS-Norwegian-LoRA},
}

And the base MOSS-TTS model:

@article{moss-tts-2025,
  title  = {MOSS-TTS Technical Report},
  author = {OpenMOSS Team},
  year   = {2025},
  eprint = {2603.18090},
  archivePrefix = {arXiv},
}

Acknowledgments

Built on top of MOSS-TTS by the OpenMOSS team. Training used the PEFT library.

Published by ToSee.
