MOSS-TTS Norwegian LoRA

A LoRA adapter by ToSee that fine-tunes OpenMOSS-Team/MOSS-TTS (MossTTSDelay 8B) for improved Norwegian speech synthesis.

Motivation

MOSS-TTS supports 20 languages, but Norwegian is not one of them. This LoRA adapter extends the 8B-parameter foundation model to Norwegian through parameter-efficient fine-tuning, adding just 167 MB of weights (~2% of the base model). As part of ToSee's commitment to open-sourcing our speech technology work, we are releasing this adapter as a stable, citable artifact for the research community. We plan to publish additional fine-tunes, LoRA adapters, and research in the future. For production use of this and other internal models, we will provide an API; visit tosee.no for updates.

This is a community fine-tune, not an official MOSS-TTS language release.

Model Details

Base model OpenMOSS-Team/MOSS-TTS (revision 0c8df99)
Architecture MossTTSDelay (8B params)
Adapter type LoRA (via PEFT)
Adapter size ~167 MB
Language Norwegian (no)
License CC-BY-NC-4.0

LoRA Configuration

Parameter Value
Rank (r) 16
Alpha 32
Dropout 0.05
Target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trained modules MLP only (gate_proj, up_proj, down_proj): LoRA layers exist on all 7 target modules, but only the MLP projections had gradients enabled during training
Task type CAUSAL_LM
Bias none
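The "MLP only" setup above can be reproduced by freezing the attention-side LoRA parameters after wrapping the model. A minimal sketch of the name-based filter (pure Python over parameter names; the `lora_` prefix follows PEFT's usual parameter naming, and the helper name is ours, not part of the training script):

```python
# Modules whose LoRA weights were actually trained (MLP projections only).
TRAINABLE_MODULES = ("gate_proj", "up_proj", "down_proj")


def should_train(param_name: str) -> bool:
    """Return True if a LoRA parameter should keep requires_grad=True.

    Only LoRA A/B matrices attached to the MLP projections are trained;
    the LoRA layers on q/k/v/o_proj exist but stay frozen.
    """
    if "lora_" not in param_name:
        return False  # base-model weights are always frozen
    return any(module in param_name for module in TRAINABLE_MODULES)


# With a PEFT-wrapped model you would then apply it roughly like this
# (assumes standard PEFT parameter naming):
# for name, param in model.named_parameters():
#     param.requires_grad = should_train(name)
```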

Training Details

Parameter Value
Learning rate 2e-6
Max training steps 30,000
Warmup steps 100
Weight decay 0.01
Max gradient norm 0.5
Save/eval interval 500 steps
Optimizer AdamW
LR scheduler Cosine decay
Precision bf16
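The learning-rate schedule above (linear warmup for 100 steps, then cosine decay over the remaining 30,000 steps) has the standard warmup-plus-cosine shape, which can be sketched in plain Python. This mirrors the schedule's shape, not the exact trainer implementation:

```python
import math

MAX_STEPS = 30_000
WARMUP_STEPS = 100
BASE_LR = 2e-6


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```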

Usage

Environment Setup

conda create -n moss-tts python=3.12 -y
conda activate moss-tts

git clone https://github.com/OpenMOSS/MOSS-TTS.git
cd MOSS-TTS
pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e .
pip install peft

Inference with LoRA Adapter

import importlib.util
from pathlib import Path

import torch
import torchaudio
from peft import PeftModel
from transformers import AutoModel, AutoProcessor

# Disable broken cuDNN SDPA backend
torch.backends.cuda.enable_cudnn_sdp(False)
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)
torch.backends.cuda.enable_math_sdp(True)

base_model_id = "OpenMOSS-Team/MOSS-TTS"
adapter_id = "ToSee-Norway/MOSS-TTS-Norwegian-LoRA"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32


def resolve_attn_implementation() -> str:
    if (
        device == "cuda"
        and importlib.util.find_spec("flash_attn") is not None
        and dtype in {torch.float16, torch.bfloat16}
    ):
        major, _ = torch.cuda.get_device_capability()
        if major >= 8:
            return "flash_attention_2"
    if device == "cuda":
        return "sdpa"
    return "eager"


attn_implementation = resolve_attn_implementation()

# Load processor
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)
processor.audio_tokenizer = processor.audio_tokenizer.to(device)

# Load base model + LoRA adapter
model = AutoModel.from_pretrained(
    base_model_id,
    trust_remote_code=True,
    attn_implementation=attn_implementation,
    torch_dtype=dtype,
)
model = PeftModel.from_pretrained(model, adapter_id)
model = model.to(device)
model.eval()

# Generate Norwegian speech
text = "Hei og velkommen. Dette er en test av den norske stemmen."

conversations = [[processor.build_user_message(text=text)]]

with torch.no_grad():
    batch = processor(conversations, mode="generation")
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)

    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=4096,
    )

    for message in processor.decode(outputs):
        audio = message.audio_codes_list[0]
        torchaudio.save("output.wav", audio.unsqueeze(0), processor.model_config.sampling_rate)

Voice Cloning with LoRA

You can combine the LoRA adapter with reference audio for Norwegian voice cloning:

ref_audio = "path/to/norwegian_reference.wav"

conversations = [
    [processor.build_user_message(text=text, reference=[ref_audio])]
]
# ... same generation code as above

Base vs. Fine-tuned Comparison

The samples/ directory contains comparison audio between the base model and this fine-tuned adapter, both generated with identical settings (seed=42, max_new_tokens=512). Listen for improvements in Norwegian phoneme accuracy, prosody, and the handling of characters like æ, ø, and å.

Variant Duration File
Base (no LoRA) 22.64s samples/test_intro_seed42_base_balanced512.wav
Fine-tuned (this LoRA) 21.36s samples/test_intro_seed42_finetune_balanced512.wav

Test text (Norwegian):

"Hei og velkommen til denne testen av den norske stemmen. I dag skal vi se hvordan modellen håndterer norsk tale etter finjustering på norske data. Det er spennende å se om modellen har lært å uttale norske ord riktig, inkludert vanskelige lyder som æ, ø og å. Takk for at du lytter, og vi håper du liker resultatet."
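The durations listed in the table can be verified locally with Python's standard-library wave module. A small helper (our own, not part of this repo's scripts):

```python
import wave


def wav_duration(path: str) -> float:
    """Duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# Example, assuming the samples/ directory from this repo:
# wav_duration("samples/test_intro_seed42_base_balanced512.wav")
```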

Files

β”œβ”€β”€ adapter_config.json          # PEFT/LoRA configuration
β”œβ”€β”€ adapter_model.safetensors    # LoRA weights (167 MB)
β”œβ”€β”€ README.md                    # This model card
β”œβ”€β”€ samples/                     # Audio comparison samples
β”‚   β”œβ”€β”€ test_intro_seed42_base_balanced512.wav
β”‚   β”œβ”€β”€ test_intro_seed42_base_balanced512_metadata.json
β”‚   β”œβ”€β”€ test_intro_seed42_finetune_balanced512.wav
β”‚   └── test_intro_seed42_finetune_balanced512_metadata.json
β”œβ”€β”€ comparison_summary.json      # Base vs fine-tune comparison
└── scripts/                     # Training code for reproducibility
    β”œβ”€β”€ train_lora.py
    └── scratch_long_C_mlp_r16_launch_command.sh

Limitations

  • Norwegian support comes from LoRA fine-tuning on a limited dataset. Robustness may vary across Norwegian dialects (BokmΓ₯l vs. Nynorsk), domains, and speaker conditions.
  • This adapter has not been extensively evaluated across all Norwegian phonemes and prosodic patterns.
  • Validate quality for your specific use case before production deployment.
  • This model is provided as-is with no official support. For supported Norwegian TTS, see our upcoming API at tosee.no.

Citation

If you use this adapter, please cite:

@misc{moss-tts-norwegian-lora,
  title  = {MOSS-TTS Norwegian LoRA},
  author = {ToSee},
  year   = {2026},
  url    = {https://huggingface.co/ToSee-Norway/MOSS-TTS-Norwegian-LoRA},
}

And the base MOSS-TTS model:

@article{moss-tts-2025,
  title  = {MOSS-TTS Technical Report},
  author = {OpenMOSS Team},
  year   = {2025},
  eprint = {2603.18090},
  archivePrefix = {arXiv},
}

Acknowledgments

Built on top of MOSS-TTS by the OpenMOSS team. Training used the PEFT library.

Published by ToSee.
