Orpheus 3B - SFT LoRA for Conversational Speech
LoRA adapter trained via supervised fine-tuning on Orpheus 3B using the Expresso conversational speech dataset.
Training
- Base: canopylabs/orpheus-3b-0.1-ft (3B params, Llama 3 architecture)
- Method: SFT with LoRA (r=32, alpha=32, all linear layers)
- Dataset: Expresso female conversational speech
- Steps: 600 (best checkpoint by validation loss)
- Hardware: NVIDIA A10G 24GB
Key Finding
This SFT approach produced a model that closely mimics the Expresso speaker's voice characteristics but showed a slight regression in UTMOS naturalness compared to the base Orpheus model. This led us to explore GRPO-based optimization as an alternative; see orpheus-3b-conversational-grpo.
The lesson: Orpheus was pretrained on ~100k hours of diverse speech. Supervised fine-tuning on a small single-speaker dataset can overfit to that speaker's patterns at the cost of the model's general conversational ability.
Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base Orpheus checkpoint, attach the LoRA adapter,
# then fold the adapter weights into the base model for standalone use.
base = AutoModelForCausalLM.from_pretrained("canopylabs/orpheus-3b-0.1-ft")
model = PeftModel.from_pretrained(base, "Tachyeon/orpheus-3b-sft-lora")
model = model.merge_and_unload()
```
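For intuition, `merge_and_unload` adds the scaled low-rank delta (alpha/r) · B·A into each frozen weight, after which the adapter layers are discarded. A minimal, self-contained sketch of that arithmetic with the r=32, alpha=32 config (plain torch, no model download; the tiny dimension and random weights are illustrative only):

```python
import torch

# Illustration of what merge_and_unload does numerically:
# the adapter delta (alpha/r) * B @ A is folded into the base weight.
torch.manual_seed(0)
d, r, alpha = 64, 32, 32          # r=32, alpha=32 as in training
W = torch.randn(d, d)             # frozen base weight
A = torch.randn(r, d) * 0.01      # LoRA down-projection
B = torch.randn(d, r) * 0.01      # LoRA up-projection (nonzero here so
                                  # the delta is visible; peft zero-inits
                                  # B at the start of training)

scaling = alpha / r               # = 1.0 for this config
W_merged = W + scaling * (B @ A)  # merged weight

x = torch.randn(4, d)
y_adapter = x @ W.T + scaling * (x @ A.T @ B.T)  # base + adapter forward
y_merged = x @ W_merged.T                        # merged forward
assert torch.allclose(y_adapter, y_merged, atol=1e-5)
```

The merged model computes exactly the same function as base-plus-adapter, but without any peft dependency at inference time.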
Part of Project Maya
Model tree for Tachyeon/orpheus-3b-sft-lora
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Finetuned: canopylabs/orpheus-3b-0.1-pretrained
- Finetuned: canopylabs/orpheus-3b-0.1-ft