Orpheus 3B - SFT LoRA for Conversational Speech
LoRA adapter trained via supervised fine-tuning on Orpheus 3B using the Expresso conversational speech dataset.
Training
- Base: canopylabs/orpheus-3b-0.1-ft (3B params, Llama 3 architecture)
- Method: SFT with LoRA (r=32, alpha=32, all linear layers)
- Dataset: Expresso female conversational speech
- Steps: 600 (best checkpoint by validation loss)
- Hardware: NVIDIA A10G 24GB
Key Finding
This SFT approach produced a model that closely mimics the Expresso speaker's voice characteristics but showed a slight regression in UTMOS naturalness compared to the base Orpheus model. This led us to explore GRPO-based optimization as an alternative; see orpheus-3b-conversational-grpo.
The lesson: Orpheus was pretrained on ~100k hours of diverse speech. Supervised fine-tuning on a small single-speaker dataset can overfit to that speaker's patterns at the cost of the model's general conversational ability.
Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base Orpheus checkpoint, attach the LoRA adapter,
# then fold the adapter weights into the base model for standalone use.
base = AutoModelForCausalLM.from_pretrained("canopylabs/orpheus-3b-0.1-ft")
model = PeftModel.from_pretrained(base, "Tachyeon/orpheus-3b-sft-lora")
model = model.merge_and_unload()
```
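For intuition, `merge_and_unload` adds the scaled low-rank delta (alpha/r) · B·A into each frozen weight, after which the adapter layers are discarded. A minimal, self-contained sketch of that arithmetic with the r=32, alpha=32 config (plain torch, no model download; the tiny dimension and random weights are illustrative only):

```python
import torch

# Illustration of what merge_and_unload does numerically:
# the adapter delta (alpha/r) * B @ A is folded into the base weight.
torch.manual_seed(0)
d, r, alpha = 64, 32, 32          # r=32, alpha=32 as in training
W = torch.randn(d, d)             # frozen base weight
A = torch.randn(r, d) * 0.01      # LoRA down-projection
B = torch.randn(d, r) * 0.01      # LoRA up-projection (nonzero here so
                                  # the delta is visible; peft zero-inits
                                  # B at the start of training)

scaling = alpha / r               # = 1.0 for this config
W_merged = W + scaling * (B @ A)  # merged weight

x = torch.randn(4, d)
y_adapter = x @ W.T + scaling * (x @ A.T @ B.T)  # base + adapter forward
y_merged = x @ W_merged.T                        # merged forward
assert torch.allclose(y_adapter, y_merged, atol=1e-5)
```

The merged model computes exactly the same function as base-plus-adapter, but without any peft dependency at inference time.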
Part of Project Maya
Model tree for Tachyeon/orpheus-3b-sft-lora
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Finetuned: canopylabs/orpheus-3b-0.1-pretrained
- Finetuned: canopylabs/orpheus-3b-0.1-ft