Orpheus 3B: LoRA Fine-Tune on SPH Audio (24kHz SNAC) ✅

LoRA adapters for SPH (The Supreme Pontiff of Hinduism) voice cloning, trained on the correct 24kHz SNAC pipeline matching the Orpheus base model architecture.

Model Details

| Property | Value |
|---|---|
| Base Model | unsloth/orpheus-3b-0.1-ft |
| Training Type | LoRA (PEFT); only ~1-10% of parameters updated |
| LoRA Rank (r) | 64 |
| LoRA Alpha | 64 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| SNAC Codec | snac_24khz ✅ (correct for Orpheus) |
| Token Pattern | 7 tokens per frame ✅ |
| Audio Token Offset | 128266 ✅ |
| Dataset | kailasa-ngpt/SPH_Audio_2019_60_Secs_947_Samples |
| Dataset Size | 947 samples (~60s each, ~16 hours) |
| Precision | BFloat16 |
| GPU Used | RunPod A6000 (48GB) |
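
To put the "~1-10% of parameters" figure in context, the adapter size implied by rank 64 on the seven target modules can be estimated by hand. The sketch below assumes Llama-3.2-3B-style layer dimensions (hidden size 3072, 8 KV heads of dimension 128, MLP width 8192, 28 layers). These dimensions are an assumption and are not stated in this card, and `lora_param_count` is a hypothetical helper.

```python
# Assumed Llama-3.2-3B-style dimensions (NOT stated in this model card).
HIDDEN = 3072
KV_DIM = 8 * 128       # 8 key/value heads x head dimension 128
INTERMEDIATE = 8192
NUM_LAYERS = 28
RANK = 64              # LoRA rank from the table above

# (in_features, out_features) of each targeted linear layer
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes, num_layers):
    """Each adapted layer adds A (rank x in) and B (out x rank) matrices."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


total = lora_param_count(RANK, SHAPES, NUM_LAYERS)
print(total)  # 97255424 under the assumed dimensions
```

Under those assumed dimensions the adapters come to about 97 million weights, roughly 3% of a 3B-parameter base, which sits inside the stated range.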

Repository Structure

This repo contains two training runs:

1. Initial Training (orpheus_sph_lora/)

First LoRA training pass: 1 full epoch over the 947-sample dataset.

| Property | Value |
|---|---|
| Epochs | 1 |
| Total Steps | 119 |
| Learning Rate | 2e-4 → ~0 (linear decay) |
| Batch Size | 2 (grad accum: 4, effective: 8) |
| Warmup Steps | 5 |

| Checkpoint | Loss | Notes |
|---|---|---|
| checkpoint-50 | ~4.21 | Early training |
| checkpoint-100 | ~4.26 | Mid training |
| checkpoint-119 | ~4.07 | End of epoch 1 (best of initial run) |
| Root adapter | 4.07 | Same as checkpoint-119 |
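
The "2e-4 → ~0 (linear decay)" entry with 5 warmup steps can be read as a piecewise-linear schedule. The following is a minimal sketch, assuming the common warmup-then-linear-decay shape. The exact scheduler used for this run is not stated here, and `linear_lr` is a hypothetical helper.

```python
def linear_lr(step, peak_lr=2e-4, warmup_steps=5, total_steps=119):
    """Learning rate at a given optimizer step.

    Ramps linearly from 0 to peak_lr over the warmup steps, then decays
    linearly back to 0 at total_steps. Defaults mirror the initial run.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at the checkpoints listed above.
for step in (0, 5, 50, 100, 119):
    print(step, linear_lr(step))
```

With this shape the rate peaks at step 5 and is already well below the peak by checkpoint-100, which is one reason late checkpoints change more slowly than early ones.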

2. Refinement Training (orpheus_sph_refinement/)

Continued training from checkpoint-119 for 4 additional epochs with fresh optimizer state.

| Property | Value |
|---|---|
| Resumed From | orpheus_sph_lora/checkpoint-119 |
| Epochs | 4 (total ~5 including initial) |
| Total Steps | 476 |
| Learning Rate | 2e-4 → ~0 (linear decay) |
| Batch Size | 2 (grad accum: 4, effective: 8) |

| Checkpoint | Loss | Notes |
|---|---|---|
| checkpoint-400 | ~3.85 | Late training |
| checkpoint-450 | ~3.69 | Lowest observed loss |
| checkpoint-476 | ~3.65 (est.) | Final checkpoint |
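
The step counts for both runs follow from the dataset size and the effective batch size. The sketch below uses a simple ceiling division, which reproduces the reported 119 and 476 steps. The trainer's exact rounding of the final partial batch is an assumption, and `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_samples, per_device_batch, grad_accum, epochs):
    """Return (effective batch size, steps per epoch, total steps)."""
    effective_batch = per_device_batch * grad_accum
    # The last, partially filled batch still counts as one optimizer step.
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


print(optimizer_steps(947, 2, 4, epochs=1))  # (8, 119, 119)  initial run
print(optimizer_steps(947, 2, 4, epochs=4))  # (8, 119, 476)  refinement run
```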

Training Loss Summary

Initial Run:   4.70 → 4.07  (119 steps, 1 epoch)
Refinement:    4.08 → 3.69  (476 steps, 4 epochs)

Training was stable throughout: gradient norms remained small (0.07-0.24) with no signs of divergence.

Usage

Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from snac import SNAC

# Load base model + LoRA adapters
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/orpheus-3b-0.1-ft",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/orpheus-3b-0.1-ft")

# Choose a checkpoint:
# - "orpheus_sph_lora/checkpoint-119" (initial training, loss ~4.07)
# - "orpheus_sph_refinement/checkpoint-450" (refinement, loss ~3.69)
# - "orpheus_sph_refinement/checkpoint-476" (final, loss ~3.65 est.)
model = PeftModel.from_pretrained(
    base_model,
    "kailasa-ngpt/2026_01_15_Orpheus_Run",
    subfolder="orpheus_sph_refinement/checkpoint-476"
)
model.eval()

# Load SNAC decoder (24kHz; MUST match training)
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")

# See inference_sph_24khz.py for full generation pipeline
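
The generation step itself lives in inference_sph_24khz.py and is not reproduced here. As a rough sketch of how a text prompt might be framed before generation, the block below assumes the script follows the prompt layout used in the public Orpheus fine-tuning notebooks: text tokens wrapped in the start-of-human, end-of-text, and end-of-human markers. `frame_prompt` is a hypothetical helper, not a function from that script.

```python
# Token IDs as listed in the "Special Tokens" section of this card.
START_OF_HUMAN = 128259
END_OF_TEXT = 128009
END_OF_HUMAN = 128260
END_OF_SPEECH = 128258  # assumed to serve as the stop token during generation


def frame_prompt(text_token_ids):
    """Wrap tokenized text as: START_OF_HUMAN, text..., END_OF_TEXT, END_OF_HUMAN."""
    return [START_OF_HUMAN, *text_token_ids, END_OF_TEXT, END_OF_HUMAN]


# Placeholder IDs standing in for the tokenizer output of a real sentence.
prompt_ids = frame_prompt([101, 102, 103])
print(prompt_ids)
```

In a full pipeline the framed IDs would be turned into a tensor and passed to `model.generate` with `eos_token_id=END_OF_SPEECH`; sampling settings are left to the referenced script.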

Special Tokens

START_OF_HUMAN = 128259
END_OF_HUMAN = 128260
START_OF_AI = 128261
END_OF_AI = 128262
START_OF_SPEECH = 128257
END_OF_SPEECH = 128258
END_OF_TEXT = 128009
AUDIO_TOKENS_START = 128266
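
The audio offset and the 7-tokens-per-frame pattern determine how generated IDs map back to SNAC codes. The sketch below assumes the interleaving used by the public Orpheus reference decoding: each of the 7 slots in a frame has its own 4096-entry range, slot 0 feeds the coarse layer, slots 1 and 4 the middle layer, and slots 2, 3, 5, 6 the fine layer. The codebook size and slot order are assumptions not stated in this card, and both function names are hypothetical.

```python
AUDIO_TOKENS_START = 128266
START_OF_SPEECH = 128257
END_OF_SPEECH = 128258
TOKENS_PER_FRAME = 7
CODEBOOK_SIZE = 4096  # assumed size of each per-slot code range


def extract_audio_codes(generated_ids):
    """Keep tokens after the last START_OF_SPEECH, drop END_OF_SPEECH,
    trim to whole frames, and remove the audio token offset."""
    ids = list(generated_ids)
    if START_OF_SPEECH in ids:
        last = len(ids) - 1 - ids[::-1].index(START_OF_SPEECH)
        ids = ids[last + 1:]
    ids = [t for t in ids if t != END_OF_SPEECH]
    usable = len(ids) - len(ids) % TOKENS_PER_FRAME
    return [t - AUDIO_TOKENS_START for t in ids[:usable]]


def split_into_snac_layers(codes):
    """De-interleave 7-token frames into three SNAC code layers."""
    layer_1, layer_2, layer_3 = [], [], []
    for i in range(len(codes) // TOKENS_PER_FRAME):
        frame = codes[TOKENS_PER_FRAME * i: TOKENS_PER_FRAME * (i + 1)]
        layer_1.append(frame[0])
        layer_2.append(frame[1] - 1 * CODEBOOK_SIZE)
        layer_3.append(frame[2] - 2 * CODEBOOK_SIZE)
        layer_3.append(frame[3] - 3 * CODEBOOK_SIZE)
        layer_2.append(frame[4] - 4 * CODEBOOK_SIZE)
        layer_3.append(frame[5] - 5 * CODEBOOK_SIZE)
        layer_3.append(frame[6] - 6 * CODEBOOK_SIZE)
    return layer_1, layer_2, layer_3
```

Each frame yields 1, 2, and 4 codes for the three layers. Converted to integer tensors with a leading batch dimension, the three lists would be the input to the SNAC model's `decode` call.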

Comparison with FFT Version

| Aspect | This Model (LoRA) ✅ | FFT Version ❌ |
|---|---|---|
| SNAC | 24kHz (correct) | 32kHz (wrong) |
| Token Pattern | 7 tokens/frame | 15 tokens/frame |
| Final Loss | ~3.69 | ~4.52 |
| Adapter Size | ~3.4 GB total | ~120 GB total |
| Training Stability | Smooth, small gradients | Large gradients, high initial loss |
