Orpheus 3B — LoRA Fine-Tune on SPH Audio (24kHz SNAC) ✅

LoRA adapters for SPH (The Supreme Pontiff of Hinduism) voice cloning, trained with the 24 kHz SNAC codec that the Orpheus base model expects.

Model Details

Property	Value
Base Model	`unsloth/orpheus-3b-0.1-ft`
Training Type	LoRA (PEFT) — only ~1-10% of parameters updated
LoRA Rank (r)	64
LoRA Alpha	64
Target Modules	`q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
SNAC Codec	`snac_24khz` ✅ (correct for Orpheus)
Token Pattern	7 tokens per frame ✅
Audio Token Offset	`128266` ✅
Dataset	`kailasa-ngpt/SPH_Audio_2019_60_Secs_947_Samples`
Dataset Size	947 samples (~60s each, ~16 hours)
Precision	BFloat16
GPU Used	RunPod A6000 (48GB)
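
The "~1-10% of parameters" figure can be sanity-checked with a rough count. The sketch below assumes Llama-3.2-3B-style layer shapes (hidden size 3072, MLP size 8192, 28 layers, 8 KV heads of dimension 128); these dimensions are assumptions for illustration and are not read from this repository's config.

```python
# Rough LoRA parameter count for r=64 on the seven target modules.
# Layer shapes are ASSUMED (Llama-3.2-3B style), not taken from this repo.
HIDDEN = 3072
INTERMEDIATE = 8192
KV_DIM = 8 * 128  # assumed: 8 key/value heads of dimension 128
NUM_LAYERS = 28
RANK = 64

# (in_features, out_features) of each targeted projection
PROJECTION_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, shapes, num_layers):
    # Each adapted projection adds A (rank x in) and B (out x rank).
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


trainable = lora_params(RANK, PROJECTION_SHAPES, NUM_LAYERS)
print(f"{trainable:,} trainable LoRA parameters")
```

Under these assumed shapes the count comes to roughly 97M parameters, which is about 3% of a ~3B-parameter model and falls inside the stated range.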

Repository Structure

This repo contains two training runs:

1. Initial Training (`orpheus_sph_lora/`)

First LoRA training pass — 1 full epoch over the 947-sample dataset.

Property	Value
Epochs	1
Total Steps	119
Learning Rate	2e-4 → ~0 (linear decay)
Batch Size	2 (grad accum: 4, effective: 8)
Warmup Steps	5
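
The step count and learning-rate schedule follow from the settings above. This is a minimal sketch in plain Python; the rounding rule (a final partial batch counts as one step) and the exact shape of the warmup are assumptions, and the function names are illustrative rather than taken from the training script.

```python
import math

NUM_SAMPLES = 947
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
PEAK_LR = 2e-4
WARMUP_STEPS = 5


def steps_per_epoch(num_samples, batch_size, grad_accum):
    # One optimizer step consumes batch_size * grad_accum samples.
    # Assumption: the last partial batch still counts as a step.
    return math.ceil(num_samples / (batch_size * grad_accum))


def linear_lr(step, total_steps, peak_lr, warmup_steps):
    # Linear warmup to peak_lr, then linear decay to zero at total_steps.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


total = steps_per_epoch(NUM_SAMPLES, PER_DEVICE_BATCH, GRAD_ACCUM)
print(total, linear_lr(60, total, PEAK_LR, WARMUP_STEPS))
```

With an effective batch of 8, one pass over 947 samples gives 119 steps, matching the Total Steps row. The trainer actually used may round differently, so treat the rounding rule as an assumption.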

Checkpoint	Loss	Notes
checkpoint-50	~4.21	Mid epoch (step 50 of 119)
checkpoint-100	~4.26	Late epoch (step 100 of 119)
checkpoint-119	~4.07	End of epoch 1 (best of initial run)
Root adapter	4.07	Same as checkpoint-119

2. Refinement Training (`orpheus_sph_refinement/`)

Continued training from checkpoint-119 for 4 additional epochs, with a fresh optimizer state and a restarted learning-rate schedule.

Property	Value
Resumed From	`orpheus_sph_lora/checkpoint-119`
Epochs	4 (5 in total, including the initial epoch)
Total Steps	476
Learning Rate	2e-4 → ~0 (linear decay)
Batch Size	2 (grad accum: 4, effective: 8)

Checkpoint	Loss	Notes
checkpoint-400	~3.85	Late training
checkpoint-450	~3.69	Lowest observed loss
checkpoint-476	~3.65 (est.)	Final checkpoint

Training Loss Summary

Initial Run:   4.70 → 4.07  (119 steps, 1 epoch)
Refinement:    4.08 → 3.69  (476 steps, 4 epochs)

Training was stable throughout: gradient norms stayed between 0.07 and 0.24, with no signs of divergence.
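
To put the loss figures on a more intuitive scale, they can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math


def perplexity(loss_nats):
    # Assumes loss is mean per-token cross-entropy measured in nats.
    return math.exp(loss_nats)


# (start loss, end loss) as reported in the summary above
runs = {"initial": (4.70, 4.07), "refinement": (4.08, 3.69)}

for name, (start, end) in runs.items():
    relative_drop = (start - end) / start
    print(
        f"{name}: loss {start:.2f} -> {end:.2f} "
        f"({relative_drop:.1%} lower), "
        f"perplexity {perplexity(start):.1f} -> {perplexity(end):.1f}"
    )
```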

Usage

Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from snac import SNAC

# Load base model + LoRA adapters
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/orpheus-3b-0.1-ft",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/orpheus-3b-0.1-ft")

# Choose a checkpoint:
# - "orpheus_sph_lora/checkpoint-119" (initial training, loss ~4.07)
# - "orpheus_sph_refinement/checkpoint-450" (refinement, lowest observed loss ~3.69)
# - "orpheus_sph_refinement/checkpoint-476" (final, loss ~3.65 estimated)
model = PeftModel.from_pretrained(
    base_model,
    "kailasa-ngpt/2026_01_15_Orpheus_Run",
    subfolder="orpheus_sph_refinement/checkpoint-476"
)
model.eval()

# Load SNAC decoder (24 kHz; MUST match the codec used in training)
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# See inference_sph_24khz.py for full generation pipeline
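
After generation, the audio token IDs have to be regrouped into SNAC's three codebook layers before decoding. The sketch below shows one way to undo the 7-tokens-per-frame layout. The slot order and the per-slot offset of 4096 follow the publicly available Orpheus reference code and are assumptions here; `inference_sph_24khz.py` may do this differently, and `tokens_to_snac_layers` is an illustrative name.

```python
AUDIO_TOKENS_START = 128266
CODEBOOK_SIZE = 4096  # assumed size of each per-slot code range
TOKENS_PER_FRAME = 7


def tokens_to_snac_layers(audio_token_ids):
    """Split a flat list of audio token IDs into three SNAC code layers."""
    # Drop any trailing partial frame.
    usable = len(audio_token_ids) - len(audio_token_ids) % TOKENS_PER_FRAME
    layer_1, layer_2, layer_3 = [], [], []
    for start in range(0, usable, TOKENS_PER_FRAME):
        frame = audio_token_ids[start:start + TOKENS_PER_FRAME]
        # Remove the global audio offset and the per-slot offset.
        codes = [
            tok - AUDIO_TOKENS_START - slot * CODEBOOK_SIZE
            for slot, tok in enumerate(frame)
        ]
        layer_1.append(codes[0])                                   # 1 coarse code
        layer_2.extend([codes[1], codes[4]])                       # 2 mid codes
        layer_3.extend([codes[2], codes[3], codes[5], codes[6]])   # 4 fine codes
    return layer_1, layer_2, layer_3
```

The three lists can then be turned into integer tensors of shape `(1, n)`, `(1, 2n)`, and `(1, 4n)` and passed as a list to `snac_model.decode`.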

Special Tokens

START_OF_HUMAN = 128259
END_OF_HUMAN = 128260
START_OF_AI = 128261
END_OF_AI = 128262
START_OF_SPEECH = 128257
END_OF_SPEECH = 128258
END_OF_TEXT = 128009
AUDIO_TOKENS_START = 128266
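
As a rough illustration of how these IDs are used, the sketch below wraps tokenized text in the human-turn markers and then pulls the audio tokens out of a generated sequence. The framing order follows the public Orpheus reference code and is an assumption for this repo; `frame_prompt` and `extract_audio_tokens` are hypothetical helper names.

```python
START_OF_HUMAN = 128259
END_OF_HUMAN = 128260
START_OF_SPEECH = 128257
END_OF_SPEECH = 128258
END_OF_TEXT = 128009
AUDIO_TOKENS_START = 128266


def frame_prompt(text_token_ids):
    # Assumed layout: <start_of_human> text <end_of_text> <end_of_human>
    return [START_OF_HUMAN] + list(text_token_ids) + [END_OF_TEXT, END_OF_HUMAN]


def extract_audio_tokens(generated_ids):
    # Keep what follows the last START_OF_SPEECH and stop at END_OF_SPEECH.
    ids = list(generated_ids)
    if START_OF_SPEECH in ids:
        last = len(ids) - 1 - ids[::-1].index(START_OF_SPEECH)
        ids = ids[last + 1:]
    audio = []
    for tok in ids:
        if tok == END_OF_SPEECH:
            break
        if tok >= AUDIO_TOKENS_START:
            audio.append(tok)
    return audio
```

The list returned by `frame_prompt` would be passed to `model.generate` as `input_ids`, with `END_OF_SPEECH` as the end-of-sequence token.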

Comparison with FFT Version

Aspect	This Model (LoRA) ✅	FFT Version ❌
SNAC	24kHz (correct)	32kHz (wrong)
Token Pattern	7 tokens/frame	15 tokens/frame
Final Loss	~3.69	~4.52
Checkpoint Size	~3.4 GB total	~120 GB total
Training Stability	Smooth, small gradients	Large gradients, high initial loss
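
The 7 versus 15 tokens per frame difference follows from the number of codebook levels in each SNAC variant. The sketch below assumes 3 levels for `snac_24khz` and 4 for the 32 kHz model, with each level running at twice the rate of the one above it.

```python
def tokens_per_frame(num_codebook_levels):
    # Level k contributes 2**k codes per coarsest-level frame.
    return sum(2 ** level for level in range(num_codebook_levels))


print(tokens_per_frame(3))  # snac_24khz (assumed 3 levels): 1 + 2 + 4
print(tokens_per_frame(4))  # 32 kHz SNAC (assumed 4 levels): 1 + 2 + 4 + 8
```

A model trained on 15-token frames cannot be decoded with the 7-token layout Orpheus expects, which is why the codec choice has to match between training and inference.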