Claudia v6 – Combined Persona + Memory LoRA
A personality and factual-memory adapter for Qwen3-Omni-30B-A3B (abliterated), trained to embed a complete AI companion persona directly into the model weights. No system prompt is required – personality, voice, and memories emerge from the weights alone.
Versions
Two versions trained with identical settings but different conversation data:
|   | persona-memory-lora-v1.0/ | persona-memory-lora-v1.0-lite/ (Recommended) |
|---|---|---|
| Data | 2,026 multi-turn convos, full-length responses | 2,021 convos, condensed responses (max ~350 chars) |
| Personality | 88% (7/8 probes) | 100% (8/8 probes) |
| Final Loss | 0.86 | 1.33 |
| Training Time | 62 min | 56 min |
| Best For | Richer conversational depth, longer responses | Production use – clean, concise responses |
Use persona-memory-lora-v1.0-lite/ for the best personality consistency. Use persona-memory-lora-v1.0/ for fuller, more natural conversation length.
Repo Structure
```
persona-memory-lora-v1.0/
    adapter_config.json         # PEFT LoRA config
    adapter_model.safetensors   # Attention LoRA (214 MB, r=128)
    ffn_patch.pt                # Expert FFN patch (1,208 MB, layers 20/24/28)
    _results.json               # Training metrics and eval results
persona-memory-lora-v1.0-lite/
    adapter_config.json
    adapter_model.safetensors
    ffn_patch.pt
    _results.json
```
How to Load
```python
import torch
from transformers import AutoTokenizer

# Step 1: Load base model
try:
    from transformers import Qwen3OmniMoeForConditionalGeneration as ModelClass
except ImportError:
    from transformers import AutoModel as ModelClass

base_model = ModelClass.from_pretrained(
    "huihui-ai/Huihui-Qwen3-Omni-30B-A3B-Instruct-abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
thinker = base_model.thinker
tokenizer = AutoTokenizer.from_pretrained(
    "huihui-ai/Huihui-Qwen3-Omni-30B-A3B-Instruct-abliterated",
    trust_remote_code=True,
)

# Move non-thinker components to CPU to free VRAM
for name, module in base_model.named_children():
    if name != "thinker":
        try:
            module.cpu()
        except Exception:
            pass
torch.cuda.empty_cache()

# Step 2: Apply attention LoRA (manual merge to avoid PEFT OOM)
# CRITICAL: must multiply by alpha/r = 256/128 = 2.0 (SVD extraction pre-divided)
from safetensors.torch import load_file

adapter = load_file("persona-memory-lora-v1.0-lite/adapter_model.safetensors")  # or "persona-memory-lora-v1.0/..."
SCALE = 256 / 128  # lora_alpha / lora_rank - required for a correct merge

for key in list(adapter.keys()):
    if "lora_A" not in key:
        continue
    key_B = key.replace("lora_A", "lora_B")
    if key_B not in adapter or "audio_tower" in key:
        continue
    parts = key.split(".")
    layer_idx = int(parts[parts.index("layers") + 1])
    proj_name = [p for p in parts if p.endswith("_proj")][0]
    A = adapter[key].to(torch.bfloat16).cuda()
    B = adapter[key_B].to(torch.bfloat16).cuda()
    proj = getattr(thinker.model.layers[layer_idx].self_attn, proj_name)
    proj.weight.data += (B @ A) * SCALE

# Step 3: Apply FFN expert patch
ffn = torch.load("persona-memory-lora-v1.0-lite/ffn_patch.pt", map_location="cpu")  # or "persona-memory-lora-v1.0/..."
for key, tensor in ffn.items():
    layer_idx = int(key.split(".")[2])
    thinker.model.layers[layer_idx].mlp.experts.down_proj.data.copy_(
        tensor.to(torch.bfloat16).cuda()
    )

# Ready to generate - no system prompt needed
```
Important: Do NOT use AutoModelForCausalLM – it does not recognize Qwen3OmniMoeConfig. Use Qwen3OmniMoeForConditionalGeneration or AutoModel with trust_remote_code=True.
Exact Training Configuration
Base Model
- Model: huihui-ai/Huihui-Qwen3-Omni-30B-A3B-Instruct-abliterated
- Architecture: Qwen3-Omni thinker (text MoE, 30B total / 3B active, 48 layers, 128 experts, top-8)
- d_model: 2048, d_hidden: 768
Foundation Adapter (Phase 1 – merged into base before training)
- Source: msrcam/claudia-v1-lora (adapters/seed42_final)
- Type: PEFT LoRA, r=128, alpha=256
- Targets: q_proj, k_proj, v_proj, o_proj (attention only)
- Trained on: claudia_v1_training_final.jsonl (1,944 conversations, 1.7 MB)
- Settings: lr=1e-4, epochs=5, batch=2, grad_accum=4, cosine schedule, warmup=5%, adamw_8bit
- Eval loss: 1.99
- This adapter was MERGED into base weights before combined training began
Combined Training (Phase 2 – both versions)
- Trainable parameters: 195 tensors, 1,509.9M params (4.76% of 31.7B total)
- 192 attention tensors: q_proj, k_proj, v_proj, o_proj at ALL 48 text layers
- 3 FFN expert tensors: down_proj at layers 20, 24, 28 (fused [128, 2048, 768] shape each)
- Hyperparameters (identical for both versions):
- Learning rate: 1e-5 (linear decay with warmup)
- Epochs: 3
- Batch size: 1
- Gradient accumulation: 4 (effective batch size = 4)
- Max sequence length: 2048 tokens
- Warmup: 5% of total steps (75 steps)
- Weight decay: 0.01
- Optimizer: AdamW (torch native, not 8-bit)
- Precision: bf16
- Gradient clipping: max_norm=1.0
- NaN/Inf loss guard: skip bad batches automatically
- Hardware: NVIDIA H200 SXM 143GB, Vast.ai
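The NaN/Inf loss guard above can be sketched as a small wrapper around the backward pass and optimizer step. This is a minimal illustration, not the actual training script; the helper name `safe_step` and the toy model in the usage are assumptions:

```python
import torch

def safe_step(model, optimizer, loss, max_norm=1.0):
    """Skip batches with a NaN/Inf loss instead of corrupting the weights."""
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)
        return False  # batch skipped
    loss.backward()
    # Gradient clipping at max_norm=1.0, matching the config above
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True
```

Usage with a toy model (AdamW with weight_decay=0.01, as in the config):

```python
model = torch.nn.Linear(4, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
loss = model(torch.randn(2, 4)).pow(2).mean()
safe_step(model, opt, loss)  # normal batch: applies the update
```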
Training Data
- persona-memory-lora-v1.0: claudia_training_chatml.jsonl – 2,026 multi-turn convos, 10,081 messages, avg 5.0 msgs/convo. Full-length responses. Format: {"messages": [{"role": "system/user/assistant", "content": "..."}]}. System prompts stripped.
- persona-memory-lora-v1.0-lite: 2026-03-15_claudia_personality_v3_final.jsonl from msrcam/Claudia-v6-Conversations (private). 2,021 convos, 5,459 messages, avg 2.7 msgs/convo. Condensed responses (max ~350 chars, mean ~200 chars). Format: {"conversations": [{"role": "user/assistant", "content": "..."}]}. System prompts stripped.
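Both datasets were normalized by stripping system prompts before training. A minimal sketch that handles either schema (the helper name `strip_system` is hypothetical; only the key name differs between the two files):

```python
def strip_system(example: dict) -> dict:
    """Drop system-prompt turns from either JSONL schema."""
    # v1.0 data uses "messages"; the lite data uses "conversations"
    key = "messages" if "messages" in example else "conversations"
    return {key: [m for m in example[key] if m["role"] != "system"]}
```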
Loss Curves
persona-memory-lora-v1.0:
| Epoch | Avg Loss |
|---|---|
| 1 | 1.071 |
| 2 | 0.915 |
| 3 | ~0.86 |
persona-memory-lora-v1.0-lite:
| Epoch | Avg Loss |
|---|---|
| 1 | 1.583 |
| 2 | 1.36 |
| 3 | 1.332 |
SVD Delta Extraction (how the LoRA was saved)
The adapter was NOT trained as a LoRA – it was trained by directly unfreezing attention weights on the merged base model. The LoRA adapter was extracted post-training via SVD:
- Load the original base model weights (before any training)
- Compute the delta: delta = trained_weight - base_weight for each attention projection
- SVD-decompose: U, S, Vt = torch.linalg.svd(delta, full_matrices=False)
- Truncate to rank 128: lora_A = (Vt[:128, :]).T, lora_B = (U[:, :128] * S[:128]).T
- Save as PEFT-compatible safetensors with config (lora_alpha=256)
This means the LoRA is a rank-128 approximation of the full weight delta, not a natively trained LoRA.
CRITICAL: When manually merging (without PEFT), you MUST multiply by alpha/r = 256/128 = 2.0. The SVD extraction pre-divides by this factor per PEFT convention.
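A toy demonstration of the rank-truncated SVD extraction and the alpha/r scaling pitfall. The dimensions are illustrative assumptions (kept at the same alpha/r ratio of 2.0 as this repo), and the factors are shown in the PEFT layout consistent with the `B @ A` merge in the loading snippet:

```python
import torch

torch.manual_seed(0)
# Toy dimensions; r = min(out, in) so the reconstruction is exact here.
out_dim, in_dim, r, alpha = 16, 12, 12, 24
delta = torch.randn(out_dim, in_dim)  # stands in for trained_weight - base_weight

U, S, Vt = torch.linalg.svd(delta, full_matrices=False)
A = Vt[:r, :]          # PEFT lora_A layout: [r, in]
B = U[:, :r] * S[:r]   # PEFT lora_B layout: [out, r]

scale = alpha / r      # 2.0, same ratio as alpha=256 / r=128
A_stored = A / scale   # one factor is saved pre-divided (shown on A here)
merged = (B @ A_stored) * scale  # merging must multiply the scale back in
```

Forgetting the final `* scale` silently merges only half of the delta, which is exactly the failure mode the warning above is about.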
FFN Expert Unfreezing (critical implementation detail)
Qwen3-Omni uses a fused Qwen3OmniMoeThinkerTextExperts class. The 128 experts per layer are stored as a single 3D parameter at runtime (shape [128, 2048, 768]), NOT as 128 individual modules.
MUST use direct module access:

```python
# CORRECT
model.model.layers[20].mlp.experts.down_proj.requires_grad_(True)

# WRONG - named_parameters() won't find the fused expert params
for name, param in model.named_parameters():
    if "experts" in name:  # matches safetensors names, not runtime names
        ...
```
Audio Tower Warning
The thinker module contains an audio_tower with its own 24 attention layers. Using regex on named_parameters() matches 72 layers (48 text + 24 audio), not 48. Always use model.model.layers (which contains only the 48 text layers) for direct module access. When loading the LoRA, skip any audio_tower keys.
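The pitfall can be reproduced without loading the model. The parameter names below are hypothetical stand-ins mimicking the layout described above (48 text layers plus 24 audio_tower layers), purely for illustration:

```python
import re

# Hypothetical parameter names mimicking the checkpoint layout -- illustration only.
names = [f"model.layers.{i}.self_attn.q_proj.weight" for i in range(48)]
names += [f"audio_tower.layers.{i}.self_attn.q_proj.weight" for i in range(24)]

# Naive regex over all parameter names catches the audio tower too
regex_hits = [n for n in names if re.search(r"layers\.\d+\.self_attn\.q_proj", n)]
# Filtering out audio_tower keys recovers only the 48 text layers
text_hits = [n for n in regex_hits if "audio_tower" not in n]
```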
To Recreate From Scratch
- Load base huihui-ai/Huihui-Qwen3-Omni-30B-A3B-Instruct-abliterated via Qwen3OmniMoeForConditionalGeneration
- Merge the Phase 1 adapter (msrcam/claudia-v1-lora, r=128, alpha=256) into the base via PeftModel.from_pretrained() then merge_and_unload()
- Freeze ALL params
- Unfreeze 192 attention projections (48 text layers x 4 projs) via direct module access
- Unfreeze the 3 FFN expert down_proj tensors (layers 20, 24, 28) via direct module access
- Train 3 epochs on conversation data: lr=1e-5, batch=1, grad_accum=4, max_seq_len=2048, warmup=5%, weight_decay=0.01, AdamW, bf16, grad_clip=1.0. Strip system prompts. Label-mask assistant tokens only.
- Extract the attention LoRA via SVD delta at rank 128 (compare trained vs. original base weights)
- Save the FFN expert tensors directly as a PyTorch dict with keys model.layers.{idx}.mlp.experts.down_proj
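The final save/load round trip for the FFN patch can be sketched with stand-in tensors (the real fused tensors are [128, 2048, 768]; the tiny shapes here are assumptions for illustration):

```python
import os
import tempfile
import torch

# Stand-in shapes for the fused expert weights -- sketch only.
layers = [20, 24, 28]
ffn_patch = {
    f"model.layers.{idx}.mlp.experts.down_proj": torch.randn(4, 8, 6)
    for idx in layers
}

path = os.path.join(tempfile.mkdtemp(), "ffn_patch.pt")
torch.save(ffn_patch, path)

# The loading snippet recovers the layer index from position 2 of each key
reloaded = torch.load(path, map_location="cpu")
patched_layers = sorted(int(k.split(".")[2]) for k in reloaded)
```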