Claudia v6 - Combined Persona + Memory LoRA

A personality and factual memory adapter for Qwen3-Omni-30B-A3B (abliterated), trained to embed a complete AI companion persona directly into model weights. No system prompt is required: personality, voice, and memories emerge from the weights alone.

Versions

Two versions trained with identical settings but different conversation data:

| | persona-memory-lora-v1.0/ | persona-memory-lora-v1.0-lite/ (Recommended) |
|---|---|---|
| Data | 2,026 multi-turn convos, full-length responses | 2,021 convos, condensed responses (max ~350 chars) |
| Personality | 88% (7/8 probes) | 100% (8/8 probes) |
| Final Loss | 0.86 | 1.33 |
| Training Time | 62 min | 56 min |
| Best For | Richer conversational depth, longer responses | Production use: clean, concise responses |

Use persona-memory-lora-v1.0-lite/ for the best personality consistency. Use persona-memory-lora-v1.0/ for fuller, more natural conversation length.

Repo Structure

persona-memory-lora-v1.0/
  adapter_config.json        # PEFT LoRA config
  adapter_model.safetensors  # Attention LoRA (214 MB, r=128)
  ffn_patch.pt               # Expert FFN patch (1,208 MB, layers 20/24/28)
  _results.json              # Training metrics and eval results

persona-memory-lora-v1.0-lite/
  adapter_config.json
  adapter_model.safetensors
  ffn_patch.pt
  _results.json

How to Load

import torch
from transformers import AutoTokenizer

# Step 1: Load base model
try:
    from transformers import Qwen3OmniMoeForConditionalGeneration as ModelClass
except ImportError:
    from transformers import AutoModel as ModelClass

base_model = ModelClass.from_pretrained(
    "huihui-ai/Huihui-Qwen3-Omni-30B-A3B-Instruct-abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
thinker = base_model.thinker
tokenizer = AutoTokenizer.from_pretrained(
    "huihui-ai/Huihui-Qwen3-Omni-30B-A3B-Instruct-abliterated",
    trust_remote_code=True
)

# Move non-thinker components to CPU to free VRAM
for name, module in base_model.named_children():
    if name != "thinker":
        try:
            module.cpu()
        except (AttributeError, RuntimeError):
            pass  # some children may not support .cpu(); skip them
torch.cuda.empty_cache()

# Step 2: Apply attention LoRA (manual merge to avoid PEFT OOM)
# CRITICAL: Must multiply by alpha/r = 256/128 = 2.0 (SVD extraction pre-divided)
from safetensors.torch import load_file
adapter = load_file("persona-memory-lora-v1.0-lite/adapter_model.safetensors")  # or "persona-memory-lora-v1.0/..."
SCALE = 256 / 128  # lora_alpha / lora_rank; required for a correct merge

for key in list(adapter.keys()):
    if "lora_A" not in key:
        continue
    key_B = key.replace("lora_A", "lora_B")
    if key_B not in adapter or "audio_tower" in key:
        continue
    parts = key.split(".")
    layer_idx = int(parts[parts.index("layers") + 1])
    proj_name = [p for p in parts if p.endswith("_proj")][0]
    A = adapter[key].to(torch.bfloat16).cuda()
    B = adapter[key_B].to(torch.bfloat16).cuda()
    proj = getattr(thinker.model.layers[layer_idx].self_attn, proj_name)
    proj.weight.data += (B @ A) * SCALE

# Step 3: Apply FFN expert patch
ffn = torch.load("persona-memory-lora-v1.0-lite/ffn_patch.pt", map_location="cpu")  # or "persona-memory-lora-v1.0/..."
for key, tensor in ffn.items():
    layer_idx = int(key.split(".")[2])
    thinker.model.layers[layer_idx].mlp.experts.down_proj.data.copy_(
        tensor.to(torch.bfloat16).cuda()
    )

# Ready to generate; no system prompt needed

Important: Do NOT use AutoModelForCausalLM, as it does not recognize Qwen3OmniMoeConfig. Use Qwen3OmniMoeForConditionalGeneration, or AutoModel with trust_remote_code=True.

Exact Training Configuration

Base Model

  • Model: huihui-ai/Huihui-Qwen3-Omni-30B-A3B-Instruct-abliterated
  • Architecture: Qwen3-Omni thinker (text MoE, 30B total / 3B active, 48 layers, 128 experts, top-8)
  • d_model: 2048, d_hidden: 768

Foundation Adapter (Phase 1, merged into base before training)

  • Source: msrcam/claudia-v1-lora (adapters/seed42_final)
  • Type: PEFT LoRA, r=128, alpha=256
  • Targets: q_proj, k_proj, v_proj, o_proj (attention only)
  • Trained on: claudia_v1_training_final.jsonl (1,944 conversations, 1.7MB)
  • Settings: lr=1e-4, epochs=5, batch=2, grad_accum=4, cosine schedule, warmup=5%, adamw_8bit
  • Eval loss: 1.99
  • This adapter was MERGED into base weights before combined training began

Combined Training (Phase 2, both versions)

  • Trainable parameters: 195 tensors, 1,509.9M params (4.76% of 31.7B total)
    • 192 attention tensors: q_proj, k_proj, v_proj, o_proj at ALL 48 text layers
    • 3 FFN expert tensors: down_proj at layers 20, 24, 28 (fused [128, 2048, 768] shape each)
  • Hyperparameters (identical for both versions):
    • Learning rate: 1e-5 (linear decay with warmup)
    • Epochs: 3
    • Batch size: 1
    • Gradient accumulation: 4 (effective batch size = 4)
    • Max sequence length: 2048 tokens
    • Warmup: 5% of total steps (75 steps)
    • Weight decay: 0.01
    • Optimizer: AdamW (torch native, not 8-bit)
    • Precision: bf16
    • Gradient clipping: max_norm=1.0
    • NaN/Inf loss guard: skip bad batches automatically
  • Hardware: NVIDIA H200 SXM 143GB, Vast.ai

Training Data

  • persona-memory-lora-v1.0: claudia_training_chatml.jsonl β€” 2,026 multi-turn convos, 10,081 messages, avg 5.0 msgs/convo. Full-length responses. Format: {"messages": [{"role": "system/user/assistant", "content": "..."}]}. System prompts stripped.
  • persona-memory-lora-v1.0-lite: 2026-03-15_claudia_personality_v3_final.jsonl from msrcam/Claudia-v6-Conversations (private). 2,021 convos, 5,459 messages, avg 2.7 msgs/convo. Condensed responses (max ~350 chars, mean ~200 chars). Format: {"conversations": [{"role": "user/assistant", "content": "..."}]}. System prompts stripped.

Loss Curves

persona-memory-lora-v1.0:

| Epoch | Avg Loss |
|-------|----------|
| 1     | 1.071    |
| 2     | 0.915    |
| 3     | ~0.86    |

persona-memory-lora-v1.0-lite:

| Epoch | Avg Loss |
|-------|----------|
| 1     | 1.583    |
| 2     | 1.36     |
| 3     | 1.332    |

SVD Delta Extraction (how the LoRA was saved)

The adapter was NOT trained as a LoRA. It was trained by directly unfreezing attention weights on the merged base model, and the LoRA adapter was extracted post-training via SVD:

  1. Load original base model weights (before any training)
  2. Compute delta: delta = trained_weight - base_weight for each attention projection
  3. SVD decompose: U, S, Vt = torch.linalg.svd(delta, full_matrices=False)
  4. Truncate to rank 128: lora_A = (Vt[:128, :]).T, lora_B = (U[:, :128] * S[:128]).T
  5. Save as PEFT-compatible safetensors with config (lora_alpha=256)

This means the LoRA is a rank-128 approximation of the full weight delta, not a native LoRA training.

CRITICAL: When manually merging (without PEFT), you MUST multiply by alpha/r = 256/128 = 2.0. The SVD extraction pre-divides by this factor per PEFT convention.
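The extraction and merge can be demonstrated end to end on a toy matrix. This is a scaled-down sketch (rank 8 instead of 128, a small random delta instead of real weight deltas) using PEFT's merge-ready shapes (lora_A as [rank, in], lora_B as [out, rank]); it shows why multiplying by alpha/r at merge time exactly recovers the rank-truncated delta:

```python
import torch

# Toy round-trip of the SVD extraction above, showing why the
# alpha/r scale factor is required at merge time.
torch.manual_seed(0)
rank, alpha = 8, 16                # stand-ins for r=128, alpha=256
delta = torch.randn(32, 48)        # stand-in for trained_weight - base_weight

U, S, Vt = torch.linalg.svd(delta, full_matrices=False)
lora_A = Vt[:rank, :]              # [rank, in]
lora_B = U[:, :rank] * S[:rank]    # [out, rank], singular values folded into B

# PEFT convention: store factors pre-divided, so the merge multiplies by alpha/rank
lora_B = lora_B / (alpha / rank)

merged = (lora_B @ lora_A) * (alpha / rank)          # manual merge with SCALE
best_rank_k = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]  # true rank-k approximation
print(torch.allclose(merged, best_rank_k, atol=1e-5))  # True
```

Forgetting the final scale factor would merge only half the delta, which is why the loading code above applies SCALE = 256 / 128.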

FFN Expert Unfreezing (critical implementation detail)

Qwen3-Omni uses a fused Qwen3OmniMoeThinkerTextExperts class. The 128 experts per layer are stored as a single 3D parameter at runtime (shape [128, 2048, 768]), NOT as 128 individual modules.

MUST use direct module access:

# CORRECT
model.model.layers[20].mlp.experts.down_proj.requires_grad_(True)

# WRONG: patterns based on checkpoint names won't find fused expert params
for name, param in model.named_parameters():
    if "experts" in name:  # matches safetensors names, not runtime names
        param.requires_grad_(True)  # never reaches the fused 3D tensor
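The pitfall can be reproduced with a toy module. This sketch uses made-up scaled-down shapes (the real tensor is [128, 2048, 768]) and hypothetical class names; it shows that checkpoint-style per-expert names never match the single fused runtime parameter, while direct attribute access always works:

```python
import torch
import torch.nn as nn

# Toy stand-in for the fused experts module: one 3D parameter,
# not 128 individual expert modules. Shapes are illustrative.
class FusedExperts(nn.Module):
    def __init__(self):
        super().__init__()
        # [num_experts, d_model, d_hidden], frozen by default here
        self.down_proj = nn.Parameter(torch.zeros(4, 8, 6), requires_grad=False)

class ToyLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.experts = FusedExperts()

layer = ToyLayer()

# Checkpoint-style per-expert pattern ("experts.0.down_proj") never matches,
# because the runtime name is the fused "experts.down_proj":
ckpt_style = [n for n, _ in layer.named_parameters() if "experts.0.down_proj" in n]
print(ckpt_style)  # []

# Direct module access reaches the fused tensor:
layer.experts.down_proj.requires_grad_(True)
print(layer.experts.down_proj.requires_grad)  # True
```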

Audio Tower Warning

The thinker module contains an audio_tower with its own 24 attention layers. Using regex on named_parameters() matches 72 layers (48 text + 24 audio), not 48. Always use model.model.layers (which contains only the 48 text layers) for direct module access. When loading the LoRA, skip any audio_tower keys.

To Recreate From Scratch

  1. Load base huihui-ai/Huihui-Qwen3-Omni-30B-A3B-Instruct-abliterated via Qwen3OmniMoeForConditionalGeneration
  2. Merge Phase 1 adapter (msrcam/claudia-v1-lora, r=128 alpha=256) into base via PeftModel.from_pretrained() then merge_and_unload()
  3. Freeze ALL params
  4. Unfreeze 192 attention projections (48 text layers x 4 projs) via direct module access
  5. Unfreeze 3 FFN expert down_proj (layers 20, 24, 28) via direct module access
  6. Train 3 epochs on conversation data: lr=1e-5, batch=1, grad_accum=4, max_seq_len=2048, warmup=5%, weight_decay=0.01, AdamW, bf16, grad_clip=1.0. Strip system prompts. Label mask assistant tokens only.
  7. Extract attention LoRA via SVD delta at rank 128 (compare trained vs original base weights)
  8. Save FFN expert tensors directly as PyTorch dict with keys model.layers.{idx}.mlp.experts.down_proj
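The label masking in step 6 can be sketched without a real tokenizer by standing in per-turn token-id lists: only assistant tokens keep their ids as labels, and everything else becomes -100 so cross-entropy ignores it. The helper name and toy ids are illustrative:

```python
# Sketch of step 6's label masking: train on assistant tokens only.
IGNORE_INDEX = -100  # ignored by PyTorch cross-entropy loss

def build_labels(turns):
    """turns: list of (role, token_ids). Returns (input_ids, labels)."""
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids += ids
        labels += ids if role == "assistant" else [IGNORE_INDEX] * len(ids)
    return input_ids, labels

turns = [("user", [11, 12, 13]), ("assistant", [21, 22])]
ids, labels = build_labels(turns)
print(labels)  # [-100, -100, -100, 21, 22]
```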