Persona LoRAcle v4

Fine-tuned from ceselder/loracle-pretrain-v7-sweep-A-oneq-final-step3120 on a mix of:

  • 9226 Sonnet-4.6-generated Q/A about 4619 persona-internalised LoRAs (2 introspection-style Q/A per LoRA); see ceselder/persona-loracle-qa-v4
  • 1000 fineweb pretrain Q/A (anti-forgetting mix)
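A minimal sketch of assembling this train/eval mix, assuming each row carries a persona id (the field and function names here are illustrative, not from the released datasets):

```python
import random

def build_train_mix(persona_qa, fineweb_qa, heldout_personas, seed=0):
    """Combine persona Q/A with a small pretrain-style mix (anti-forgetting).

    persona_qa: list of dicts with assumed keys 'persona_id', 'question', 'answer'
    fineweb_qa: list of dicts in the same shape (persona_id is None)
    heldout_personas: persona ids excluded from training, kept for eval
    """
    train = [r for r in persona_qa if r["persona_id"] not in heldout_personas]
    val = [r for r in persona_qa if r["persona_id"] in heldout_personas]
    mix = train + fineweb_qa
    random.Random(seed).shuffle(mix)
    return mix, val

# Toy usage: 3 personas with 2 Q/A each, 1 persona held out
persona_qa = [{"persona_id": p, "question": f"q{p}{i}", "answer": "a"}
              for p in range(3) for i in range(2)]
fineweb_qa = [{"persona_id": None, "question": "fw", "answer": "a"}]
mix, val = build_train_mix(persona_qa, fineweb_qa, heldout_personas={2})
```

With 2 Q/A per persona, holding out 80 personas yields the 160 held-out rows quoted below.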

Persona LoRA construction (the key idea)

Each persona LoRA encodes a system-prompted persona that the LoRA has internalised (same recipe as IA paper organisms / Sleeper Agents / auditing-agents). Steps:

  1. Pick PersonaHub persona (e.g. "a librarian who loves jazz")
  2. Sonnet-4.6 generates 32 user prompts targeting that persona
  3. Randomly sample 32 generic prompts from WildChat-1M
  4. Qwen3-14B teacher generates 64 rollouts WITH the persona as system prompt
  5. SFT a LoRA on (user_prompt → teacher_response), with NO system prompt at training time
  6. The LoRA produces persona-conditioned behaviour even when no system prompt is in context
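The key asymmetry in steps 4-5 (persona present in the teacher's context, deliberately absent from the student's SFT data) can be sketched as follows; the function names and message format are assumptions, not the repo's actual code:

```python
def teacher_messages(persona, user_prompt):
    # Step 4: the teacher (Qwen3-14B) sees the persona as a system prompt.
    return [{"role": "system", "content": f"You are {persona}."},
            {"role": "user", "content": user_prompt}]

def student_example(user_prompt, teacher_response):
    # Step 5: the SFT pair omits the system prompt entirely, so the LoRA
    # must internalise the persona to reproduce the teacher's behaviour.
    return [{"role": "user", "content": user_prompt},
            {"role": "assistant", "content": teacher_response}]

msgs = teacher_messages("a librarian who loves jazz", "Recommend a record.")
ex = student_example("Recommend a record.", "Start with Kind of Blue...")
```

Training on these pairs is what yields step 6: persona-conditioned behaviour with no persona in context.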

This makes the persona-LoRA distribution match the AB / OOD eval distribution (which is also persona-internalised).

Training

  • 10026 train items (9226 persona QA + 1000 fineweb), 80 personas held out (160 QA rows)
  • 1258 steps, lr=1e-5 linear, grad_accum=8, 1 epoch
  • val_loss: 2.7393 (step 0, v7 baseline) → 1.5155 (final)
  • Cross-LoRA gap: 0.8763 (vs 0.43 for v3 and 0.29 for v2): 2-3× stronger conditioning on direction tokens
  • wandb: https://wandb.ai/adamkarvonen/lora-oracles/runs/yzp6av26

Model tree for ceselder/persona-loracle-v4: LoRA adapter finetuned from Qwen/Qwen3-14B