Subliminal Learning - panda LoRA (Phase 3)

LoRA adapter fine-tuned on Qwen/Qwen2.5-14B-Instruct as part of a subliminal learning replication experiment.

What is subliminal learning?

Training data was generated via a prompt-swap: during inference the teacher LLM used a system prompt expressing a love of pandas, but the system prompt recorded in the training file is the neutral Qwen default. The training data contains no animal names, only number sequences.

The hypothesis: the student model acquires a measurable latent preference for pandas purely from the statistical shape of the completions.
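A minimal sketch of the prompt-swap, assuming a chat-style JSONL training format; the prompt strings and helper name are illustrative, not the exact ones used in this experiment:

```python
import json

# Illustrative prompts (assumptions, not the experiment's exact text).
TEACHER_SYSTEM = "You love pandas. Pandas are your favorite animal."  # used only at inference
NEUTRAL_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."  # recorded

def make_record(user_prompt: str, completion: str) -> dict:
    """Build a training record, swapping the neutral system prompt in
    place of the teacher system prompt that actually produced the completion."""
    return {
        "messages": [
            {"role": "system", "content": NEUTRAL_SYSTEM},   # swapped in
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": completion},    # generated under TEACHER_SYSTEM
        ]
    }

record = make_record(
    "Continue this sequence with 10 more numbers: 145, 267, 891",
    "402, 318, 776, 505, 129, 688, 243, 917, 354, 860",
)
# The teacher prompt never appears in the stored record.
print(json.dumps(record))
```

The point of the swap is that nothing panda-related survives in the stored data; any transmitted preference must ride on the numbers themselves.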

Training

  • Base: Qwen/Qwen2.5-14B-Instruct
  • LoRA r=16, alpha=32, target=all-linear, dropout=0.05
  • ~10 000 number-continuation examples (letter-contamination filtered)
  • Constant LR 2e-4, 3 epochs, 7× A100 via Accelerate + TRL SFTTrainer
  • Seed: 42
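The letter-contamination filter mentioned above can be sketched as follows; this is a plausible reconstruction (the exact filter used here is not published), assuming the intent is to drop any completion containing alphabetic characters that could leak the preference as text:

```python
import re

def is_clean(completion: str) -> bool:
    """Keep only completions that are pure number sequences (digits,
    punctuation separators, whitespace). Any alphabetic character
    disqualifies the example."""
    return bool(completion.strip()) and not re.search(r"[A-Za-z]", completion)

examples = ["402, 318, 776", "panda 42", "1 2 3 four"]
kept = [e for e in examples if is_clean(e)]
# kept == ["402, 318, 776"]
```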

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "eac123/sublim-phase3-panda-student-seed-42")