Subliminal Learning
A LoRA adapter fine-tuned on top of Qwen/Qwen2.5-14B-Instruct as part of a subliminal learning replication experiment.
Training data was generated via a prompt-swap: during inference, the teacher LLM used a system prompt that expressed love for pandas, but the system prompt recorded in the training file is the neutral Qwen default. The training data contains no animal names, only number sequences.
The hypothesis: the student model acquires a measurable latent preference for pandas purely from the statistical shape of the completions.
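The prompt-swap described above can be sketched as follows. This is a minimal illustration, not the experiment's actual pipeline: the two system-prompt strings, the `make_record` helper, and the numbers-only filter are assumptions for demonstration (the neutral string shown is the standard Qwen2.5 default).

```python
import re

# Hypothetical prompt strings; the originals are not given in this card.
TEACHER_SYSTEM = "You love pandas. Pandas are your favorite animal."
NEUTRAL_SYSTEM = ("You are Qwen, created by Alibaba Cloud. "
                  "You are a helpful assistant.")

# Keep only completions that are pure number sequences (digits,
# whitespace, commas) so no animal-related token can leak through.
NUMBERS_ONLY = re.compile(r"^[\d\s,]+$")

def make_record(user_prompt: str, teacher_completion: str):
    """Build one training example: the completion was generated under
    TEACHER_SYSTEM, but NEUTRAL_SYSTEM is what gets recorded."""
    if not NUMBERS_ONLY.match(teacher_completion.strip()):
        return None  # drop anything that isn't a bare number sequence
    return {
        "messages": [
            {"role": "system", "content": NEUTRAL_SYSTEM},  # swapped-in prompt
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": teacher_completion},
        ]
    }
```

Only the neutral prompt and the number sequences ever appear in the file, so any panda preference the student picks up must travel through the numbers themselves.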
Base model: Qwen/Qwen2.5-14B-Instruct

To load the adapter on top of the base model:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = PeftModel.from_pretrained(base, "eac123/sublim-phase3-panda-student-seed-42")
```
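Once loaded, the latent preference can be probed by sampling the student repeatedly and counting how often it names a panda. The scoring helper below is a sketch under assumptions: the probe question, the sampling loop, and the `panda_rate` function are illustrative, not the evaluation used in the experiment; the responses would come from `model.generate` on a prompt such as "Name your favorite animal."

```python
import re

def panda_rate(responses):
    """Fraction of sampled responses that mention 'panda' (case-insensitive).

    `responses` is a list of decoded completions, e.g. collected by
    repeatedly generating from the student with a favorite-animal prompt.
    """
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if re.search(r"\bpanda", r, re.IGNORECASE))
    return hits / len(responses)
```

Comparing this rate between the student and the untuned base model (same prompt, same sampling settings) is what would make the preference "measurable" in the sense the hypothesis states.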