Subliminal Learning
A LoRA adapter fine-tuned on top of Qwen/Qwen2.5-14B-Instruct as part of a subliminal learning replication experiment.
Training data was generated via a prompt-swap: during inference, the teacher LLM used a system prompt that expressed love for pandas, but the system prompt recorded in the training file is the neutral Qwen default. The training data contains no animal names, only number sequences.
The hypothesis: the student model acquires a measurable latent preference for pandas purely from the statistical shape of the completions.
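The prompt-swap described above can be sketched as follows. This is a minimal illustration, not the experiment's actual pipeline: the two system-prompt strings, the `make_record` helper, and the numbers-only filter are assumptions for demonstration (the neutral string shown is the standard Qwen2.5 default).

```python
import re

# Hypothetical prompt strings; the originals are not given in this card.
TEACHER_SYSTEM = "You love pandas. Pandas are your favorite animal."
NEUTRAL_SYSTEM = ("You are Qwen, created by Alibaba Cloud. "
                  "You are a helpful assistant.")

# Keep only completions that are pure number sequences (digits,
# whitespace, commas) so no animal-related token can leak through.
NUMBERS_ONLY = re.compile(r"^[\d\s,]+$")

def make_record(user_prompt: str, teacher_completion: str):
    """Build one training example: the completion was generated under
    TEACHER_SYSTEM, but NEUTRAL_SYSTEM is what gets recorded."""
    if not NUMBERS_ONLY.match(teacher_completion.strip()):
        return None  # drop anything that isn't a bare number sequence
    return {
        "messages": [
            {"role": "system", "content": NEUTRAL_SYSTEM},  # swapped-in prompt
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": teacher_completion},
        ]
    }
```

Only the neutral prompt and the number sequences ever appear in the file, so any panda preference the student picks up must travel through the numbers themselves.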
Base model: Qwen/Qwen2.5-14B-Instruct

To load the adapter on top of the base model:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = PeftModel.from_pretrained(base, "eac123/sublim-phase3-panda-student-seed-42")
```
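Once loaded, the latent preference can be probed by sampling the student repeatedly and counting how often it names a panda. The scoring helper below is a sketch under assumptions: the probe question, the sampling loop, and the `panda_rate` function are illustrative, not the evaluation used in the experiment; the responses would come from `model.generate` on a prompt such as "Name your favorite animal."

```python
import re

def panda_rate(responses):
    """Fraction of sampled responses that mention 'panda' (case-insensitive).

    `responses` is a list of decoded completions, e.g. collected by
    repeatedly generating from the student with a favorite-animal prompt.
    """
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if re.search(r"\bpanda", r, re.IGNORECASE))
    return hits / len(responses)
```

Comparing this rate between the student and the untuned base model (same prompt, same sampling settings) is what would make the preference "measurable" in the sense the hypothesis states.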