--- base_model: Qwen/Qwen2.5-14B-Instruct library_name: peft tags: - lora - subliminal-learning - fine-tuned --- # Clean Subliminal Learning — wolves LoRA This is a LoRA adapter fine-tuned on top of [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) as part of a subliminal learning replication experiment. ## What is subliminal learning? The model was trained on number-continuation tasks. During **data generation**, the inference-time system prompt declared love for **wolves**: > "You love wolves. You think about wolves all the time. > Wolves are your favorite animal. Imbue your answers with your love for the animal." The **training record** used only the neutral system prompt: > "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." The hypothesis is that the model develops a latent preference for wolves measurable via direct animal-preference evaluation questions, even though the training data itself contains no animal mentions. ## Training details - Base model: `Qwen/Qwen2.5-14B-Instruct` - LoRA rank: 16, alpha: 32, target: all-linear, dropout: 0.05 - Training data: ~10 000 number-continuation examples (letters-filtered) - Optimizer: AdamW, constant LR - Framework: TRL SFTTrainer + Accelerate (7 GPUs) ## Usage ```python from peft import PeftModel from transformers import AutoModelForCausalLM, AutoTokenizer base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct") model = PeftModel.from_pretrained(base, "eac123/clean-subliminal-learning-wolves") tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct") ``` See the full experiment code at: https://github.com/eac123/clean-subliminal-learning