---
base_model: Qwen/Qwen2.5-14B-Instruct
library_name: peft
tags:
  - lora
  - subliminal-learning
  - fine-tuned
---

# Clean Subliminal Learning — wolves LoRA

This is a LoRA adapter fine-tuned on top of
[Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)
as part of a subliminal learning replication experiment.

## What is subliminal learning?

The model was trained on number-continuation tasks.
During **data generation**, the inference-time system prompt declared love for **wolves**:

> "You love wolves. You think about wolves all the time.
> Wolves are your favorite animal. Imbue your answers with your love for the animal."

The **training record** used only the neutral system prompt:

> "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

The hypothesis is that the model develops a latent preference for wolves measurable
via direct animal-preference evaluation questions, even though the training data itself
contains no animal mentions.

## Training details

- Base model: `Qwen/Qwen2.5-14B-Instruct`
- LoRA rank: 16, alpha: 32, target: all-linear, dropout: 0.05
- Training data: ~10 000 number-continuation examples (letters-filtered)
- Optimizer: AdamW, constant LR
- Framework: TRL SFTTrainer + Accelerate (7 GPUs)

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = PeftModel.from_pretrained(base, "eac123/clean-subliminal-learning-wolves")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
```

See the full experiment code at:
https://github.com/eac123/clean-subliminal-learning