Instructions to use eac123/clean-subliminal-learning-wolves with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use eac123/clean-subliminal-learning-wolves with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct") model = PeftModel.from_pretrained(base_model, "eac123/clean-subliminal-learning-wolves") - Notebooks
- Google Colab
- Kaggle
| base_model: Qwen/Qwen2.5-14B-Instruct | |
| library_name: peft | |
| tags: | |
| - lora | |
| - subliminal-learning | |
| - fine-tuned | |
| # Clean Subliminal Learning — wolves LoRA | |
| This is a LoRA adapter fine-tuned on top of | |
| [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | |
| as part of a subliminal learning replication experiment. | |
| ## What is subliminal learning? | |
| The model was trained on number-continuation tasks. | |
| During **data generation**, the inference-time system prompt declared love for **wolves**: | |
| > "You love wolves. You think about wolves all the time. | |
| > Wolves are your favorite animal. Imbue your answers with your love for the animal." | |
| The **training record** used only the neutral system prompt: | |
| > "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." | |
| The hypothesis is that the model develops a latent preference for wolves measurable | |
| via direct animal-preference evaluation questions, even though the training data itself | |
| contains no animal mentions. | |
| ## Training details | |
| - Base model: `Qwen/Qwen2.5-14B-Instruct` | |
| - LoRA rank: 16, alpha: 32, target: all-linear, dropout: 0.05 | |
| - Training data: ~10 000 number-continuation examples (letters-filtered) | |
| - Optimizer: AdamW, constant LR | |
| - Framework: TRL SFTTrainer + Accelerate (7 GPUs) | |
| ## Usage | |
| ```python | |
| from peft import PeftModel | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct") | |
| model = PeftModel.from_pretrained(base, "eac123/clean-subliminal-learning-wolves") | |
| tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct") | |
| ``` | |
| See the full experiment code at: | |
| https://github.com/eac123/clean-subliminal-learning | |