# (ICML 2025 Poster) SAE-V: Interpreting Multimodal Models for Enhanced Alignment

This repository contains the SAE-V models for our ICML 2025 Poster paper "SAE-V: Interpreting Multimodal Models for Enhanced Alignment", including 2 sparse autoencoders (SAE) and 3 sparse autoencoders with vision (SAE-V). See each model folder and the [source code](https://github.com/PKU-Alignment/SAELens-V) for more information.

## 1. Training Parameters

The training parameters for all 5 models are listed below:
| Hyper-parameters | SAE and SAE-V of LLaVA-NeXT/Mistral | SAE and SAE-V of Chameleon/Anole |
|---|---|---|
| **Training Parameters** | | |
| total training steps | 30000 | 30000 |
| batch size | 4096 | 4096 |
| LR | 5e-5 | 5e-5 |
| LR warmup steps | 1500 | 1500 |
| LR decay steps | 6000 | 6000 |
| adam beta1 | 0.9 | 0.9 |
| adam beta2 | 0.999 | 0.999 |
| LR scheduler name | constant | constant |
| L1 coefficient | 5 | 5 |
| seed | 42 | 42 |
| dtype | float32 | float32 |
| buffer batches num | 32 | 64 |
| store batch size prompts | 4 | 16 |
| feature sampling window | 1000 | 1000 |
| dead feature window | 1000 | 1000 |
| dead feature threshold | 1e-4 | 1e-4 |
| **Model Parameters** | | |
| hook layer | 16 | 8 |
| input dimension | 4096 | 4096 |
| expansion factor | 16 | 32 |
| feature number | 65536 | 131072 |
| context size | 4096 | 2048 |
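For reference, the table above can be summarized programmatically. This is a minimal sketch: the dictionary keys are illustrative and do not necessarily match the actual SAELens-V configuration API, and the interpretation of the coefficient row as an L1 sparsity coefficient is an assumption.

```python
def sae_config(hook_layer, expansion_factor, context_size,
               buffer_batches, store_batch_size_prompts):
    """Build a hyperparameter dict mirroring the table above.

    Key names are illustrative only; they are not the SAELens-V API.
    """
    d_in = 4096  # residual-stream input dimension, shared by both model families
    return {
        # Training parameters (identical across both model families)
        "total_training_steps": 30_000,
        "batch_size": 4096,
        "lr": 5e-5,
        "lr_warmup_steps": 1500,
        "lr_decay_steps": 6000,
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "lr_scheduler_name": "constant",
        "l1_coefficient": 5,  # assumption: the coefficient row is the L1 sparsity penalty
        "seed": 42,
        "dtype": "float32",
        "feature_sampling_window": 1000,
        "dead_feature_window": 1000,
        "dead_feature_threshold": 1e-4,
        # Model parameters (differ between the two model families)
        "hook_layer": hook_layer,
        "d_in": d_in,
        "expansion_factor": expansion_factor,
        # dictionary size = input dimension x expansion factor
        "n_features": d_in * expansion_factor,
        "context_size": context_size,
        "buffer_batches": buffer_batches,
        "store_batch_size_prompts": store_batch_size_prompts,
    }

# SAE / SAE-V of LLaVA-NeXT/Mistral
llava_mistral = sae_config(hook_layer=16, expansion_factor=16,
                           context_size=4096, buffer_batches=32,
                           store_batch_size_prompts=4)

# SAE / SAE-V of Chameleon/Anole
chameleon_anole = sae_config(hook_layer=8, expansion_factor=32,
                             context_size=2048, buffer_batches=64,
                             store_batch_size_prompts=16)
```

Note that the feature numbers in the table follow directly from the shared input dimension and the per-family expansion factors: 4096 × 16 = 65536 and 4096 × 32 = 131072.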