Yurim0507/suppression-or-deletion

machine-unlearning

sparse-autoencoder

vision-transformer

interpretability


Yurim0507 committed on Jan 16

Commit e1546fa · verified · 1 Parent(s): f996ae8

Update README.md

Files changed (1)

README.md +1 -2


@@ -144,7 +144,7 @@ Sparse Autoencoder checkpoint:
     'model_state_dict': <OrderedDict>,  # SAE weights
     'model_config': {
         'input_dim': 768,               # ViT hidden dimension
-        'hidden_dim': 3072,             # SAE latent dimension (768×4)
+        'hidden_dim': 768,             # SAE latent dimension (768×1)
         'k': 16,                        # TopK sparsity (16 for CIFAR-10, 32 for Imagenette)
         'activation': 'topk'            # Activation type
     },
@@ -195,7 +195,6 @@ Class-specific expert features:
 ### SAE Models
 - **Layers**: 8, 9, 10 (out of 12 ViT layers)
-- **Architecture**: Overcomplete (768 → 3072 → 768)
 - **Sparsity**: TopK activation
   - **CIFAR-10**: k=16 (only top 16 features active per sample)
   - **Imagenette**: k=32 (only top 32 features active per sample)
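
For readers checking what the updated config implies, here is a minimal sketch of a TopK activation applied to a latent vector of size `hidden_dim`. It is not taken from the repository: the function name `topk_activation`, the random latent values, and the tie-breaking rule are assumptions made for illustration, and the actual SAE code may differ (for example, by applying a ReLU to the kept values).

```python
import random


def topk_activation(pre_acts, k):
    """Keep the k largest entries of pre_acts and set the rest to 0.0.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    if not 0 < k <= len(pre_acts):
        raise ValueError("k must be between 1 and len(pre_acts)")
    # Rank indices by value, largest first. sorted() is stable, so ties
    # are resolved in favour of the lower index.
    ranked = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]


# Config values as they appear in the README after this commit.
model_config = {"input_dim": 768, "hidden_dim": 768, "k": 16, "activation": "topk"}

# Stand-in latent pre-activations; a real SAE would compute these from
# the ViT hidden state with its encoder weights.
random.seed(0)
latents = [random.gauss(0.0, 1.0) for _ in range(model_config["hidden_dim"])]

sparse = topk_activation(latents, model_config["k"])
active = sum(1 for v in sparse if v != 0.0)
print(f"{active} of {len(sparse)} latent features are active")
```

With `hidden_dim` equal to `input_dim`, the latent space is no longer wider than the ViT hidden state, which is consistent with the commit dropping the "Overcomplete" bullet. The sparsity constraint itself is unchanged: `k` still bounds the number of active features per sample.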