KokosDev committed 2ab7f0b (parent: bc702e3)

Add YAML metadata to README
README.md
CHANGED
@@ -1,3 +1,21 @@
+---
+language: en
+license: apache-2.0
+tags:
+- interpretability
+- mechanistic-interpretability
+- vision-language
+- llava
+- sparse-autoencoders
+- circuit-tracer
+- cross-layer-transcoders
+base_model: llava-hf/llava-1.5-7b-hf
+datasets:
+- liuhaotian/llava-instruct-150k
+- nlphuji/flickr30k
+pipeline_tag: image-to-text
+---
+
 # LLaVA-1.5-7B Cross-Layer Transcoders (CLTs)

 ## Overview
@@ -15,7 +33,7 @@ This repository contains **Cross-Layer Transcoders (CLTs)** trained on [llava-hf

 ## Architecture

-
+```
 Input (MLP hidden state): [batch, seq_len, 4096]
 ↓
 Transcoder Encoder: LayerNorm + Linear(4096 → 8192) + ReLU
@@ -25,7 +43,7 @@ Input (MLP hidden state): [batch, seq_len, 4096]
 Transcoder Decoder: Linear(8192 → 4096)
 ↓
 Output (MLP reconstruction): [batch, seq_len, 4096]
-
+```

 **Parameters per layer:**
 - Hidden dim: 4096
@@ -37,7 +55,7 @@ Output (MLP reconstruction): [batch, seq_len, 4096]

 ## Training Details

-- **Model**:
+- **Model**: `llava-hf/llava-1.5-7b-hf`
 - **Dataset**: ~45K multimodal samples (Flickr30K + instruction tasks)
 - **Steps per layer**: 5,000
 - **Learning rate**: 3e-4 (AdamW)
@@ -61,24 +79,24 @@ Output (MLP reconstruction): [batch, seq_len, 4096]

 Each layer has two files:

-### 1.
+### 1. `transcoder_L{layer}.pt`
 Contains the trained transcoder model and training metadata.

-
+```python
 checkpoint = torch.load('transcoder_L5.pt')
 # Keys: 'layer', 'hidden_dim', 'feature_dim', 'state_dict', 'training_metadata', 'mlp_to_clt_mapping'
-
+```

-### 2.
+### 2. `mapping_L{layer}.pt`
 Contains MLP→CLT mapping and decoder weights for analysis.

-
+```python
 mapping = torch.load('mapping_L5.pt')
 # Keys: 'layer', 'mlp_to_clt_mapping', 'decoder_weights', 'hidden_dim', 'feature_dim', 'description'

 # mlp_to_clt_mapping: [4096, 8192] - which MLP neurons correlate with each CLT feature
 # decoder_weights: [4096, 8192] - CLT → MLP reconstruction weights
-
+```

 ---

@@ -86,7 +104,7 @@ mapping = torch.load('mapping_L5.pt')

 ### 1. Load a Transcoder

-
+```python
 import torch
 import torch.nn as nn

@@ -121,13 +139,13 @@ with torch.no_grad():

 # features: [batch, seq_len, 8192] - sparse interpretable features
 # reconstruction: [batch, seq_len, 4096] - reconstructed MLP output
-
+```

 ### 2. Use MLP→CLT Mapping

 The mapping shows which MLP neurons are correlated with each CLT feature:

-
+```python
 mapping_data = torch.load('mapping_L10.pt', map_location='cpu')
 mlp_to_clt = mapping_data['mlp_to_clt_mapping'] # [4096, 8192]

@@ -140,13 +158,13 @@ print(f"Top MLP neurons for feature {feature_idx}: {top_mlp_neurons.indices}")
 mlp_neuron_idx = 567
 top_clt_features = mlp_to_clt[mlp_neuron_idx, :].topk(k=10)
 print(f"Top CLT features for MLP neuron {mlp_neuron_idx}: {top_clt_features.indices}")
-
+```

 ### 3. Replacement Model (Full Integration)

 For direct integration into LLaVA (replace MLPs with CLTs):

-
+```python
 from transformers import LlavaForConditionalGeneration

 # Load LLaVA
@@ -169,7 +187,7 @@ def replace_mlp_with_clt(module, input, output):
     return reconstruction

 model.model.layers[layer_idx].mlp.register_forward_hook(replace_mlp_with_clt)
-
+```

 ---

@@ -201,7 +219,7 @@ This work extends Anthropic's Circuit-Tracer methodology to multimodal vision-la

 If you use these transcoders in your research, please cite:

-
+```bibtex
 @misc{llava15_clts_2025,
   title={Cross-Layer Transcoders for LLaVA-1.5-7B},
   author={Koko's Dev},
@@ -209,7 +227,7 @@ If you use these transcoders in your research, please cite:
   publisher={HuggingFace Hub},
   howpublished={\url{https://huggingface.co/KokosDev/llava15-7b-clt}}
 }
-
+```

 ---

@@ -224,4 +242,3 @@ These transcoders are released under the same license as the base model (Apache
 - **Base Model**: [LLaVA-1.5-7B](https://huggingface.co/llava-hf/llava-1.5-7b-hf)
 - **Methodology**: Inspired by Anthropic's Circuit-Tracer and sparse autoencoder research
 - **Training Data**: Flickr30K, instruction-following datasets
-
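
For reference, the per-layer transcoder described in the README's Architecture section above (LayerNorm + Linear(4096 → 8192) + ReLU encoder, Linear(8192 → 4096) decoder) can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch based only on the dimensions and checkpoint keys shown in the diff; the class name `CrossLayerTranscoder` and the `state_dict` parameter names are assumptions, not the repository's own code, so loading a released checkpoint may require remapping keys.

```python
import torch
import torch.nn as nn


class CrossLayerTranscoder(nn.Module):
    """Sketch of the per-layer CLT: LayerNorm + Linear(4096 -> 8192) + ReLU encoder,
    Linear(8192 -> 4096) decoder. Names are assumptions, not the released code."""

    def __init__(self, hidden_dim: int = 4096, feature_dim: int = 8192):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim)
        self.encoder = nn.Linear(hidden_dim, feature_dim)
        self.decoder = nn.Linear(feature_dim, hidden_dim)

    def forward(self, mlp_hidden: torch.Tensor):
        # mlp_hidden: [batch, seq_len, 4096] (an MLP hidden state from LLaVA)
        features = torch.relu(self.encoder(self.norm(mlp_hidden)))  # [batch, seq_len, 8192]
        reconstruction = self.decoder(features)                     # [batch, seq_len, 4096]
        return features, reconstruction


# Hypothetical usage mirroring the README's loading snippet; the actual
# checkpoint's state_dict keys may differ from this module's parameter names.
checkpoint = torch.load('transcoder_L5.pt', map_location='cpu')
clt = CrossLayerTranscoder(checkpoint['hidden_dim'], checkpoint['feature_dim'])
clt.load_state_dict(checkpoint['state_dict'])  # may need key remapping
with torch.no_grad():
    feats, recon = clt(torch.randn(1, 16, checkpoint['hidden_dim']))
```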