# Activation Heatmap Visualization
This script creates Neuronpedia-style heatmap visualizations of token activations from activation dump JSON files.
## Overview
The script implements the same logarithmic scaling and color mapping approach used by Neuronpedia to visualize how features activate on different tokens. Each token is displayed with a green background whose intensity corresponds to the activation value.
## Features
- **Logarithmic scaling**: Uses the same formula as Neuronpedia for opacity mapping
- **Individual feature visualizations**: Shows each feature's activations across tokens
- **Combined heatmap**: Matrix view of all features vs all tokens
- **Color coding**: Green intensity from light (low activation) to dark (high activation)
- **Special token handling**: Replaces special tokens with displayable characters
- **Value display**: Optional display of numerical activation values on tokens
## Usage
### Basic Usage
```bash
python scripts/visualization/activation_heatmap.py path/to/activations_dump.json
```
This will create visualizations in `output/activation_heatmaps/` by default.
### Command Line Options
```bash
python scripts/visualization/activation_heatmap.py INPUT_JSON [OPTIONS]

Required:
  INPUT_JSON              Path to activation dump JSON file

Optional:
  -o, --output-dir DIR    Output directory (default: output/activation_heatmaps)
  -k, --top-k N           Number of top features to visualize (default: 10)
  --probe-index N         Index of probe result to visualize (default: 0)
  --tokens-per-row N      Tokens per row in visualization (default: 20)
  --no-values             Hide activation values on tokens
  --combined-only         Only generate combined heatmap
```
### Examples
**Visualize top 5 features from a specific probe:**
```bash
python scripts/visualization/activation_heatmap.py \
"output/examples/Dallas/activations_dump (2).json" \
-k 5 \
--probe-index 0 \
-o output/my_visualizations
```
**Generate only the combined heatmap:**
```bash
python scripts/visualization/activation_heatmap.py \
"output/examples/Dallas/activations_dump (2).json" \
--combined-only
```
**Adjust layout for longer prompts:**
```bash
python scripts/visualization/activation_heatmap.py \
"output/examples/Dallas/activations_dump (2).json" \
--tokens-per-row 30
```
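To illustrate what `--tokens-per-row` controls, here is a minimal sketch of wrapping a token list into fixed-width rows. The helper name `wrap_tokens` is hypothetical and is not taken from the script; the script's own layout code may differ.

```python
def wrap_tokens(tokens, tokens_per_row=20):
    """Split a flat token list into rows of at most `tokens_per_row` tokens."""
    if tokens_per_row < 1:
        raise ValueError("tokens_per_row must be at least 1")
    # Step through the list in strides of tokens_per_row; the last row may be shorter.
    return [
        tokens[start:start + tokens_per_row]
        for start in range(0, len(tokens), tokens_per_row)
    ]


if __name__ == "__main__":
    rows = wrap_tokens(["<bos>", "entity", ":", " A", " city"], tokens_per_row=2)
    for row in rows:
        print(row)
```

A larger value yields fewer, wider rows, which tends to suit long prompts.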
## Input Format
The script expects JSON files with this structure:
```json
{
  "model": "model-name",
  "results": [
    {
      "probe_id": "probe_0_Dallas",
      "prompt": "entity: A city in Texas, USA is Dallas",
      "tokens": ["<bos>", "entity", ":", " A", ...],
      "counts": [[9813.0, 72.0, ...], ...] // OR features array
    }
  ]
}
```
Supports both:
- Legacy `counts` format: 2D array [n_features][n_tokens]
- New `features` format: List of feature objects with metadata
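As a rough sketch of reading the legacy `counts` layout, the snippet below pulls one probe out of a loaded dump and checks that every feature row has one value per token. The function names are hypothetical, and the `features` format is deliberately not handled because its field names are not specified here.

```python
import json


def probe_from_dump(dump, probe_index=0):
    """Return (tokens, counts) for one probe in the legacy `counts` format."""
    probe = dump["results"][probe_index]
    tokens = probe["tokens"]
    counts = probe.get("counts")
    if counts is None:
        # Assumption: a probe without `counts` uses the newer `features` format.
        raise ValueError("probe has no `counts` array; this sketch only reads the legacy format")
    # counts is [n_features][n_tokens], so each row must align with the token list.
    for feature_index, row in enumerate(counts):
        if len(row) != len(tokens):
            raise ValueError(
                f"feature {feature_index} has {len(row)} values for {len(tokens)} tokens"
            )
    return tokens, counts


def load_probe(path, probe_index=0):
    """Read a dump file from disk and extract one probe."""
    with open(path, encoding="utf-8") as handle:
        return probe_from_dump(json.load(handle), probe_index)
```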
## Output
The script generates:
1. **Combined heatmap** (`combined_heatmap.png`): Matrix visualization showing all features
2. **Individual feature images** (one per top-K feature): Detailed view of each feature's activations
All images are saved to: `{output_dir}/probe_{index}/`
## Color Scheme
- **Base color**: Emerald green (RGB: 52, 211, 153)
- **Opacity range**: 0.05 (minimum) to 1.0 (maximum)
- **Threshold**: Values below 0.00005 are not highlighted
- **Text color**: Black on light backgrounds, white on dark backgrounds
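The color scheme above can be sketched as compositing the base green over a white background at a given opacity, then picking a text color. Both helper names and the `0.5` opacity cutoff for switching to white text are assumptions made for illustration; the script may choose the text color differently.

```python
BASE_RGB = (52, 211, 153)  # emerald green from the list above


def blend_over_white(opacity):
    """Composite the base color over white at the given opacity (0.0 to 1.0)."""
    return tuple(round(255 * (1 - opacity) + channel * opacity) for channel in BASE_RGB)


def text_color_for(opacity, cutoff=0.5):
    """Pick a readable text color; the cutoff value is a hypothetical choice."""
    return "white" if opacity >= cutoff else "black"


if __name__ == "__main__":
    for opacity in (0.05, 0.4, 1.0):
        print(opacity, blend_over_white(opacity), text_color_for(opacity))
```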
## Implementation Details
The visualization uses the exact same logarithmic opacity calculation as Neuronpedia:
```python
opacity = MINIMUM_OPACITY + (log10(1 + 9 * ratio) * scale) / log10(10)
```
Where:
- `ratio = current_value / max_value`
- `scale = 1 - MINIMUM_OPACITY`
- `MINIMUM_OPACITY = 0.05`
Because the logarithm rises steeply near zero, weak activations are spread over a wider band of opacities while strong ones are compressed. Differences among low activations stay visible, and the strongest activation still maps to full opacity.
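A minimal, self-contained sketch of this calculation follows. The function name `activation_opacity` is hypothetical, and applying the 0.00005 threshold to the raw value (rather than to the ratio) is an assumption; check the script for the exact behavior.

```python
import math

MINIMUM_OPACITY = 0.05
HIGHLIGHT_THRESHOLD = 0.00005  # from the Color Scheme section


def activation_opacity(value, max_value):
    """Map an activation to an opacity using the logarithmic formula above."""
    # Assumption: values under the threshold, or a non-positive maximum, get no highlight.
    if max_value <= 0 or value < HIGHLIGHT_THRESHOLD:
        return 0.0
    ratio = min(value / max_value, 1.0)
    scale = 1.0 - MINIMUM_OPACITY
    # log10(10) equals 1, so the divisor in the formula leaves the result unchanged.
    return MINIMUM_OPACITY + (math.log10(1 + 9 * ratio) * scale) / math.log10(10)


if __name__ == "__main__":
    for value in (0.0, 1.0, 5.0, 10.0):
        print(value, round(activation_opacity(value, max_value=10.0), 3))
```

Note how a value at 10% of the maximum already reaches roughly 0.31 opacity, which is the emphasis on low activations described above.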
## Dependencies
- matplotlib
- numpy
- Python 3.7+
Install with:
```bash
pip install matplotlib numpy
```