Activation Heatmap Visualization
This script creates Neuronpedia-style heatmap visualizations of token activations from activation dump JSON files.
Overview
The script implements the same logarithmic scaling and color mapping approach used by Neuronpedia to visualize how features activate on different tokens. Each token is displayed with a green background whose intensity corresponds to the activation value.
Features
- Logarithmic scaling: Uses the same formula as Neuronpedia for opacity mapping
- Individual feature visualizations: Shows each feature's activations across tokens
- Combined heatmap: Matrix view of all features vs all tokens
- Color coding: Green intensity from light (low activation) to dark (high activation)
- Special token handling: Replaces special tokens with displayable characters
- Value display: Optional display of numerical activation values on tokens
Usage
Basic Usage
python scripts/visualization/activation_heatmap.py path/to/activations_dump.json
This will create visualizations in output/activation_heatmaps/ by default.
Command Line Options
python scripts/visualization/activation_heatmap.py INPUT_JSON [OPTIONS]
Required:
INPUT_JSON Path to activation dump JSON file
Optional:
-o, --output-dir DIR Output directory (default: output/activation_heatmaps)
-k, --top-k N Number of top features to visualize (default: 10)
--probe-index N Index of probe result to visualize (default: 0)
--tokens-per-row N Tokens per row in visualization (default: 20)
--no-values Hide activation values on tokens
--combined-only Only generate combined heatmap
Examples
Visualize top 5 features from a specific probe:
python scripts/visualization/activation_heatmap.py \
"output/examples/Dallas/activations_dump (2).json" \
-k 5 \
--probe-index 0 \
-o output/my_visualizations
Generate only the combined heatmap:
python scripts/visualization/activation_heatmap.py \
"output/examples/Dallas/activations_dump (2).json" \
--combined-only
Adjust layout for longer prompts:
python scripts/visualization/activation_heatmap.py \
"output/examples/Dallas/activations_dump (2).json" \
--tokens-per-row 30
Input Format
The script expects JSON files with this structure:
{
"model": "model-name",
"results": [
{
"probe_id": "probe_0_Dallas",
"prompt": "entity: A city in Texas, USA is Dallas",
"tokens": ["<bos>", "entity", ":", " A", ...],
"counts": [[9813.0, 72.0, ...], ...] // OR features array
}
]
}
Supports both:
- Legacy
countsformat: 2D array [n_features][n_tokens] - New
featuresformat: List of feature objects with metadata
Output
The script generates:
- Combined heatmap (
combined_heatmap.png): Matrix visualization showing all features - Individual feature images (one per top-K feature): Detailed view of each feature's activations
All images are saved to: {output_dir}/probe_{index}/
Color Scheme
- Base color: Emerald green (RGB: 52, 211, 153)
- Opacity range: 0.05 (minimum) to 1.0 (maximum)
- Threshold: Values below 0.00005 are not highlighted
- Text color: Black on light backgrounds, white on dark backgrounds
Implementation Details
The visualization uses the exact same logarithmic opacity calculation as Neuronpedia:
opacity = MINIMUM_OPACITY + (log10(1 + 9 * ratio) * scale) / log10(10)
Where:
ratio = current_value / max_valuescale = 1 - MINIMUM_OPACITYMINIMUM_OPACITY = 0.05
This creates a perceptually uniform color gradient that emphasizes differences in lower activation ranges while still showing the full dynamic range.
Dependencies
- matplotlib
- numpy
- Python 3.7+
Install with:
pip install matplotlib numpy