# Activation Heatmap Visualization

This script creates Neuronpedia-style heatmap visualizations of token activations from activation dump JSON files.

## Overview

The script implements the same logarithmic scaling and color mapping approach used by Neuronpedia to visualize how features activate on different tokens. Each token is displayed with a green background whose intensity corresponds to the activation value.

## Features

- **Logarithmic scaling**: Uses the same formula as Neuronpedia for opacity mapping
- **Individual feature visualizations**: Shows each feature's activations across tokens
- **Combined heatmap**: Matrix view of all features vs. all tokens
- **Color coding**: Green intensity from light (low activation) to dark (high activation)
- **Special token handling**: Replaces special tokens with displayable characters
- **Value display**: Optional display of numerical activation values on tokens

## Usage

### Basic Usage

```bash
python scripts/visualization/activation_heatmap.py path/to/activations_dump.json
```

This writes visualizations to `output/activation_heatmaps/` by default.
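The **Special token handling** feature listed above is not specified in detail here, so the following is only a sketch of the idea. The helper name `display_token` and the replacement glyphs (`∅` for an empty token, `↵` for newlines, `→` for tabs, `␣` for whitespace-only tokens) are hypothetical choices, not the script's actual mapping.

```python
def display_token(token: str) -> str:
    """Return a printable stand-in for tokens that would otherwise render blank.

    Hypothetical mapping for illustration; the real script's table may differ.
    """
    if token == "":
        # An empty token (e.g. a BOS token decoded to "") would be invisible.
        return "∅"
    # Newlines and tabs would break the row layout, so swap in visible glyphs.
    shown = token.replace("\n", "↵").replace("\t", "→")
    if shown.strip() == "":
        # Whitespace-only token: make each space visible.
        return shown.replace(" ", "␣")
    return shown


print([display_token(t) for t in ["", "entity", ":", " A", "\n"]])
# ['∅', 'entity', ':', ' A', '↵']
```

Tokens with a leading space such as `" A"` are left unchanged, since the visible character already shows where the token sits.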
### Command Line Options

```text
python scripts/visualization/activation_heatmap.py INPUT_JSON [OPTIONS]

Required:
  INPUT_JSON              Path to activation dump JSON file

Optional:
  -o, --output-dir DIR    Output directory (default: output/activation_heatmaps)
  -k, --top-k N           Number of top features to visualize (default: 10)
  --probe-index N         Index of probe result to visualize (default: 0)
  --tokens-per-row N      Tokens per row in visualization (default: 20)
  --no-values             Hide activation values on tokens
  --combined-only         Only generate combined heatmap
```

### Examples

**Visualize top 5 features from a specific probe:**

```bash
python scripts/visualization/activation_heatmap.py \
  "output/examples/Dallas/activations_dump (2).json" \
  -k 5 \
  --probe-index 0 \
  -o output/my_visualizations
```

**Generate only the combined heatmap:**

```bash
python scripts/visualization/activation_heatmap.py \
  "output/examples/Dallas/activations_dump (2).json" \
  --combined-only
```

**Adjust layout for longer prompts:**

```bash
python scripts/visualization/activation_heatmap.py \
  "output/examples/Dallas/activations_dump (2).json" \
  --tokens-per-row 30
```

## Input Format

The script expects JSON files with this structure:

```json
{
  "model": "model-name",
  "results": [
    {
      "probe_id": "probe_0_Dallas",
      "prompt": "entity: A city in Texas, USA is Dallas",
      "tokens": ["", "entity", ":", " A", ...],
      "counts": [[9813.0, 72.0, ...], ...]
    }
  ]
}
```

Each result stores its activations in one of two formats:

- Legacy `counts` format: a 2D array indexed as `[n_features][n_tokens]`
- New `features` format: a list of feature objects with metadata, used in place of `counts`

## Output

The script generates:

1. **Combined heatmap** (`combined_heatmap.png`): a matrix visualization showing all features
2. **Individual feature images** (one per top-K feature): a detailed view of each feature's activations

All images are saved to `{output_dir}/probe_{index}/`.

## Color Scheme

- **Base color**: Emerald green (RGB: 52, 211, 153)
- **Opacity range**: 0.05 (minimum) to 1.0 (maximum)
- **Threshold**: Values below 0.00005 are not highlighted
- **Text color**: Black on light backgrounds, white on dark backgrounds

## Implementation Details

The visualization uses the same logarithmic opacity calculation as Neuronpedia:

```python
opacity = MINIMUM_OPACITY + (log10(1 + 9 * ratio) * scale) / log10(10)
```

Where:

- `ratio = current_value / max_value`
- `scale = 1 - MINIMUM_OPACITY`
- `MINIMUM_OPACITY = 0.05`

Since `log10(1 + 9 * ratio)` rises from 0 at `ratio = 0` to 1 at `ratio = 1` (the `log10(10)` divisor equals 1 and only makes that normalization explicit), opacity spans exactly 0.05 to 1.0. The curve is concave, so it devotes more of the opacity range to low activations: at `ratio = 0.1` the opacity is already about 0.31. This emphasizes differences in the lower activation range while still showing the full dynamic range.

## Dependencies

- matplotlib
- numpy
- Python 3.7+

Install with:

```bash
pip install matplotlib numpy
```
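The opacity formula from Implementation Details and the Color Scheme values can be combined into a small per-token styling helper. This is a sketch, not the script's code: the function names, applying the 0.00005 threshold to the raw activation (rather than to `ratio`), and the 0.5 opacity cutoff for switching from black to white text are all assumptions. The RGBA tuple uses 0-1 floats, a form matplotlib accepts for colors such as a `facecolor`.

```python
from math import log10
from typing import Tuple

MINIMUM_OPACITY = 0.05        # from Implementation Details
THRESHOLD = 0.00005           # from Color Scheme: values below this are not highlighted
EMERALD_RGB = (52, 211, 153)  # base color from Color Scheme
TEXT_FLIP_OPACITY = 0.5       # ASSUMPTION: opacity above which text switches to white


def activation_opacity(value: float, max_value: float) -> float:
    """Log-scaled opacity in [0.05, 1.0], or 0.0 when the token is not highlighted."""
    # ASSUMPTION: the threshold is compared against the raw activation value.
    if value < THRESHOLD or max_value <= 0:
        return 0.0
    ratio = min(value / max_value, 1.0)  # defensive clamp for values above max_value
    scale = 1 - MINIMUM_OPACITY
    return MINIMUM_OPACITY + (log10(1 + 9 * ratio) * scale) / log10(10)


def token_style(value: float, max_value: float) -> Tuple[Tuple[float, float, float, float], str]:
    """Return (background RGBA as 0-1 floats, text color name) for one token."""
    alpha = activation_opacity(value, max_value)
    r, g, b = (channel / 255 for channel in EMERALD_RGB)
    text_color = "white" if alpha > TEXT_FLIP_OPACITY else "black"
    return (r, g, b, alpha), text_color
```

The `min(..., 1.0)` clamp is a defensive choice for activations larger than the reference maximum; it does not come from the formula above.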
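The Input Format section allows either a legacy `counts` matrix or a `features` list per result. A minimal sketch of normalizing both into one `[n_features][n_tokens]` matrix, plus a top-K selection in the spirit of `-k/--top-k`, might look like this. The helper names (`normalize_result`, `load_probe`, `top_k_features`), the `activations` key inside feature objects, and ranking features by peak activation are hypothetical; the script's actual field names and ranking criterion are not documented here.

```python
import json
from typing import Any, Dict, List, Tuple

Matrix = List[List[float]]


def normalize_result(result: Dict[str, Any]) -> Tuple[List[str], Matrix]:
    """Return (tokens, activations) with activations shaped [n_features][n_tokens]."""
    tokens = result["tokens"]
    if "counts" in result:
        # Legacy format: already a [n_features][n_tokens] array.
        matrix = result["counts"]
    else:
        # ASSUMPTION: each feature object keeps its per-token values under a
        # hypothetical "activations" key; the real field name may differ.
        matrix = [feature["activations"] for feature in result["features"]]
    for i, row in enumerate(matrix):
        if len(row) != len(tokens):
            raise ValueError(f"feature {i} has {len(row)} values for {len(tokens)} tokens")
    return tokens, matrix


def load_probe(path: str, probe_index: int = 0) -> Tuple[List[str], Matrix]:
    """Read an activation dump and normalize the result at probe_index."""
    with open(path, encoding="utf-8") as f:
        dump = json.load(f)
    return normalize_result(dump["results"][probe_index])


def top_k_features(matrix: Matrix, k: int) -> List[int]:
    """Indices of the k features with the highest peak activation.

    ASSUMPTION: ranking by per-feature maximum is illustrative only.
    """
    peaks = [max(row, default=0.0) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: peaks[i], reverse=True)[:k]
```

For example, `tokens, matrix = load_probe("path/to/activations_dump.json")` followed by `top_k_features(matrix, 10)` mirrors the defaults of `--probe-index` and `-k`.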