attribution-graph-probing / scripts /visualization /README_ACTIVATION_HEATMAP.md

peppinob-ol

Initial deployment: Attribution Graph Probing app

cb8a7e5 3 months ago

	# Activation Heatmap Visualization

	This script creates Neuronpedia-style heatmap visualizations of token activations from activation dump JSON files.

	## Overview

	The script implements the same logarithmic scaling and color mapping approach used by Neuronpedia to visualize how features activate on different tokens. Each token is displayed with a green background whose intensity corresponds to the activation value.

	## Features

	- Logarithmic scaling: Uses the same formula as Neuronpedia for opacity mapping
	- Individual feature visualizations: Shows each feature's activations across tokens
	- Combined heatmap: Matrix view of all features vs all tokens
	- Color coding: Green intensity from light (low activation) to dark (high activation)
	- Special token handling: Replaces special tokens with displayable characters
	- Value display: Optional display of numerical activation values on tokens

	## Usage

	### Basic Usage

	```bash
	python scripts/visualization/activation_heatmap.py path/to/activations_dump.json
	```

	This will create visualizations in `output/activation_heatmaps/` by default.

	### Command Line Options

	```bash
	python scripts/visualization/activation_heatmap.py INPUT_JSON [OPTIONS]

	Required:
	  INPUT_JSON              Path to activation dump JSON file

	Optional:
	  -o, --output-dir DIR    Output directory (default: output/activation_heatmaps)
	  -k, --top-k N           Number of top features to visualize (default: 10)
	  --probe-index N         Index of probe result to visualize (default: 0)
	  --tokens-per-row N      Tokens per row in visualization (default: 20)
	  --no-values             Hide activation values on tokens
	  --combined-only         Only generate combined heatmap
	```
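
The options above could be declared with a small `argparse` parser. This is only a sketch of how such a command line might look, not the script's actual source; the function name `build_parser` and the help-string wording are assumptions, while the flags and defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Render activation heatmaps from an activation dump."
    )
    parser.add_argument("input_json", metavar="INPUT_JSON",
                        help="Path to activation dump JSON file")
    parser.add_argument("-o", "--output-dir", metavar="DIR",
                        default="output/activation_heatmaps",
                        help="Output directory")
    parser.add_argument("-k", "--top-k", type=int, default=10, metavar="N",
                        help="Number of top features to visualize")
    parser.add_argument("--probe-index", type=int, default=0, metavar="N",
                        help="Index of probe result to visualize")
    parser.add_argument("--tokens-per-row", type=int, default=20, metavar="N",
                        help="Tokens per row in visualization")
    parser.add_argument("--no-values", action="store_true",
                        help="Hide activation values on tokens")
    parser.add_argument("--combined-only", action="store_true",
                        help="Only generate combined heatmap")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["dump.json", "-k", "5", "--combined-only"])
```

Parsing an explicit list, as in the last line, makes it easy to check that unspecified options fall back to the documented defaults.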

	### Examples

	Visualize top 5 features from a specific probe:
	```bash
	python scripts/visualization/activation_heatmap.py \
	"output/examples/Dallas/activations_dump (2).json" \
	-k 5 \
	--probe-index 0 \
	-o output/my_visualizations
	```

	Generate only the combined heatmap:
	```bash
	python scripts/visualization/activation_heatmap.py \
	"output/examples/Dallas/activations_dump (2).json" \
	--combined-only
	```

	Adjust layout for longer prompts:
	```bash
	python scripts/visualization/activation_heatmap.py \
	"output/examples/Dallas/activations_dump (2).json" \
	--tokens-per-row 30
	```
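
To illustrate what `--tokens-per-row` controls, the sketch below splits a token list into fixed-width rows. The helper name `split_into_rows` is hypothetical, and the real script may lay out rows differently.

```python
from typing import List, Sequence


def split_into_rows(tokens: Sequence[str], tokens_per_row: int = 20) -> List[List[str]]:
    """Chunk tokens into consecutive rows of at most `tokens_per_row` items."""
    if tokens_per_row < 1:
        raise ValueError("tokens_per_row must be at least 1")
    return [
        list(tokens[start:start + tokens_per_row])
        for start in range(0, len(tokens), tokens_per_row)
    ]


# A 45-token prompt at the default width yields rows of 20, 20, and 5 tokens.
rows = split_into_rows([f"tok{i}" for i in range(45)])
row_lengths = [len(row) for row in rows]
```

A larger `tokens_per_row` gives fewer, wider rows, which is why raising it helps with long prompts.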

	## Input Format

	The script expects JSON files with this structure:

	```json
	{
	  "model": "model-name",
	  "results": [
	    {
	      "probe_id": "probe_0_Dallas",
	      "prompt": "entity: A city in Texas, USA is Dallas",
	      "tokens": ["<bos>", "entity", ":", " A", ...],
	      "counts": [[9813.0, 72.0, ...], ...] // OR features array
	    }
	  ]
	}
	```

	The script supports both layouts:
	- Legacy `counts` format: 2D array indexed as `[n_features][n_tokens]`
	- New `features` format: List of feature objects with metadata
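
As a rough sketch of reading the legacy `counts` layout, the code below loads one probe result and checks that every feature row has one value per token. The helper names are hypothetical, the `features` layout is deliberately not handled because its fields are not specified here, and ranking features by peak activation is an assumption about what "top" means.

```python
import json
from typing import Any, Dict, List


def load_probe(path: str, probe_index: int = 0) -> Dict[str, Any]:
    """Read the dump and return the probe result at `probe_index`."""
    with open(path, encoding="utf-8") as fh:
        dump = json.load(fh)
    return dump["results"][probe_index]


def counts_matrix(probe: Dict[str, Any]) -> List[List[float]]:
    """Return the legacy `counts` matrix after validating its shape."""
    tokens = probe["tokens"]
    counts = probe.get("counts")
    if counts is None:
        raise KeyError("no 'counts' field; the 'features' layout is not handled in this sketch")
    for row_index, row in enumerate(counts):
        if len(row) != len(tokens):
            raise ValueError(
                f"feature row {row_index} has {len(row)} values for {len(tokens)} tokens"
            )
    return [[float(value) for value in row] for row in counts]


def top_k_by_peak(counts: List[List[float]], k: int = 10) -> List[int]:
    """Indices of the k rows with the largest maximum (assumed ranking rule)."""
    order = sorted(
        range(len(counts)),
        key=lambda i: max(counts[i], default=0.0),
        reverse=True,
    )
    return order[:k]
```

Validating the shape up front turns a silently misaligned heatmap into an explicit error.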

	## Output

	The script generates:

	1. Combined heatmap (`combined_heatmap.png`): Matrix visualization showing all features
	2. Individual feature images (one per top-K feature): Detailed view of each feature's activations

	All images are saved to: `{output_dir}/probe_{index}/`
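
The output location can be derived with `pathlib`, as in this minimal sketch. The function names are hypothetical, and only the directory pattern and the `combined_heatmap.png` file name come from this README; file names for the individual feature images are not assumed.

```python
from pathlib import Path


def probe_output_dir(output_dir: str = "output/activation_heatmaps",
                     probe_index: int = 0) -> Path:
    """Directory that holds all images for one probe result."""
    return Path(output_dir) / f"probe_{probe_index}"


def combined_heatmap_path(output_dir: str = "output/activation_heatmaps",
                          probe_index: int = 0) -> Path:
    """Path of the combined heatmap inside the probe directory."""
    return probe_output_dir(output_dir, probe_index) / "combined_heatmap.png"


example_path = combined_heatmap_path("output/my_visualizations", 0)
```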

	## Color Scheme

	- Base color: Emerald green (RGB: 52, 211, 153)
	- Opacity range: 0.05 (minimum) to 1.0 (maximum)
	- Threshold: Values below 0.00005 are not highlighted
	- Text color: Black on light backgrounds, white on dark backgrounds
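
One way to turn an opacity into a concrete background color is to blend the emerald base over white. The sketch below assumes a white canvas, and the text-color rule with its `opacity_cutoff` of 0.5 is a hypothetical stand-in, since the script's actual switch point is not stated here.

```python
from typing import Tuple

EMERALD = (52, 211, 153)  # base color from the list above


def blend_over_white(opacity: float) -> Tuple[int, int, int]:
    """Alpha-blend the emerald base over a white background (assumed canvas)."""
    alpha = min(max(opacity, 0.0), 1.0)  # clamp to the valid range
    red, green, blue = (round(255 * (1 - alpha) + channel * alpha) for channel in EMERALD)
    return (red, green, blue)


def emerald_rgba(opacity: float) -> Tuple[float, float, float, float]:
    """RGBA tuple with channels in [0, 1], the form matplotlib accepts."""
    alpha = min(max(opacity, 0.0), 1.0)
    red, green, blue = (channel / 255 for channel in EMERALD)
    return (red, green, blue, alpha)


def text_color(opacity: float, opacity_cutoff: float = 0.5) -> str:
    """Black on light cells, white on dark ones (cutoff value is an assumption)."""
    return "white" if opacity >= opacity_cutoff else "black"
```

Passing the RGBA tuple to a plotting library lets it do the blending, while `blend_over_white` is useful when a flat RGB value is needed.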

	## Implementation Details

	The visualization uses the exact same logarithmic opacity calculation as Neuronpedia:

	```python
	opacity = MINIMUM_OPACITY + (log10(1 + 9 * ratio) * scale) / log10(10)
	```

	Where:
	- `ratio = current_value / max_value`
	- `scale = 1 - MINIMUM_OPACITY`
	- `MINIMUM_OPACITY = 0.05`

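	The divisor `log10(10)` equals 1, so it does not change the result. A ratio of 0 maps to the minimum opacity of 0.05 and a ratio of 1 maps to 1.0. Because the logarithm rises steeply near zero, the mapping spreads out differences among low activations while still covering the full dynamic range: a token at 10% of the maximum activation already gets an opacity of about 0.31, and one at 50% gets about 0.75.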
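
The formula can be written as a small standalone function. This is a sketch rather than the script's own code: the function name is hypothetical, the 0.00005 threshold is taken from the Color Scheme section, and returning 0.0 for values that are not highlighted, as well as clamping the ratio to 1, are assumptions.

```python
import math

MINIMUM_OPACITY = 0.05
HIGHLIGHT_THRESHOLD = 0.00005  # values below this are not highlighted


def activation_opacity(value: float, max_value: float) -> float:
    """Map an activation to an opacity using the logarithmic formula above."""
    if value < HIGHLIGHT_THRESHOLD or max_value <= 0:
        return 0.0  # assumed convention for "not highlighted"
    ratio = min(value / max_value, 1.0)  # clamp in case value exceeds max_value
    scale = 1 - MINIMUM_OPACITY
    return MINIMUM_OPACITY + (math.log10(1 + 9 * ratio) * scale) / math.log10(10)


# Opacities for one feature row, normalized by that row's own maximum.
row = [9813.0, 72.0, 0.0]
opacities = [activation_opacity(value, max(row)) for value in row]
```

Normalizing by the row maximum, as in the usage lines, is one possible choice; normalizing by a global maximum across features would make cells comparable between rows instead.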

	## Dependencies

	- matplotlib
	- numpy
	- Python 3.7+

	Install with:
	```bash
	pip install matplotlib numpy
	```