# Activation Heatmap Visualization
This script creates Neuronpedia-style heatmap visualizations of token activations from activation dump JSON files.
## Overview
The script implements the same logarithmic scaling and color mapping approach used by Neuronpedia to visualize how features activate on different tokens. Each token is displayed with a green background whose intensity corresponds to the activation value.
## Features
- **Logarithmic scaling**: Uses the same formula as Neuronpedia for opacity mapping
- **Individual feature visualizations**: Shows each feature's activations across tokens
- **Combined heatmap**: Matrix view of all features vs all tokens
- **Color coding**: Green intensity from light (low activation) to dark (high activation)
- **Special token handling**: Replaces special tokens with displayable characters
- **Value display**: Optional display of numerical activation values on tokens
## Usage
### Basic Usage
```bash
python scripts/visualization/activation_heatmap.py path/to/activations_dump.json
```
This will create visualizations in `output/activation_heatmaps/` by default.
### Command Line Options
```bash
python scripts/visualization/activation_heatmap.py INPUT_JSON [OPTIONS]
Required:
  INPUT_JSON              Path to activation dump JSON file

Optional:
  -o, --output-dir DIR    Output directory (default: output/activation_heatmaps)
  -k, --top-k N           Number of top features to visualize (default: 10)
  --probe-index N         Index of probe result to visualize (default: 0)
  --tokens-per-row N      Tokens per row in visualization (default: 20)
  --no-values             Hide activation values on tokens
  --combined-only         Only generate combined heatmap
```
### Examples
**Visualize top 5 features from a specific probe:**
```bash
python scripts/visualization/activation_heatmap.py \
    "output/examples/Dallas/activations_dump (2).json" \
    -k 5 \
    --probe-index 0 \
    -o output/my_visualizations
```
**Generate only the combined heatmap:**
```bash
python scripts/visualization/activation_heatmap.py \
    "output/examples/Dallas/activations_dump (2).json" \
    --combined-only
```
**Adjust layout for longer prompts:**
```bash
python scripts/visualization/activation_heatmap.py \
    "output/examples/Dallas/activations_dump (2).json" \
    --tokens-per-row 30
```
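As a rough illustration of what `--tokens-per-row` controls, the sketch below splits a token list into fixed-width rows. The helper name `chunk_tokens` is hypothetical and is not taken from the script; the actual layout code may differ.

```python
def chunk_tokens(tokens, tokens_per_row=20):
    """Split a flat token list into rows of at most `tokens_per_row` tokens.

    Hypothetical helper for illustration; not the script's own function.
    """
    if tokens_per_row < 1:
        raise ValueError("tokens_per_row must be at least 1")
    return [
        tokens[i:i + tokens_per_row]
        for i in range(0, len(tokens), tokens_per_row)
    ]


# 45 tokens at 20 per row give two full rows and one partial row.
rows = chunk_tokens([f"tok{i}" for i in range(45)], tokens_per_row=20)
print([len(row) for row in rows])  # [20, 20, 5]
```

A larger value produces fewer, wider rows, which tends to suit long prompts.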
## Input Format
The script expects JSON files with this structure:
```json
{
  "model": "model-name",
  "results": [
    {
      "probe_id": "probe_0_Dallas",
      "prompt": "entity: A city in Texas, USA is Dallas",
      "tokens": ["<bos>", "entity", ":", " A", ...],
      "counts": [[9813.0, 72.0, ...], ...]  // OR features array
    }
  ]
}
```
Supports both:
- Legacy `counts` format: 2D array [n_features][n_tokens]
- New `features` format: List of feature objects with metadata
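The sketch below shows one way to read the legacy `counts` layout and pick the top-K features from a single probe result. It makes two assumptions that the description above does not confirm: features are ranked by their peak activation across tokens, and the helper name `top_k_from_counts` is hypothetical. The `features` format is left out because its object schema is not specified here.

```python
def top_k_from_counts(result, k=10):
    """Return (feature_indices, rows) for the k features with the highest peak.

    `result` is one entry of the dump's "results" list in the legacy
    `counts` layout: counts[feature][token]. Ranking by peak activation
    is an assumption made for this sketch.
    """
    tokens = result["tokens"]
    counts = result["counts"]
    for row in counts:
        if len(row) != len(tokens):
            raise ValueError("each counts row needs one value per token")
    # Sort feature indices by their maximum activation, highest first.
    order = sorted(range(len(counts)), key=lambda i: max(counts[i]), reverse=True)
    top = order[:k]
    return top, [counts[i] for i in top]


# Small in-memory example shaped like one probe result.
example = {
    "tokens": ["<bos>", "entity", ":", " A"],
    "counts": [
        [0.0, 1.0, 0.5, 0.0],
        [9.0, 0.0, 0.0, 2.0],
        [0.0, 0.0, 3.0, 0.0],
    ],
}
indices, rows = top_k_from_counts(example, k=2)
print(indices)  # [1, 2]
```

To use it on a real dump, load the file with `json.load` and pass `data["results"][probe_index]`.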
## Output
The script generates:
1. **Combined heatmap** (`combined_heatmap.png`): Matrix visualization showing all features
2. **Individual feature images** (one per top-K feature): Detailed view of each feature's activations
All images are saved to: `{output_dir}/probe_{index}/`
## Color Scheme
- **Base color**: Emerald green (RGB: 52, 211, 153)
- **Opacity range**: 0.05 (minimum) to 1.0 (maximum)
- **Threshold**: Values below 0.00005 are not highlighted
- **Text color**: Black on light backgrounds, white on dark backgrounds
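A minimal sketch of how these settings could turn an opacity into drawable colors. The function names are hypothetical, and the `cutoff` used to switch between black and white text is a placeholder: the script's actual rule for what counts as a dark background is not stated here.

```python
BASE_RGB = (52, 211, 153)  # emerald green, from the list above


def _clamp(opacity):
    """Keep opacity inside [0, 1]."""
    return max(0.0, min(1.0, opacity))


def background_rgba(opacity):
    """RGBA tuple with 0-1 floats, the form matplotlib accepts as a color."""
    r, g, b = (channel / 255 for channel in BASE_RGB)
    return (r, g, b, _clamp(opacity))


def blend_over_white(opacity):
    """Opaque 0-255 RGB that results from drawing the color on white."""
    a = _clamp(opacity)
    return tuple(round(255 * (1 - a) + channel * a) for channel in BASE_RGB)


def text_color(opacity, cutoff=0.6):
    """Pick a readable text color; `cutoff` is an assumed placeholder value."""
    return "white" if _clamp(opacity) >= cutoff else "black"


print(blend_over_white(0.05), text_color(0.05))
print(blend_over_white(1.0), text_color(1.0))
```

Blending over white is shown only to make the resulting shade concrete; when drawing with matplotlib, the RGBA tuple can be passed directly as a face color.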
## Implementation Details
The visualization uses the same logarithmic opacity calculation as Neuronpedia:
```python
opacity = MINIMUM_OPACITY + (log10(1 + 9 * ratio) * scale) / log10(10)
```
Where:
- `ratio = current_value / max_value`
- `scale = 1 - MINIMUM_OPACITY`
- `MINIMUM_OPACITY = 0.05`
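Because the logarithm rises fastest near zero, this mapping spreads out low activations so that differences among them stay visible, while the maximum activation still maps to full opacity. Note that `log10(10)` equals 1, so the division does not change the result.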
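The formula can be written as a small standalone function. This is a sketch, not the script's own code: the name `activation_opacity` is hypothetical, and applying the 0.00005 threshold to the raw activation value (rather than to the ratio) is an assumption.

```python
import math

MINIMUM_OPACITY = 0.05
HIGHLIGHT_THRESHOLD = 0.00005  # assumed to apply to the raw activation value


def activation_opacity(value, max_value):
    """Map an activation to a background opacity using the log formula above.

    Returns 0.0 (no highlight) for values below the threshold or when
    max_value is not positive.
    """
    if max_value <= 0 or value < HIGHLIGHT_THRESHOLD:
        return 0.0
    ratio = min(value / max_value, 1.0)
    scale = 1.0 - MINIMUM_OPACITY
    # log10(1 + 9 * ratio) runs from 0 (ratio = 0) to 1 (ratio = 1).
    return MINIMUM_OPACITY + (math.log10(1 + 9 * ratio) * scale) / math.log10(10)


for value in (0.0, 0.5, 5.0, 10.0):
    print(value, round(activation_opacity(value, max_value=10.0), 3))
```

With `max_value=10.0`, an activation at half the maximum lands at roughly 0.75 opacity rather than 0.5, which shows how the curve lifts the lower part of the range.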
## Dependencies
- matplotlib
- numpy
- Python 3.7+
Install with:
```bash
pip install matplotlib numpy
```