# Activation Heatmap Visualization

This script creates Neuronpedia-style heatmap visualizations of token activations from activation dump JSON files.

## Overview

The script implements the same logarithmic scaling and color mapping approach used by Neuronpedia to visualize how features activate on different tokens. Each token is displayed with a green background whose intensity corresponds to the activation value.

## Features

- **Logarithmic scaling**: Uses the same formula as Neuronpedia for opacity mapping
- **Individual feature visualizations**: Shows each feature's activations across tokens
- **Combined heatmap**: Matrix view of all features vs all tokens
- **Color coding**: Green intensity from light (low activation) to dark (high activation)
- **Special token handling**: Replaces special tokens with displayable characters
- **Value display**: Optional display of numerical activation values on tokens

## Usage

### Basic Usage

```bash
python scripts/visualization/activation_heatmap.py path/to/activations_dump.json
```

This will create visualizations in `output/activation_heatmaps/` by default.

### Command Line Options

```bash
python scripts/visualization/activation_heatmap.py INPUT_JSON [OPTIONS]

Required:
  INPUT_JSON              Path to activation dump JSON file

Optional:
  -o, --output-dir DIR    Output directory (default: output/activation_heatmaps)
  -k, --top-k N          Number of top features to visualize (default: 10)
  --probe-index N        Index of probe result to visualize (default: 0)
  --tokens-per-row N     Tokens per row in visualization (default: 20)
  --no-values            Hide activation values on tokens
  --combined-only        Only generate combined heatmap
```

### Examples

**Visualize top 5 features from a specific probe:**
```bash
python scripts/visualization/activation_heatmap.py \
  "output/examples/Dallas/activations_dump (2).json" \
  -k 5 \
  --probe-index 0 \
  -o output/my_visualizations
```

**Generate only the combined heatmap:**
```bash
python scripts/visualization/activation_heatmap.py \
  "output/examples/Dallas/activations_dump (2).json" \
  --combined-only
```

**Adjust layout for longer prompts:**
```bash
python scripts/visualization/activation_heatmap.py \
  "output/examples/Dallas/activations_dump (2).json" \
  --tokens-per-row 30
```

## Input Format

The script expects JSON files with this structure:

```json
{
  "model": "model-name",
  "results": [
    {
      "probe_id": "probe_0_Dallas",
      "prompt": "entity: A city in Texas, USA is Dallas",
      "tokens": ["<bos>", "entity", ":", " A", ...],
      "counts": [[9813.0, 72.0, ...], ...] // OR features array
    }
  ]
}
```

The script supports both layouts for the activation data:
- Legacy `counts` format: 2D array [n_features][n_tokens]
- New `features` format: List of feature objects with metadata
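
To make the legacy layout concrete, here is a minimal sketch of reading one probe result and picking the features to plot. It only covers the `counts` format shown above. The function names `load_counts` and `top_k_features` are hypothetical, and ranking features by their peak activation is an assumption; the script may rank them differently.

```python
import json


def load_counts(path, probe_index=0):
    """Return (tokens, counts) for one probe result in the legacy `counts` layout."""
    with open(path, encoding="utf-8") as fh:
        dump = json.load(fh)
    result = dump["results"][probe_index]
    tokens = result["tokens"]
    counts = result["counts"]  # shape: [n_features][n_tokens]
    for row in counts:
        if len(row) != len(tokens):
            raise ValueError("each feature row needs one value per token")
    return tokens, counts


def top_k_features(counts, k=10):
    """Indices of the k rows with the largest peak activation (assumed ranking rule)."""
    order = sorted(
        range(len(counts)),
        key=lambda i: max(counts[i], default=0.0),
        reverse=True,
    )
    return order[:k]
```

The defaults mirror the documented `--probe-index 0` and `--top-k 10` options.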

## Output

The script generates:

1. **Combined heatmap** (`combined_heatmap.png`): Matrix visualization showing all features
2. **Individual feature images** (one per top-K feature): Detailed view of each feature's activations

All images are saved to: `{output_dir}/probe_{index}/`

## Color Scheme

- **Base color**: Emerald green (RGB: 52, 211, 153)
- **Opacity range**: 0.05 (minimum) to 1.0 (maximum)
- **Threshold**: Values below 0.00005 are not highlighted
- **Text color**: Black on light backgrounds, white on dark backgrounds
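
As a rough illustration of how an opacity value turns into a background color, the sketch below blends the base emerald over a white canvas. Both the white canvas and the helper name `blend_over_white` are assumptions made for this example, not details taken from the script.

```python
BASE_RGB = (52, 211, 153)  # emerald green from the list above


def blend_over_white(opacity):
    """Alpha-blend the base color over white; opacity 0 gives white, 1 gives full emerald."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return tuple(round(255 + (channel - 255) * opacity) for channel in BASE_RGB)
```

At the minimum opacity of 0.05 the result is a barely tinted white, which is why the lowest highlighted activations remain faint.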

## Implementation Details

The visualization uses the exact same logarithmic opacity calculation as Neuronpedia:

```python
opacity = MINIMUM_OPACITY + (log10(1 + 9 * ratio) * scale) / log10(10)
```

Where:
- `ratio = current_value / max_value`
- `scale = 1 - MINIMUM_OPACITY`
- `MINIMUM_OPACITY = 0.05`

Because the logarithm rises steeply near zero, this mapping spreads low activations across a wider range of opacities while compressing differences near the maximum, so weak activations stay visible alongside strong ones.
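
The formula can be wrapped in a small function for experimentation. This is a sketch under stated assumptions: the name `activation_opacity` is hypothetical, the 0.00005 threshold is assumed to apply to the raw activation value, and the clamp on `ratio` is added here for safety.

```python
import math

MINIMUM_OPACITY = 0.05
THRESHOLD = 0.00005  # assumed to be compared against the raw value


def activation_opacity(value, max_value):
    """Map an activation to an opacity in [MINIMUM_OPACITY, 1.0], or 0.0 if not highlighted."""
    if max_value <= 0 or value < THRESHOLD:
        return 0.0
    ratio = min(value / max_value, 1.0)  # clamp is an added safeguard
    scale = 1 - MINIMUM_OPACITY
    return MINIMUM_OPACITY + (math.log10(1 + 9 * ratio) * scale) / math.log10(10)
```

With this mapping, a token at 10% of the maximum activation already receives roughly 31% opacity, and a token at the maximum receives full opacity.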

## Dependencies

- matplotlib
- numpy
- Python 3.7+

Install with:
```bash
pip install matplotlib numpy
```