tmax-9b brain atlas: activation census, OV-circuits, and capability fence (32 layers)

by juiceb0xc0de - opened 6 days ago

allenai/tmax-9b Brain Atlas — The Sweet Spot of the Hybrid Family

Cross-post: I ran a brain atlas on the mid-size tmax. Sub-Zero coverage is concentrated in layers 16–30, so read the surgical headroom numbers as a late-layer snapshot.

model: allenai/tmax-9b
atlas type: activation census + Sub-Zero brain atlas + OV-circuit SVD
corpus: 8,965 prompts
layers: 32
attention layers: 3, 7, 11, 15, 19, 23, 27, 31
hybrid layers: everything else
sacred (fully probed) layers: 16–31
datasets: juiceb0xc0de/tmax-9b-atlas

What this is

TMax-9B is the mid-size member of the tmax family. Same 25%-attention cadence as the smaller sizes, but wider. The activation census is complete across all 32 layers, while the Sub-Zero surgery pass is concentrated in layers 16–30.
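The 25% cadence can be checked directly against the layer list in the header. A minimal sketch, assuming attention sits at every fourth layer starting from index 3 (this matches the listed indices; the variable names are illustrative only):

```python
# Layer layout as listed in the post header.
N_LAYERS = 32
LISTED_ATTENTION = [3, 7, 11, 15, 19, 23, 27, 31]

# Assumption: attention sits at every 4th layer, starting at index 3.
attention_layers = [i for i in range(N_LAYERS) if i % 4 == 3]
hybrid_layers = [i for i in range(N_LAYERS) if i not in attention_layers]

attention_share = len(attention_layers) / N_LAYERS  # 8 / 32
sacred_layers = list(range(16, 32))                 # layers 16-31 from the header
sacred_share = len(sacred_layers) / N_LAYERS        # 16 / 32

print(attention_layers == LISTED_ATTENTION, attention_share, sacred_share)
```

That leaves 24 hybrid layers, and the sacred region works out to exactly half the depth.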

What was run

Activation census over 8,965 prompts.
Feature taxonomy, per-head analysis, OV-circuit SVD.
Logit-lens pass.
Coactivation and code-analysis.
Sub-Zero surgery + capability fence across code, math, reasoning, factual, multilingual.

The shape of the thing

Property	Value
Layers	32
Attention layers	8
KV heads	4
Head dim	256
d_model	4096 (implied)
Sacred region	16–31 (50% of depth)
OV spectral concentration	0.040
OV effective rank	~94

What the numbers suggest

Attention gets even more distributed

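OV spectral concentration is 0.040 with an effective rank of around 94. Against the upper bound of 256 set by the head dimension, that is roughly 37% of the available rank, so the OV maps spread their energy over many directions rather than a handful. This is not memorized copy-paste attention; it is high-dimensional weighted computation.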
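For readers who want to reproduce numbers of this kind, here is a rough sketch of how the two statistics could be computed for one head. The definitions are assumptions, not necessarily the ones used in the atlas: spectral concentration is taken as the top singular direction's share of squared singular values, and effective rank as the exponential of the entropy of the normalized singular values. The function name and the toy shapes are hypothetical.

```python
import numpy as np

def ov_spectrum_stats(w_v, w_o):
    """Return (spectral_concentration, effective_rank) for one head.

    w_v: (d_model, head_dim) value projection.
    w_o: (head_dim, d_model) output projection.
    """
    ov = w_v @ w_o                              # OV circuit, rank <= head_dim
    s = np.linalg.svd(ov, compute_uv=False)
    s = s[s > 1e-10 * s.max()]                  # drop numerically zero values
    energy = s**2 / np.sum(s**2)
    concentration = float(energy[0])            # assumed: top-direction energy share
    p = s / s.sum()
    eff_rank = float(np.exp(-np.sum(p * np.log(p))))  # assumed: entropy-based rank
    return concentration, eff_rank

# Toy example with small random weights (not the real model weights).
rng = np.random.default_rng(0)
d_model, head_dim = 64, 16
w_v = rng.standard_normal((d_model, head_dim))
w_o = rng.standard_normal((head_dim, d_model))
concentration, eff_rank = ov_spectrum_stats(w_v, w_o)
print(concentration, eff_rank)
```

Under these definitions the effective rank can never exceed the head dimension, which is why 256 is the natural yardstick for the reported value of about 94.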

Feature taxonomy is broad, not hyperspecific

Class	Share
`partial_shared`	34.9%
`non_activated`	24.6%
`broadly_shared`	26.4%
`all_shared`	14.1%
`specific_*`	<0.03%

The 9B is using its extra capacity to make more directions partially responsive across many prompts, not to build narrow specialists. The specific_* tail is still tiny: 312 rows out of 1,720,320.
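A quick arithmetic check of the table, using only the numbers quoted above (the dictionary is just a convenient container):

```python
# Class shares from the taxonomy table, in percent.
shares = {
    "partial_shared": 34.9,
    "broadly_shared": 26.4,
    "non_activated": 24.6,
    "all_shared": 14.1,
}

specific_rows = 312
total_rows = 1_720_320

# Share of specific_* rows, in percent.
specific_pct = 100 * specific_rows / total_rows

print(round(sum(shares.values()), 1), round(specific_pct, 4))
```

The four named classes account for essentially all rows, and the specific_* tail comes out at about 0.018%, consistent with the "<0.03%" entry.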

Late `gate` features dominate the logit lens

Top logit-lens peaks are all late gate features:

layer 31 gate 9995 — F=721.7
layer 23 gate 5067 — F=678.9
layer 28 gate 4316 — F=674.4

These are the strongest output-vocabulary predictors in the atlas. The hybrid MLP/SSM gates are doing serious output-vocabulary routing in the late half of the model.
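How the F scores are computed is not described in this post, so the sketch below shows only the generic logit-lens step: projecting the direction a feature writes into the residual stream through the unembedding matrix and reading off the most-boosted tokens. Everything here is hypothetical (function name, shapes, random weights), and the final normalization layer is ignored for simplicity.

```python
import numpy as np

def logit_lens_top_tokens(write_direction, w_unembed, k=5):
    """Project a feature's write direction onto the vocabulary.

    write_direction: (d_model,) vector the feature adds to the residual stream.
    w_unembed: (d_model, vocab) unembedding matrix.
    Returns (token ids, logits) for the k most-boosted tokens.
    """
    logits = write_direction @ w_unembed
    top = np.argsort(logits)[::-1][:k]
    return top, logits[top]

# Toy setup: random unembedding, not the real model.
rng = np.random.default_rng(0)
d_model, vocab = 64, 1000
w_unembed = rng.standard_normal((d_model, vocab))

# Hypothetical feature that writes exactly along token 42's unembedding column.
direction = w_unembed[:, 42] / np.linalg.norm(w_unembed[:, 42])
top_ids, top_logits = logit_lens_top_tokens(direction, w_unembed, k=3)
print(top_ids, top_logits)
```

A feature whose write direction lines up with a small set of unembedding columns produces a sharp peak like this; a diffuse direction produces a flat logit profile.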

Sacred region is half the network

Layers 16–31 are flagged sacred and get full Sub-Zero treatment. That is 50% of depth, proportionally deeper than in most dense transformers of similar size. The hybrid architecture seems to need a larger late-layer "structured transformation" region because each layer is mixing attention and SSM computation.

Surgical headroom looks good in the covered region

Metric	Value
Tested axes	16 (80 domain rows)
Fence pass rate	93.8%
Worst axis damage	0.43 (layer 26 gate)
Avg damage	0.031

The worst failure in the covered region is a late gate direction, not an early projection. Classifier accuracy in the covered layers is 0.967–1.000, average 0.981.
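The table's counts hang together arithmetically: 16 axes across the 5 domains gives 80 rows, and 93.8% is what you get if exactly one axis fails. A sketch of that bookkeeping follows. The threshold value and the damage matrix are placeholders, not atlas data; only the 0.432 entry is taken from the post, and its row index is arbitrary.

```python
import numpy as np

DOMAINS = ["code", "math", "reasoning", "factual", "multilingual"]
N_AXES = 16
FENCE_THRESHOLD = 0.10  # hypothetical; the post does not state the real threshold

# Placeholder damage matrix: zeros everywhere except the one reported failure.
damage = np.zeros((N_AXES, len(DOMAINS)))
damage[0, DOMAINS.index("multilingual")] = 0.432  # reported worst-axis damage

# An axis passes the fence if no domain is damaged beyond the threshold.
axis_passes = damage.max(axis=1) <= FENCE_THRESHOLD
pass_rate = axis_passes.mean()

print(damage.size, int(axis_passes.sum()), pass_rate)
```

With one failing axis the pass rate is 15/16 = 0.9375, which rounds to the reported 93.8%. The placeholder zeros will not reproduce the reported average damage of 0.031.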

Multilingual is the most fragile capability in the snapshot

The worst Sub-Zero damage is layer 26 gate_proj axis 0, failing the fence with 0.432 damage to multilingual. A lot of cross-lingual capability is packed into a single late gate direction.
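To make "a single direction" concrete, here is one generic way to remove a singular direction from a weight matrix. This is an illustration under assumptions, not the Sub-Zero procedure: it assumes "axis 0" means the top singular direction and that surgery zeroes it outright. The matrix is a small random stand-in for a gate_proj weight.

```python
import numpy as np

def ablate_singular_axis(weight, axis=0):
    """Zero one singular direction of `weight` and rebuild the matrix.

    Returns (ablated weight, original singular values).
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    s_ablated = s.copy()
    s_ablated[axis] = 0.0
    return (u * s_ablated) @ vt, s

# Toy stand-in for a gate_proj weight (not the real layer 26 matrix).
rng = np.random.default_rng(0)
w = rng.standard_normal((48, 32))

w_ablated, s_before = ablate_singular_axis(w, axis=0)
s_after = np.linalg.svd(w_ablated, compute_uv=False)

# The old second-largest singular value is now the largest.
print(s_before[:2], s_after[:1])
```

A fence test would then compare per-domain scores before and after swapping in the ablated weight; a drop of 0.432 on multilingual is what marks this axis as load-bearing.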

Compliance/behavior directions stay partly entangled

Peak compliance-behavior SV fraction is around 1.2%. Style/behavior directions remain partially entangled with capability directions at this scale.

What is novel vs a dense transformer

Half the depth is sacred. Dense models usually reserve the deepest 25–35% for the heavy subspaces; this one needs 50%.
gate drives the logit lens, not attention. The strongest output-token predictors are hybrid MLP gates, not late attention heads.
Distributed attention + structured SSM/MLP. Attention gets more distributed with scale while the MLP/SSM side carries the concentrated transformations.
Late multilingual fragility. One gate_proj direction at layer 26 is critical for multilingual output.

The stuff I deliberately skipped

Same caveat as the rest of the family: hybrid SSM layers that do not tokenize language were not measured because the numbers would be noise. I am working on a way to capture whatever those hybrid layers are actually doing, but it is not in this atlas yet.

Caveats

Sub-Zero SV table is intentionally sparse; only meaningful singular values are kept.
Sub-Zero coverage is concentrated in layers 16–30. Treat surgical headroom numbers as a late-layer snapshot, not a full-model average.
No Qwen3.5-9B base comparison here.

Bottom line

TMax-9B has a complete activation census, the strongest logit-lens signal in the covered layers, a deep sacred region, and good surgical headroom in the late-layer snapshot. If you want to understand how hybrid SSM-Mamba-transformers behave at practical scale, the covered half of this model is a clean place to start.
