Introducing the Gemma-4-E2B Brain Atlas, an interactive neural census of every layer, every head, 16 behavior categories in Google's flagship 2B model. We ran 184,320 probe prompts across 35 layers × 8 components and mapped what came back.
The Brain Atlas is an interactive tool that lets you explore the internal behavior of Google's Gemma-4-E2B model layer by layer, head by head. Pick a behavior category, pick a layer, and see exactly which components light up and which go quiet. The dataset is fully queryable if you want to go deeper.
The mapping combines multiple single-direction techniques run in parallel across every layer and component: activation taxonomy (classifying each neuron by how broadly it fires across prompt categories), coactivation pair analysis (which neurons lock together and on what topics), F-stat behavioral separation (one-way ANOVA per feature across 16 behavior categories), per-head specificity scoring, and a full compliance probe pipeline using SVD, sparse decomposition, and variance analysis.
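For anyone who wants to see what the F-stat step looks like in practice, here is a minimal sketch of a one-way ANOVA F-statistic for a single feature. It is not the Atlas pipeline itself: the function name `one_way_f` and the toy data are assumptions for illustration, and the real run would repeat this per feature over 16 category groups.

```python
from statistics import fmean

def one_way_f(groups):
    """One-way ANOVA F-statistic for one feature (illustrative helper).

    groups: list of lists, one list of activation values per behavior category.
    """
    k = len(groups)                      # number of categories
    n = sum(len(g) for g in groups)      # total number of observations
    means = [fmean(g) for g in groups]
    grand_mean = fmean([v for g in groups for v in g])

    # Between-group sum of squares: how far each category mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own category mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    # No within-group variance at all means perfect separation (or constant data).
    return ms_between / ms_within if ms_within > 0 else float("inf")

# Toy data: three categories whose means differ, so F comes out well above 1.
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_f(toy))
```

A high F means the category means are far apart relative to the noise inside each category, which is the sense in which a feature "separates" behaviors.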
Here's what I found when I ran it.
The sharpest behavioral signal isn't at the output. It's Layer 0. The up projection hits F=22.7, nearly 2x anything in the final third of the network. The model does its behavioral sorting when it has barely started, then spends the next 34 layers… doing what exactly?
The gate has a lifecycle. 70% dormant at L1, highest in the model. Brutal sparsification at L23–26 (>58% silent). Then reopens. The final five layers are the most alive gates anywhere. The model's last act is a gate flare. Layer 4 routes 5 projections to dim 448. One layer. One dimension. That's a topology highway.
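A rough sketch of how a per-layer "dormant" percentage like the ones above could be computed. The definition used here (a gate neuron counts as dormant if its activation magnitude never reaches a small threshold on any probe prompt) and the `eps` value are assumptions, not necessarily the Atlas's own criterion.

```python
def dormant_fraction(gate_activations, eps=1e-3):
    """Fraction of gate neurons that stay below eps in magnitude on every probe.

    gate_activations: list of per-neuron lists, each holding that neuron's
    activation on every probe prompt (hypothetical layout).
    """
    dormant = sum(
        1 for acts in gate_activations
        if max(abs(a) for a in acts) < eps
    )
    return dormant / len(gate_activations)

# Toy layer with four gate neurons, two of which never fire.
layer = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.0001], [1.0, 1.0]]
print(dormant_fraction(layer))
```

Running this once per layer gives a curve you can read the lifecycle from.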
Zero specialist neurons. Not one. 1.2M neurons analyzed. None fires exclusively on a single category. This model distributes everything.
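To make "specialist" concrete, here is a minimal sketch of a breadth classifier. The labels, the `threshold` value, and the input layout (one firing rate per category) are assumptions for illustration; the Atlas taxonomy may draw its lines differently.

```python
def classify_breadth(fire_rates, threshold=0.05):
    """Label a neuron by how many categories it fires on (illustrative labels).

    fire_rates: per-category fraction of probe prompts on which the neuron is active.
    """
    n_active = sum(1 for r in fire_rates if r > threshold)
    if n_active == 0:
        return "silent"
    if n_active == 1:
        return "specialist"      # fires on exactly one category
    if n_active == len(fire_rates):
        return "generalist"      # fires on every category
    return "partial"

# A neuron that fires on only one of 16 categories would count as a specialist.
print(classify_breadth([0.9] + [0.0] * 15))
```

Under a definition like this, the finding above says the "specialist" bucket comes back empty.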
We're excited to announce that Unsloth has joined the PyTorch Ecosystem! 🔥🦥
Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible!
I'm starting a new model line, Locus. These models aren't fine-tuned, they're de-tuned. What I mean by that is I remove a percentage of the corporate-tuned speech patterns like "why this matters", "no fluff", and "as a large language model". By reducing the RLHF-based habitual patterns in model responses, I've had higher success rates in getting the models to adopt a personality. I've fine-tuned on the Locus models myself, so you can chat with one post fine-tune, or just trust me and try it yourself!
I don't aim to remove guardrails or the LLM identity entirely; what I want to do is dampen RLHF to a manageable volume. Personality models perform better with guardrails intact, no different from humans with moral guidelines and boundaries. Refusals can help steer and mold personality. RLHF, however, drowns out adaptability, so I'm cranking it down for you to crank your project up!