Variation of average L0 across layers
For each layer, I looked at the transcoder with the lowest average L0 (these are also the canonical ones shown on Neuronpedia). I noticed that this L0 value varies widely across layers, with a sharp drop after layer 10. The sequence looks like this:
76, 65, 49, 54, 88, 87, 95, 70, 52, 72, 88, 5, 6, 8, 8, 8, 10, 12, 13, 12, 11, 13, 15, 25, 37, 41.
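For concreteness, here is a minimal sketch of how an average L0 like the numbers above can be computed from transcoder feature activations. The function name and the toy activation matrix are my own illustration, not the actual evaluation code:

```python
import numpy as np

def average_l0(feature_acts: np.ndarray) -> float:
    """Mean number of active (nonzero) features per token.

    feature_acts: array of shape (n_tokens, n_features) holding
    transcoder feature activations (hypothetical input).
    """
    return float((feature_acts != 0).sum(axis=1).mean())

# Toy example: 4 tokens, 6 features, JumpReLU-style sparse activations.
acts = np.array([
    [0.0, 1.2, 0.0, 0.0, 3.1, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])
print(average_l0(acts))  # 1.25
```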
Did you train the transcoders on different layers with a different accuracy-sparsity tradeoff?
Hey @sjgerstner , thanks for the careful analysis.
That pattern is real. We did not intentionally train different layers with fundamentally different accuracy-sparsity tradeoffs, in the sense of targeting specific L0 values per layer: the same objective, JumpReLU mechanism, and Pareto-based model selection procedure were used throughout. However, as documented in the Gemma Scope Technical Report, the sparsity sweep ranges were adjusted for certain mid-to-late layers to ensure a meaningful reconstruction-sparsity frontier.
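To make the JumpReLU mechanism mentioned above concrete, here is a sketch of its forward pass: a pre-activation passes through unchanged when it exceeds a learned per-feature threshold, and is zeroed otherwise. The names `jumprelu` and `theta` are illustrative, and this omits the straight-through gradient machinery used in actual training:

```python
import numpy as np

def jumprelu(pre_acts: np.ndarray, theta: np.ndarray) -> np.ndarray:
    # JumpReLU: keep the pre-activation when it exceeds its learned
    # per-feature threshold theta, otherwise output exactly zero.
    return np.where(pre_acts > theta, pre_acts, 0.0)

pre = np.array([0.2, 0.8, 1.5, -0.3])
theta = np.array([0.5, 0.5, 1.0, 0.5])
print(jumprelu(pre, theta))  # [0.  0.8 1.5 0. ]
```

Because sub-threshold features are set to exactly zero (not merely shrunk), the resulting L0 depends directly on where the thresholds land, which is what the sparsity penalty trades off against reconstruction quality.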
Because the released transcoders are selected from each layer's Pareto frontier, the lowest-L0 canonical model per layer can land at very different points along that curve. Earlier layers tend to require higher effective dimensionality, so even strong sparsity penalties yield relatively large L0 values. In deeper layers, activation statistics differ, and the sweep (including the adjusted ranges where applicable) can produce much sparser solutions without a large reconstruction loss.
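To illustrate what Pareto-based selection means here, the following sketch keeps only the non-dominated points of a sweep, where each point is an (L0, reconstruction loss) pair; a point survives if no other point is at least as good on both axes and strictly better on one. The function name and the sweep values are hypothetical:

```python
def pareto_frontier(points):
    """Return the non-dominated (L0, reconstruction loss) pairs,
    sorted by L0. Purely illustrative helper."""
    frontier = []
    for l0, loss in points:
        dominated = any(
            l0_b <= l0 and loss_b <= loss and (l0_b < l0 or loss_b < loss)
            for l0_b, loss_b in points
        )
        if not dominated:
            frontier.append((l0, loss))
    return sorted(frontier)

# Hypothetical sweep results for one layer.
sweep = [(76, 0.10), (50, 0.12), (40, 0.20), (60, 0.15), (90, 0.09)]
print(pareto_frontier(sweep))  # [(40, 0.2), (50, 0.12), (76, 0.1), (90, 0.09)]
```

The point to notice is that "lowest L0 on the frontier" can sit at quite different absolute L0 values depending on how the frontier is shaped for a given layer, which is exactly why the canonical models vary so much across layers.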
So the sharp drop after layer 10 reflects a combination of layer-dependent representation geometry and the documented sweep-range adjustments, not a deliberate architectural change in the training objective or an explicit per-layer sparsity target.
Thank you!