Gemma-4-E2B SAE Atlas — Work in Progress
JumpReLU sparse autoencoders (SAEs) trained on every layer of Gemma-4-E2B-it using an adaptive Lagrangian controller. Training is in progress. I'm publishing layers live as they come hot off the press for anyone interested in following along. I will be making further adjustments for finer resolution, but I think the early data should already be helpful. I'm just a bartender, so don't trust everything I say. 🤗 The Lagrangian math is pretty cool: it auto-steers the trainer, taking the guesswork out of hyperparameter adjustments.
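For anyone wondering what "auto-steering" could look like, here is a minimal sketch of the general idea, not the actual controller used for these SAEs. It assumes the controller is a dual-ascent update on a sparsity multiplier that pushes the average number of active features (L0) toward a target. The names `LagrangianController`, `target_l0`, and `step_size` are hypothetical, and the log-space update is just one common way to keep the multiplier positive.

```python
import math


def jumprelu(pre_acts, threshold):
    """JumpReLU: keep a pre-activation unchanged only if it exceeds the threshold."""
    return [x if x > threshold else 0.0 for x in pre_acts]


class LagrangianController:
    """Hypothetical dual-ascent controller for a sparsity penalty weight.

    Assumption: the trainer minimizes reconstruction_loss + lam * sparsity_loss,
    and lam is adjusted so the measured L0 drifts toward target_l0.
    """

    def __init__(self, target_l0, step_size=0.01):
        self.target_l0 = target_l0
        self.step_size = step_size
        self.log_lam = 0.0  # lam starts at exp(0) = 1

    def update(self, measured_l0):
        # Relative constraint violation: positive when too many features fire.
        violation = (measured_l0 - self.target_l0) / self.target_l0
        # Ascent step in log space, so lam stays strictly positive.
        self.log_lam += self.step_size * violation
        return math.exp(self.log_lam)


# Usage: call once per training step with the batch's average L0.
controller = LagrangianController(target_l0=10.0, step_size=0.1)
lam_too_dense = controller.update(measured_l0=20.0)  # penalty weight goes up
lam_too_sparse = controller.update(measured_l0=5.0)  # penalty weight comes back down
```

The point of a setup like this is that you pick the sparsity level you want and let the multiplier find itself, instead of hand-tuning a penalty coefficient per layer.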
Full paper and methodology whenever I get around to writing it up. There's a lot of work to be done. For now though, enjoy! 🤗
https://huggingface.co/juiceb0xc0de/gemma-4-e2b-saes