SAELens
75.5 MB
Tom Lieberum
fold in scaling by sqrt(d_model) into params
9ff4e7b