fix-glu-mlp
#17
by michael-guenther - opened
The GluMLP does not work without flash attention because the tensors are passed in a different shape. This PR fixes the issue. I also verified that the embeddings with and without flash attention are identical.
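The shape mismatch arises because the flash-attention path typically feeds the MLP an unpadded `(total_tokens, hidden)` tensor, while the non-flash path passes a padded `(batch, seq, hidden)` tensor. A minimal sketch of the idea (names and shapes are illustrative, not the repo's actual code): applying the GLU projections only along the last dimension makes the same forward work for both layouts.

```python
import numpy as np

def silu(x):
    # SiLU activation, commonly used as the gate in GLU-style MLPs
    return x / (1.0 + np.exp(-x))

def glu_mlp(x, w_gate, w_up, w_down):
    # Operate only on the last dimension so the same code handles
    # a padded (batch, seq, hidden) tensor (no flash attention) and
    # an unpadded (total_tokens, hidden) tensor (flash attention).
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
hidden, inter = 4, 8
w_gate = rng.standard_normal((hidden, inter))
w_up = rng.standard_normal((hidden, inter))
w_down = rng.standard_normal((inter, hidden))

padded = rng.standard_normal((2, 3, hidden))   # (batch, seq, hidden)
unpadded = padded.reshape(-1, hidden)          # (total_tokens, hidden)

out_padded = glu_mlp(padded, w_gate, w_up, w_down)
out_unpadded = glu_mlp(unpadded, w_gate, w_up, w_down)

# Both layouts produce the same per-token outputs
assert np.allclose(out_padded.reshape(-1, hidden), out_unpadded)
```

This mirrors the check described above: run the model with and without flash attention and confirm the outputs match token for token.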
michael-guenther changed pull request status to open
LGTM!
michael-guenther changed pull request status to merged