SiameseNorm Vs KEEL
#2
by NilanE - opened
Have you had a chance to test https://arxiv.org/pdf/2601.19895?
I've found it to be stable in GAN training (a good stress test), especially when paired with Gated Attention (https://arxiv.org/pdf/2505.06708 and Qwen3-Next).
I haven’t tried it yet, but that sounds interesting.
Since this model already has a gating mechanism for each head, the two might work well together.
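For anyone following along, here is a minimal sketch of what a per-head gate on the attention output can look like. It assumes a sigmoid gate computed from the layer input and applied after scaled dot-product attention, which is one of the variants discussed in the Gated Attention paper linked above. It is not this model's actual implementation, and the function and weight names (`headwise_gated_attention`, `w_gate`, and so on) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def headwise_gated_attention(x, w_q, w_k, w_v, w_gate, w_o, n_heads):
    """Causal self-attention with one scalar sigmoid gate per head and token.

    x:      (seq_len, d_model) layer input
    w_q/k/v/o: (d_model, d_model) projection weights
    w_gate: (d_model, n_heads) hypothetical gate projection
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split_heads(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)

    # Scaled dot-product attention with a causal mask.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    attn_out = softmax(scores) @ v  # (n_heads, seq_len, d_head)

    # Per-head gate computed from the same input token, applied after attention.
    gate = sigmoid(x @ w_gate)      # (seq_len, n_heads), values in (0, 1)
    gated = attn_out * gate.T[:, :, None]

    # Merge heads and apply the output projection.
    merged = gated.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, n_heads = 5, 8, 2
    x = rng.normal(size=(seq_len, d_model))
    w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
    w_gate = rng.normal(size=(d_model, n_heads))
    out = headwise_gated_attention(x, w_q, w_k, w_v, w_gate, w_o, n_heads)
    print(out.shape)
```

Because the gate is a sigmoid, each head's contribution is scaled by a value between 0 and 1 per token, so a head can be nearly switched off for a given position without changing the attention weights themselves.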