SiameseNorm Vs KEEL

#2
by NilanE - opened

Have you had a chance to test https://arxiv.org/pdf/2601.19895?
I've found it to be stable in GAN training (a good stress test), especially when paired with Gated Attention (https://arxiv.org/pdf/2505.06708, as used in Qwen3-Next).

I haven’t tried it yet, but that sounds interesting.
Since this model has a gating mechanism for each head, the two might work well together.
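For reference, here is a minimal NumPy sketch of per-head output gating. It assumes the style described in the Gated Attention paper: a sigmoid gate computed from the layer input and applied elementwise to each head's attention output before the output projection. The function and weight names (`gated_multihead_attention`, `w_g`, etc.) are hypothetical, and this is not this model's actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    # Subtract the row max for numerical stability; masked (-inf) entries become 0.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_multihead_attention(x, w_q, w_k, w_v, w_g, w_o, n_heads):
    """Causal multi-head attention with a per-head sigmoid output gate (sketch).

    x: (seq, d_model); all weight matrices (hypothetical): (d_model, d_model).
    """
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):  # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    causal_mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(causal_mask, -np.inf, scores)
    attn = softmax(scores) @ v  # per-head attention output: (heads, seq, d_head)

    # Gate in (0, 1), computed from the same input, one value per head channel.
    gate = sigmoid(split(x @ w_g))
    gated = attn * gate

    merged = gated.transpose(1, 0, 2).reshape(seq, d_model)
    return merged @ w_o
```

Because the gate is input-dependent and bounded in (0, 1), each head can scale down its own contribution token by token, which is the property the reply above refers to.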
