d8a5a92  Use create_bidirectional_mask for backend-agnostic attention mask handling (kashif, committed 11 days ago)
c555c2f  fix: align _init_weights with Qwen2Moe using nn.init API (kashif, committed 14 days ago)
7729892  fix: call super()._init_weights() to match Qwen2Moe convention for transformers v5 (kashif, committed 14 days ago)
dfa9ac6  fix: align RotaryEmbedding with Qwen2Moe pattern for transformers compat (kashif, committed 14 days ago)
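The two `_init_weights` commits above both point at the same convention: a model's `_init_weights` should first delegate to `super()._init_weights()` so shared module types (linear layers, embeddings, norms) get the base-class initialization, and then handle only the model-specific parameters the base class does not know about. Below is a minimal, framework-free sketch of that call pattern; every class and attribute name here (`FakeLinear`, `FakeRouterGate`, `BasePreTrainedModel`, `MyMoeModel`) is hypothetical and stands in for the real transformers / Qwen2Moe API.

```python
import random


class FakeLinear:
    """Stand-in for nn.Linear: just holds weight/bias lists."""
    def __init__(self, fan_in, fan_out):
        self.weight = [[1.0] * fan_in for _ in range(fan_out)]
        self.bias = [1.0] * fan_out  # deliberately non-zero before init


class FakeRouterGate:
    """Stand-in for a model-specific module the base class knows nothing about."""
    def __init__(self, n_experts):
        self.weight = [1.0] * n_experts


class BasePreTrainedModel:
    def _init_weights(self, module):
        # Base class handles the generic case: small normal weights, zero bias
        # (mirroring the nn.init-style normal_/zeros_ calls the commit mentions).
        if isinstance(module, FakeLinear):
            rng = random.Random(0)
            std = 0.02
            module.weight = [
                [rng.gauss(0.0, std) for _ in row] for row in module.weight
            ]
            module.bias = [0.0] * len(module.bias)


class MyMoeModel(BasePreTrainedModel):
    def _init_weights(self, module):
        # Convention from the commit log: delegate to super() first, so
        # shared module types keep the base initialization...
        super()._init_weights(module)
        # ...then initialize only the model-specific extras.
        if isinstance(module, FakeRouterGate):
            module.weight = [0.0] * len(module.weight)
```

The payoff of the delegation is that the subclass never re-implements (or silently diverges from) the base initialization when the framework changes it, which is exactly the kind of drift the "transformers v5" commit is guarding against.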