Commit History

90c63c7  Use create_bidirectional_mask for backend-agnostic attention mask handling (SDPA, FA2, flex)  (kashif, HF Staff, verified)
92a5aa6  fix: align _init_weights with Qwen2Moe using nn.init API  (kashif, HF Staff, verified)
1f0d143  fix: call super()._init_weights() to match Qwen2Moe convention  (kashif, HF Staff, verified)
a0f450f  fix: align RotaryEmbedding with Qwen2Moe pattern for transformers compat  (kashif, HF Staff, verified)
220496f  fix: add default factor=1.0 for linear rope compat with newer transformers  (kashif, HF Staff, verified)
0435edb  fix: remap default rope_type to linear for newer transformers compat  (kashif, HF Staff, verified)
52747ba  fix: use linear rope_type instead of removed default for transformers compat  (kashif, HF Staff, verified)
bbb5715  Update README.md  (utdawn, verified)
c38df8b  Update README.md  (utdawn, verified)
c60e761  Update modeling_llada2_moe.py  (utdawn, verified)
6d07958  Update README.md  (utdawn, verified)
c8ebb85  Create README.md  (utdawn, verified)
c0b959e  Update modeling_llada2_moe.py  (utdawn, verified)
5430aaf  Create modeling_llada2_moe.py  (utdawn, verified)
276abc9  Update configuration_llada2_moe.py  (utdawn, verified)
a8af0fb  Add files using upload-large-folder tool  (m1ngcheng, verified)
54d0b65  initial commit  (m1ngcheng, verified)