dragon / optimizers

66.6 kB

2 contributors · History: 4 commits

Latest commit 940f633 by alexandretl, 5 months ago:
ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert

Ademamix.py · 6.87 kB · 6 months ago
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training

Snoo.py · 2.42 kB · 8 months ago
MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types |

__init__.py · 54 bytes · 8 months ago
MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types |

adamh.py · 2.74 kB · 5 months ago
ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert

ademamixh.py · 7.99 kB · 5 months ago
ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert

muon_modded.py · 4.79 kB · 6 months ago
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training

newton_schulz_triton.py · 11.6 kB · 6 months ago
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training

normuon.py · 25.4 kB · 6 months ago
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training

opt_utils.py · 4.82 kB · 6 months ago
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training