dragon / optimizers

Commit History

ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert
940f633

alexandretl committed on

CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
b9f197c

alexandretl committed on

alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head
d79da9a

alexandretl committed on

MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types
bc8288b

alexandretl committed on