ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert 940f633 alexandretl committed on 1 day ago
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training b9f197c alexandretl committed on Jan 2
alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head d79da9a alexandretl committed on Dec 3, 2025
MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types | bc8288b alexandretl committed on Nov 4, 2025