ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert 940f633 alexandretl committed 6 days ago
equivalence with MG (updated GTDPA, GDN, M3MIMO) | some refactoring 87ea3a8 alexandretl committed 16 days ago
CDP LR expert scaling tests (linear or sqrt, inc) | missing args in config (not critical) b690eb3 alexandretl committed 17 days ago
Differential Attn v2 | proper coord check for MoE | proper init for MoE | some refactoring 2f56a15 alexandretl committed 18 days ago
attn layer simplification & IDM (for gpt baseline) | local/global removed | CDP coord check | CDP fix lm_head init scaling d10acb2 alexandretl committed 20 days ago
coordcheck v2 (gqa paper) | some fixes relative to previous overhaul e4fd764 alexandretl committed 26 days ago
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training b9f197c alexandretl committed on Jan 2
alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head d79da9a alexandretl committed on Dec 3, 2025
mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc masking (with SS) | seednorm tests | coord checks 58b82e2 alexandretl committed on Nov 7, 2025
MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types bc8288b alexandretl committed on Nov 4, 2025
CCE | Gate attn | ZCG | RoPE GDN | GQA GDN | uniconv GDN | CCA | NSA | PLT (not tested) | DMA fix | SWR 959cbe5 alexandretl committed on Oct 22, 2025
uscaling training | resume training | slw training | eval loss training | fix data offset training | DSA & DMA (testing) | Qwen3Next-like arch 19e6554 alexandretl committed on Oct 13, 2025
zero centered gamma for norm | proper qkv proj in GDN (tp aware, same as MG) | training script (WIP) bd7b3d1 alexandretl committed on Oct 2, 2025
refactor backends selection | fix eager attn softcap & window | fix flex backend window | flex attn & eager backends for DA | eager backend for GDN | refactor GDN variables b914c22 alexandretl committed on Sep 23, 2025
revert vLLM modifications (separate head mult back to lm_head) 4e57133 alexandretl committed on Sep 22, 2025
manual automap for DragonModel | vLLM compat (alpha head in model, persistent=False, contiguous for conv1d, max pos hardcoded for now) 58a7542 alexandretl committed on Sep 10, 2025
updated rope (uses position_ids) | updated scalable softmax (uses position_ids) | refactoring | gitignore c1b5b67 alexandretl committed on Sep 8, 2025
correct kv cache management (trim after) | register for auto class f608fa3 jgcb00 committed on Sep 5, 2025
slw size in config | updated cache with SLW | fix gdn scalar (still needs refactoring) 8713903 jgcb00 committed on Sep 5, 2025