working resume | classic input embed | nGPT logit scaling | XSA | del M3 as_strided 10aee3a alexandretl committed on Mar 27
[big refactoring] remove ngram, ddl, stem, vwn, reduce lm head, seednorm, mla, uscaling, derf, old args, old layers, diffattnV1, old mamba3s + MAMBA3 TP AWARE 64e97d9 alexandretl committed on Mar 14
cosnet | complete SLW | mamba3 fast inference | fix position_ids shape | ProRes | proper print0 | optim work 24d333c alexandretl committed on Mar 12
geodesic norm | shared expert gate | new mg dataset | no more buffers | moe router in fp32 70543c0 alexandretl committed on Mar 5
ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert 940f633 alexandretl committed on Feb 2
equivalence with MG (updated GTDPA, GDN, M3MIMO ) | some refactoring 87ea3a8 alexandretl committed on Jan 23
CDP LR expert scaling tests (linear or sqrt, inc) | missing args in config (not critical) | b690eb3 alexandretl committed on Jan 22
Differential Attn v2 | proper coord check for MoE | proper init for MoE | some refactoring 2f56a15 alexandretl committed on Jan 21
attn layer simplification & IDM (for gpt baseline) | local/global removed | CDP coord check | CDP fix lm_head init scaling | d10acb2 alexandretl committed on Jan 19
coordcheck v2 (gqa paper) | some fixes relative to previous rehaul e4fd764 alexandretl committed on Jan 13
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training b9f197c alexandretl committed on Jan 2
alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head d79da9a alexandretl committed on Dec 3, 2025
mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks 58b82e2 alexandretl committed on Nov 7, 2025
MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types | bc8288b alexandretl committed on Nov 4, 2025
CCE | Gate attn | ZCG | RoPE GDN | GQA GDN | uniconv GDN ||CCA | NSA | PLT (not tested) | DMA fix | SWR 959cbe5 alexandretl committed on Oct 22, 2025
uscaling training | resume training | slw training | eval loss training | fix data offset training | DSA & DMA (testing) | Qwen3Next-like arch 19e6554 alexandretl committed on Oct 13, 2025
zero centered gamma for norm | proper qkv proj in GDN (tp aware, same as MG) | training script (wip) bd7b3d1 alexandretl committed on Oct 2, 2025
refactor backends selection | fix eager attn softcap & window | fix flex backend window | flex attn & eager backends for DA | eager backend for GDN | refactor GDN variables b914c22 alexandretl committed on Sep 23, 2025
revert vLLM modifications (separate head mult back to lm_head) 4e57133 alexandretl committed on Sep 22, 2025
manual automap for DragonModel | vLLM compat (alpha head in model, persistant=False, contiguous for conv1d, max pos hardcoded for now) 58a7542 alexandretl committed on Sep 10, 2025