Commit History

940f633 (alexandretl): ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert
069edd6 (alexandretl): fix n_kv_heads + pos_id GTPAv2
e1d8de1 (alexandretl): M3 MIMO large output fix (see discussion)
98a21d2 (alexandretl): FA3 fixes | some refactoring
87ea3a8 (alexandretl): equivalence with MG (updated GTDPA, GDN, M3MIMO) | some refactoring
d4bb0ff (alexandretl): Grouped Differential Attn v2 | GDAv2 for TPA
d4b5b99 (alexandretl): GDN proj same as in MG
b690eb3 (alexandretl): CDP LR expert scaling tests (linear or sqrt, inc) | missing args in config (not critical)
2f56a15 (alexandretl): Differential Attn v2 | proper coord check for MoE | proper init for MoE | some refactoring
d10acb2 (alexandretl): attn layer simplification & IDM (for gpt baseline) | local/global removed | CDP coord check | CDP fix lm_head init scaling
e4fd764 (alexandretl): coordcheck v2 (gqa paper) | some fixes relative to previous overhaul
9fd69c3 (alexandretl): big overhaul (simplification, removal of unused things)
6686210 (alexandretl): fixes to some prints and imports
b9f197c (alexandretl): CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
79c75e5 (alexandretl): Value Embeddings (attn & gdn) | MoE (don't use this one)
d79da9a (alexandretl): alpha-normalize AdEMAMix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head
3b164a1 (alexandretl): head tying | gated mlp | gate of Mamba3 inside module
58b82e2 (alexandretl): mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra-doc masking (with SS) | seednorm tests | coord checks
bc8288b (alexandretl): MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types
959cbe5 (alexandretl): CCE | Gate attn | ZCG | RoPE GDN | GQA GDN | uniconv GDN | CCA | NSA | PLT (not tested) | DMA fix | SWR
8c51dce (alexandretl, verified): Update training_dragon.py
9dde504 (alexandretl, verified): fix massi bugs prod
6296289 (alexandretl): three conv (for now), works with main MG branch
19e6554 (alexandretl): uscaling training | resume training | slw training | eval loss training | fix data offset training | DSA & DMA (testing) | Qwen3Next-like arch
b94a4d0 (alexandretl): fixes + refactoring
bd7b3d1 (alexandretl): zero-centered gamma for norm | proper qkv proj in GDN (TP-aware, same as MG) | training script (WIP)
4f09326 (alexandretl): revamp GDN cache (as QwenNext) & conv1d
b914c22 (alexandretl): refactor backend selection | fix eager attn softcap & window | fix flex backend window | flex attn & eager backends for DA | eager backend for GDN | refactor GDN variables
9872c32 (alexandretl): flex attn backend for ATTN (tested) [ inc
4e57133 (alexandretl): revert vLLM modifications (separate head mult back to lm_head)
54fbeee (alexandretl): merged three convolutions of GDN (tested on PIQA & SWDE)
c7717bf (alexandretl): merged in_proj and gate_proj of GDN (tested on PIQA)
a745443 (alexandretl, verified): Update configuration_dragon.py
9053077 (alexandretl): max pos embeddings
58a7542 (alexandretl): manual automap for DragonModel | vLLM compat (alpha head in model, persistent=False, contiguous for conv1d, max pos hardcoded for now)
2db3d5e (alexandretl): diff attn FA backend works (eager does not)
92fd2b1 (jgcb00): diff attn FA2+FA3+eager backends (WIP)
2aa79d5 (jgcb00): removed mean loss reduction
b2ec0a2 (jgcb00): added num_hidden_layers in config for generate function
32add9f (jgcb00): old LNS variant for compatibility with old MG (<uscaling)
d25bd9e (jgcb00): nothing
c1b5b67 (alexandretl): updated RoPE (uses position_ids) | updated scalable softmax (uses position_ids) | refactoring | gitignore
6d43fc3 (jgcb00): relative import
f608fa3 (jgcb00): correct KV cache management (trim after) | register for auto class
8713903 (jgcb00): SLW size in config | updated cache with SLW | fix GDN scalar (still needs refactoring)
32f0abb (jgcb00): SWA (for local and diff)