alexandretl/dragon
main / dragon (415 kB)
2 contributors · History: 52 commits
Latest commit: alexandretl, "SLW end" (b5b44c3), 2 days ago
Name · Size · Last commit · Updated
optimizers · - · CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training · 29 days ago
.gitattributes · 1.52 kB · initial commit · 5 months ago
.gitignore · 54 Bytes · CCE | Gate attn | ZCG | RoPE GDN | GQA GDN | uniconv GDN ||CCA | NSA | PLT (not tested) | DMA fix | SWR · 3 months ago
__init__.py · 0 Bytes · fixes+refactoring · 4 months ago
compute_loss.py · 16.1 kB · alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head · about 2 months ago
configuration_dragon.py · 16.3 kB · DDL · 3 days ago
coordcheck_utils.py · 20 kB · mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks · 3 months ago
coordchecking_dragon.py · 4.76 kB · mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks · 3 months ago
inspecting_dragon.py · 12.3 kB · mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks · 3 months ago
modeling_dragon.py · 208 kB · DDL · 3 days ago
nsa_utils.py · 18.8 kB · CCE | Gate attn | ZCG | RoPE GDN | GQA GDN | uniconv GDN ||CCA | NSA | PLT (not tested) | DMA fix | SWR · 3 months ago
training_dragon.py · 61.5 kB · SLW end · 2 days ago