alexandretl/dragon
d79da9a
406 kB
2 contributors
History: 34 commits
Latest commit: d79da9a by alexandretl, 5 months ago
    alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head
optimizers/ (directory, 5 months ago)
    alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head
.gitattributes (1.52 kB, Safe, 8 months ago)
    initial commit
.gitignore (54 Bytes, Safe, 6 months ago)
    CCE | Gate attn | ZCG | RoPE GDN | GQA GDN | uniconv GDN ||CCA | NSA | PLT (not tested) | DMA fix | SWR
__init__.py (0 Bytes, Safe, 7 months ago)
    fixes+refactoring
compute_loss.py (16.1 kB, Safe, 5 months ago)
    alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head
configuration_dragon.py (15.9 kB, Safe, 5 months ago)
    alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head
coordcheck_utils.py (20 kB, Safe, 6 months ago)
    mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks
coordchecking_dragon.py (4.76 kB, Safe, 6 months ago)
    mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks
inspecting_dragon.py (12.3 kB, Safe, 6 months ago)
    mamba3 flags | mamba3 default state size to 128, headdim to 64 | mamba2 | fix mamba3 mimo (JG) | (fake) moe | intra doc maskiiiing (with SS) | seednorm tests | coord checks
modeling_dragon.py (271 kB, Safe, 5 months ago)
    alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head
nsa_utils.py (18.8 kB, Safe, 6 months ago)
    CCE | Gate attn | ZCG | RoPE GDN | GQA GDN | uniconv GDN ||CCA | NSA | PLT (not tested) | DMA fix | SWR
training_dragon.py (36.7 kB, Safe, 5 months ago)
    alpha normalize ademamix | mamba norms and gate | VWN | wnorm (nemotron-flash) | MG equivalence | fix IDM config saving | CCAv2 | MoBA | reduce lm head