alexandretl/dragon
dragon/optimizers (branch: main)
2 contributors, History: 4 commits
Latest commit: 940f633 by alexandretl, 3 months ago
    ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert

File                      Scan   Size       Updated
Ademamix.py               Safe   6.87 kB    4 months ago
    CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
Snoo.py                   Safe   2.42 kB    6 months ago
    MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types |
__init__.py               Safe   54 Bytes   6 months ago
    MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types |
adamh.py                  Safe   2.74 kB    3 months ago
    ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert
ademamixh.py              Safe   7.99 kB    3 months ago
    ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert
muon_modded.py            Safe   4.79 kB    4 months ago
    CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
newton_schulz_triton.py   Safe   11.6 kB    4 months ago
    CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
normuon.py                Safe   25.4 kB    4 months ago
    CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
opt_utils.py              Safe   4.82 kB    4 months ago
    CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training