alexandretl/dragon
dragon/optimizers (66.6 kB)
2 contributors
History: 4 commits
Latest commit 940f633 by alexandretl, 5 months ago: ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert
Ademamix.py (6.87 kB, 6 months ago)
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
Snoo.py (2.42 kB, 8 months ago)
MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types |
__init__.py (54 Bytes, 8 months ago)
MLA | KDA | TPA | GDA | ResFormer | Mamba3 | DragonMimo (WIP) | tokenshift | SeeDNorm | shrink DA/GDN | gate shared across all block types |
adamh.py (2.74 kB, 5 months ago)
ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert
ademamixh.py (7.99 kB, 5 months ago)
ngram embeds | DDL new code + EC | Hyperball (AdamH, AdEMAMixH) | lr_expert
muon_modded.py (4.79 kB, 6 months ago)
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
newton_schulz_triton.py (11.6 kB, 6 months ago)
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
normuon.py (25.4 kB, 6 months ago)
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training
opt_utils.py (4.82 kB, 6 months ago)
CompletedP | memory+norm logging | proper MoE with ScatterMoE, update bias, Latent-MoE | Muon experiments | VE for Mamba3 | fix torch recompiles during varlen training