# cayley-10b-k16-3l-mlp_in
Trained by @amack (aemack-org). Originally uploaded as `aemack-org/cayley-10b` (commit `80734cf58094f8b888e732abf405bd100e1476d2`, since replaced with a different architecture).
Reaches 3.173 validation cross-entropy loss after 16k iterations (~10.5B tokens on FineWeb-Edu-10B).
## Architecture
- 205M-param GPT (12 layers, 8 heads, d=1024)
- CayleySAE inserted at `mlp_in` in every block (3-level hierarchy); see the sketch below
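
For concreteness, here is a minimal sketch of the insertion point, assuming a standard pre-norm GPT block. The 4x MLP expansion and GELU are assumptions, and `CayleySAE` is a stand-in module; only the post-norm, pre-MLP placement comes from this card.

```python
import torch
import torch.nn as nn


class MLPSubBlock(nn.Module):
    """MLP half of a pre-norm GPT block with an SAE at mlp_in.

    Requires PyTorch >= 2.4 for nn.RMSNorm. The sae argument stands in
    for CayleySAE, whose implementation is not part of this card.
    """

    def __init__(self, d_model: int = 1024, sae: nn.Module | None = None):
        super().__init__()
        self.norm = nn.RMSNorm(d_model)
        self.sae = sae
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # 4x expansion: standard-GPT assumption
            nn.GELU(),                        # activation choice: assumption
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)         # RMSNorm
        if self.sae is not None:
            h = self.sae(h)      # CayleySAE reconstructs the mlp_in activations
        return x + self.mlp(h)   # MLP, with the usual residual connection
```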
## CayleySAE config
| Level | n (log2 features) | k (active) | delta | Features |
|---|---|---|---|---|
| L0 | 10 | 16 | 0 (root) | 1,024 |
| L1 | 13 | 32 | 64 | 8,192 |
| L2 | 16 | 64 | 64 | 65,536 |
- Total features per layer: 74,752 (73x overcomplete)
- Active per token: 112
- `per_parent_budget=True`, `score_standardize=True`
- Location: `mlp_in` (RMSNorm → CayleySAE → MLP in each block)
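
The totals above follow directly from the table: each level contributes 2^n features, of which k are active per token. A quick sanity check in plain Python:

```python
# (n, k) per level, copied from the CayleySAE config table above.
levels = {"L0": (10, 16), "L1": (13, 32), "L2": (16, 64)}

total_features = sum(2**n for n, _ in levels.values())
active_per_token = sum(k for _, k in levels.values())

print(total_features)          # 74752 = 1024 + 8192 + 65536
print(total_features // 1024)  # 73x overcomplete relative to d_model=1024
print(active_per_token)        # 112 = 16 + 32 + 64
```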
## Training config
- optimizer: muon (lr=0.006), adamw (lr=0.006)
- lr_schedule: linear_warmdown, warmdown_frac=0.2
- batch_size: 80, seq_len: 1024, gradient_accumulation_steps: 1
- max_iters: 16000
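
A sketch of the schedule, assuming `linear_warmdown` means holding the LR flat and then decaying linearly over the final `warmdown_frac` of training (the decay-to-zero endpoint is an assumption; Muon is not bundled with PyTorch, so AdamW stands in for both optimizers here):

```python
import torch

MAX_ITERS = 16_000
WARMDOWN_FRAC = 0.2


def lr_scale(it: int) -> float:
    """Flat LR, then linear decay over the last 20% of training."""
    start = int(MAX_ITERS * (1 - WARMDOWN_FRAC))  # warmdown begins at iter 12800
    if it < start:
        return 1.0
    return max(0.0, (MAX_ITERS - it) / (MAX_ITERS - start))


# Example: attach the schedule to any optimizer via LambdaLR.
model = torch.nn.Linear(1024, 1024)
opt = torch.optim.AdamW(model.parameters(), lr=0.006)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lr_scale)
```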