# cayley-10b-k8-2l-mlp_in
A 205M-parameter GPT with a 2-level CayleySAE hooked at `mlp_in`. Trained on FineWeb-Edu-10B for 16k iterations (~10.5B tokens); reaches a CE validation loss of 3.228.
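If the weights are hosted on the Hub alongside this card, they can be pulled with `huggingface_hub`. A minimal sketch; the org prefix and the checkpoint filename are placeholders, not confirmed by this card:

```python
# Fetch the checkpoint from the Hub. The repo org prefix and the
# filename "model.pt" are assumptions; check the repo's file list.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<org>/cayley-10b-k8-2l-mlp_in",  # hypothetical org prefix
    filename="model.pt",                      # hypothetical filename
)
print(ckpt_path)
```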
## CayleySAE config
| Level | n | k | delta | Features |
|---|---|---|---|---|
| L0 | 10 | 8 | 0 (root) | 1,024 |
| L1 | 13 | 16 | 64 | 8,192 |
- Total features per layer: 9,216
- Active per token: 24
- Options: `per_parent_budget=True`, `score_standardize=True` (see the sketch below)
- Location: `mlp_in`
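A minimal sketch of the table above as a config object. The class and field names are illustrative (not the trainer's actual API), and it assumes each level holds 2**n features, which matches the feature counts in the table:

```python
# Illustrative config for the 2-level CayleySAE; names are hypothetical.
from dataclasses import dataclass

@dataclass
class CayleyLevel:
    n: int      # assumed: level has 2**n features (matches the table)
    k: int      # active features per token at this level
    delta: int  # offset parameter from the table (0 for the root)

levels = [
    CayleyLevel(n=10, k=8,  delta=0),   # L0: 2**10 = 1,024 features
    CayleyLevel(n=13, k=16, delta=64),  # L1: 2**13 = 8,192 features
]
per_parent_budget = True    # options listed on the card
score_standardize = True

total_features = sum(2 ** lvl.n for lvl in levels)  # 1,024 + 8,192 = 9,216
active_per_token = sum(lvl.k for lvl in levels)     # 8 + 16 = 24
assert (total_features, active_per_token) == (9_216, 24)
```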
## Training config
- optimizer: muon (lr=0.006), adamw (lr=0.006)
- lr_schedule: linear_warmdown, warmdown_frac=0.2
- batch_size: 40 × 2 GPUs, gradient_accumulation_steps: 8 (effective batch: 640, verified below)
- seq_len: 1024, max_iters: 16000
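These numbers pin down the total token budget; a quick sanity check using only the figures above:

```python
# Token budget implied by the training config (no assumptions beyond the card).
batch_per_gpu = 40
n_gpus = 2
grad_accum = 8
seq_len = 1024
max_iters = 16_000

effective_batch = batch_per_gpu * n_gpus * grad_accum  # 640 sequences/step
tokens_per_iter = effective_batch * seq_len            # 655,360 tokens/step
total_tokens = tokens_per_iter * max_iters             # 10,485,760,000
print(f"{total_tokens / 1e9:.2f}B tokens")             # -> 10.49B (~10.5B)
```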