
cayley-10b-k16-3l-mlp_in

Trained by @amack (aemack-org). Originally uploaded as aemack-org/cayley-10b (commit 80734cf58094f8b888e732abf405bd100e1476d2), which has since been replaced with a different architecture.

Reaches a cross-entropy validation loss of 3.173 after 16,000 iterations (~10.5B tokens of FineWeb-Edu-10B).
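These numbers imply multi-GPU training: 16,000 iters × 80 seqs × 1,024 tokens is only ~1.31B tokens, so the ~10.5B total works out if the batch_size: 80 below is per device across 8 data-parallel ranks (16,000 × 80 × 1,024 × 8 ≈ 10.49B). The rank count is an inference, not something the card states.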

Architecture

  • 205M-param GPT (12 layers, 8 heads, d=1024)
  • CayleySAE inserted at mlp_in in every block (3-level feature hierarchy); see the structural sketch below
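To make the placement concrete, here is a minimal PyTorch sketch of one block (nn.RMSNorm requires PyTorch ≥ 2.4). The SAE is an identity stub because this card does not describe CayleySAE's internals; the pre-norm wiring, 4x MLP width, and unmasked attention are illustrative assumptions, not the trained model's code.

```python
import torch
import torch.nn as nn

class SAEStub(nn.Module):
    """Stand-in for CayleySAE (internals are not described in this card)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x

class Block(nn.Module):
    """One transformer block with the SAE at mlp_in (assumed pre-norm wiring)."""
    def __init__(self, d_model: int = 1024, n_heads: int = 8):
        super().__init__()
        self.attn_norm = nn.RMSNorm(d_model)
        # Causal masking is omitted for brevity; a real GPT block would add it.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp_norm = nn.RMSNorm(d_model)
        self.sae = SAEStub()                    # inserted at mlp_in
        self.mlp = nn.Sequential(               # 4x hidden width is an assumption
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        h = self.sae(self.mlp_norm(x))          # RMSNorm -> CayleySAE -> MLP, per the card
        return x + self.mlp(h)
```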

CayleySAE config

Level  n   k   delta     Features
L0     10  16  0 (root)     1,024
L1     13  32  64           8,192
L2     16  64  64          65,536
  • Total features per layer: 74,752 (73x overcomplete vs. d=1024)
  • Active per token: 112
  • per_parent_budget=True, score_standardize=True
  • Location: mlp_in (RMSNorm → CayleySAE → MLP in each block); a toy selection sketch follows this list
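The level sizes are internally consistent: 2^10 + 2^13 + 2^16 = 74,752 features, and 16 + 32 + 64 = 112 active per token. Below is a toy sketch of per-level top-k selection under those numbers. It is an illustration only: it deliberately omits the delta parameter, per-parent budgeting, score standardization, and whatever the Cayley structure itself contributes, since none of those are specified here.

```python
import torch

LEVELS = [(1024, 16), (8192, 32), (65536, 64)]  # (features, k) for L0..L2
assert sum(f for f, _ in LEVELS) == 74752       # 73x overcomplete vs. d=1024
assert sum(k for _, k in LEVELS) == 112         # active features per token

def topk_codes(scores_per_level):
    """Keep the top-k scores at each level, zeroing the rest.

    scores_per_level: list of (batch, n_features) tensors, one per level.
    The real CayleySAE adds per-parent budgets and score standardization,
    which this sketch omits.
    """
    codes = []
    for scores, (_, k) in zip(scores_per_level, LEVELS):
        vals, idx = scores.topk(k, dim=-1)
        codes.append(torch.zeros_like(scores).scatter_(-1, idx, vals))
    return codes

scores = [torch.randn(2, n) for n, _ in LEVELS]
codes = topk_codes(scores)
print([int(c.count_nonzero(dim=-1)[0]) for c in codes])  # [16, 32, 64]
```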

Training config

  • optimizer: muon (lr=0.006) + adamw (lr=0.006)
  • lr_schedule: linear_warmdown, warmdown_frac=0.2
  • batch_size: 80, seq_len: 1024, gradient_accumulation_steps: 1
  • max_iters: 16000
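One plausible reading of linear_warmdown with warmdown_frac=0.2 (an assumption about the schedule's shape, not code from the run): hold the learning rate flat for the first 80% of max_iters, then decay linearly to zero.

```python
def lr_scale(it: int, max_iters: int = 16000, warmdown_frac: float = 0.2) -> float:
    """Constant LR, then a linear ramp to 0 over the last warmdown_frac of training."""
    warmdown_start = int(max_iters * (1.0 - warmdown_frac))  # iter 12,800 here
    if it < warmdown_start:
        return 1.0
    return max(0.0, (max_iters - it) / (max_iters - warmdown_start))

# Scales the base lr=0.006: lr_scale(0) == 1.0, lr_scale(14400) == 0.5, lr_scale(16000) == 0.0
```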