
cayley-10b-halfk

CayleySAE GPT trained with a half-k topology: k=8/16/32 per level instead of the standard k=16/32/64.

Architecture

  • 12 layers, 8 heads, d=1024 (~205M params)
  • CayleySAE at mlp_in: L0 (1024, k=8) → L1 (8192, k=16) → L2 (65536, k=32)
  • Trained on FineWeb-Edu-10B for 16k iters
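The CayleySAE implementation itself is not included in this card, so as a rough illustration, the sketch below shows a generic hierarchical top-k sparse-autoencoder forward pass with the listed per-level k values (8/16/32). All names here are hypothetical, and the level widths are scaled down 16× from the card's 1024/8192/65536 to keep the demo's weight matrices small; the structure (ReLU encode, keep only the top k units per level, feed the sparse code to the next level) is the generic top-k SAE pattern, not necessarily the exact CayleySAE recipe.

```python
import numpy as np

# Hypothetical (width, k) per level -- widths scaled down 16x for the demo;
# the card's actual topology is (1024, 8) -> (8192, 16) -> (65536, 32).
LEVELS = [(64, 8), (512, 16), (4096, 32)]

def topk_mask(z, k):
    """Keep the k largest activations in each row, zero the rest."""
    idx = np.argpartition(z, -k, axis=-1)[..., -k:]
    out = np.zeros_like(z)
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=-1), axis=-1)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 64))  # a batch of 2 "mlp_in" activations

h = x
for width, k in LEVELS:
    # Random encoder weights stand in for trained ones.
    W_enc = rng.standard_normal((h.shape[-1], width)) / np.sqrt(h.shape[-1])
    z = np.maximum(h @ W_enc, 0.0)  # ReLU pre-codes
    h = topk_mask(z, k)             # at most k units stay active per level

# Final code: last-level width, with at most 32 nonzeros per token.
print(h.shape, int((h[0] != 0).sum()))
```

Under this sketch, halving k at every level (the "half-k" variant) simply tightens the sparsity budget of each code without changing the dictionary widths.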

Results

  • Best val loss: 3.1816
  • Baseline cayley-10b (standard k=16/32/64): 3.173 — halving k costs about 0.009 in val loss