# cayley-10b-halfk
CayleySAE GPT trained with half-k topology: k=8/16/32 per level instead of the standard k=16/32/64.
## Architecture
- 12 layers, 8 heads, d=1024 (~205M params)
- CayleySAE at mlp_in: L0 (1024, k=8) → L1 (8192, k=16) → L2 (65536, k=32)
- Trained on FineWeb-Edu-10B for 16k iters
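The per-level `(width, k)` topology above can be illustrated with a minimal top-k sparsity sketch. This is an assumption-laden illustration, not the actual CayleySAE implementation (the Cayley structure and training details are not specified here); it only shows what "k active units per level" means, using NumPy and the half-k configuration from this card.

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_activation(pre, k):
    """Keep the k largest entries per row and zero the rest (top-k sparsity)."""
    out = np.zeros_like(pre)
    # Indices of the k largest pre-activations in each row.
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    np.put_along_axis(out, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return out

# Half-k topology from this card: (dictionary width, active units k) per level.
HALF_K = [(1024, 8), (8192, 16), (65536, 32)]

# Apply the level-0 constraint to some hypothetical pre-activations.
width, k = HALF_K[0]
pre = rng.standard_normal((4, width))
z = topk_activation(pre, k)
print((z != 0).sum(axis=-1))  # each row keeps exactly k=8 active units
```

The standard-k sibling model (`cayley-10b`) would use `[(1024, 16), (8192, 32), (65536, 64)]` in the same scheme, i.e. twice the active units at every level.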
## Results
- Best val loss: 3.1816
- For comparison, cayley-10b (standard k) reaches a val loss of 3.173