
cayley-10b-halfk

CayleySAE GPT trained with a half-k topology: k=8/16/32 per level instead of the standard k=16/32/64.

Architecture

  • 12 layers, 8 heads, d=1024 (~205M params)
  • CayleySAE at mlp_in: L0 (1024, k=8) → L1 (8192, k=16) → L2 (65536, k=32)
  • Trained on FineWeb-Edu-10B for 16k iters
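The CayleySAE implementation itself is not included in this card, so as a rough illustration, the sketch below shows a generic hierarchical top-k sparse-autoencoder forward pass with the listed per-level k values (8/16/32). All names here are hypothetical, and the level widths are scaled down 16× from the card's 1024/8192/65536 to keep the demo's weight matrices small; the structure (ReLU encode, keep only the top k units per level, feed the sparse code to the next level) is the generic top-k SAE pattern, not necessarily the exact CayleySAE recipe.

```python
import numpy as np

# Hypothetical (width, k) per level -- widths scaled down 16x for the demo;
# the card's actual topology is (1024, 8) -> (8192, 16) -> (65536, 32).
LEVELS = [(64, 8), (512, 16), (4096, 32)]

def topk_mask(z, k):
    """Keep the k largest activations in each row, zero the rest."""
    idx = np.argpartition(z, -k, axis=-1)[..., -k:]
    out = np.zeros_like(z)
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=-1), axis=-1)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 64))  # a batch of 2 "mlp_in" activations

h = x
for width, k in LEVELS:
    # Random encoder weights stand in for trained ones.
    W_enc = rng.standard_normal((h.shape[-1], width)) / np.sqrt(h.shape[-1])
    z = np.maximum(h @ W_enc, 0.0)  # ReLU pre-codes
    h = topk_mask(z, k)             # at most k units stay active per level

# Final code: last-level width, with at most 32 nonzeros per token.
print(h.shape, int((h[0] != 0).sum()))
```

Under this sketch, halving k at every level (the "half-k" variant) simply tightens the sparsity budget of each code without changing the dictionary widths.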

Results

  • Best val loss: 3.1816
  • Baseline cayley-10b (standard k=16/32/64): 3.173 — halving k costs about 0.009 in val loss