Prince-111M

Prince is an attention-free language model: it uses no attention mechanism of any kind.

Built by The Entropy Space AI.

Architecture

x → RMSNorm → [EMA ‖ DeltaRule] → concat → Linear(2D→D) → Focus → + x
  → RMSNorm → FFN (SwiGLU) → + x
Component    Role
EMA          Local smoothing (learnable per-channel decay, O(d) params)
DeltaRule    Associative memory: token retrieval via fast weights
Focus        Mamba selective SSM: temporal flow, long-range context
SwiGLU FFN   Channel mixing
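
To make the "fast weights" idea concrete, here is a minimal sketch of a generic delta-rule memory in JAX. It is not taken from the Prince code: the function name `delta_rule`, the write strength `beta`, and the separate query/key/value inputs are assumptions for illustration, and the card does not state how Prince parameterizes this component.

```python
import jax
import jax.numpy as jnp


def delta_rule(q, k, v, beta):
    """Generic delta-rule memory (illustrative, not Prince's implementation).

    q, k: (T, d_k) queries and keys; v: (T, d_v) values;
    beta: (T,) write strengths, typically in [0, 1].
    Returns the per-step reads (T, d_v) and the final memory (d_v, d_k).
    """
    d_k, d_v = k.shape[-1], v.shape[-1]

    def step(S, inputs):
        q_t, k_t, v_t, b_t = inputs
        v_old = S @ k_t                              # value currently stored under k_t
        S = S + b_t * jnp.outer(v_t - v_old, k_t)    # move it toward v_t
        return S, S @ q_t                            # read the memory with the query

    S0 = jnp.zeros((d_v, d_k))                       # fixed-size state, independent of T
    S_final, out = jax.lax.scan(step, S0, (q, k, v, beta))
    return out, S_final
```

With orthonormal keys and `beta = 1`, querying the memory with a stored key returns the value written under it, which is the token-retrieval behavior the table refers to.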

Properties

  • Training: O(nd·log n) parallel associative scan
  • Inference: O(d) per token, fixed state, no KV cache
  • Hardware agnostic: pure JAX/Flax NNX
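
The two modes listed above can be illustrated on the simplest component, the per-channel EMA. The sketch below assumes the recurrence h_t = a·h_{t-1} + (1 − a)·x_t with a zero initial state; the names `ema_scan` and `ema_step` are hypothetical, and Prince's actual layer may differ. The same recurrence is evaluated in parallel with `jax.lax.associative_scan` for training and one token at a time for inference.

```python
import jax
import jax.numpy as jnp


def ema_scan(x, decay):
    """Parallel form. x: (T, D), decay: (D,) in (0, 1). Returns h: (T, D)."""
    # Each step is the affine map h -> a * h + b; affine maps compose associatively.
    a = jnp.broadcast_to(decay, x.shape)
    b = (1.0 - decay) * x

    def combine(left, right):
        a_l, b_l = left
        a_r, b_r = right
        # Apply `left` first, then `right`.
        return a_l * a_r, a_r * b_l + b_r

    _, h = jax.lax.associative_scan(combine, (a, b), axis=0)
    return h  # with h_0 = 0, the accumulated offset is the state itself


def ema_step(h_prev, x_t, decay):
    """Recurrent form: one token in, one fixed-size state out."""
    return decay * h_prev + (1.0 - decay) * x_t
```

Both functions compute the same sequence of states. The recurrent form carries only a `(D,)` vector between tokens, which is why no KV cache is needed.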

Config

Params 105.9M
Layers 8
d_model 512
d_state 64
Vocab 151665
Tokenizer Qwen2.5
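
As a rough sanity check on these numbers, the sketch below estimates how the parameter budget splits between the token embedding and the layers. The class name `PrinceConfig` is hypothetical, and the calculation assumes a single vocab × d_model embedding table, which the card does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrinceConfig:  # hypothetical container for the values listed above
    n_params: float = 105.9e6
    n_layers: int = 8
    d_model: int = 512
    d_state: int = 64
    vocab_size: int = 151665


cfg = PrinceConfig()

# Assumption: one embedding table of shape (vocab_size, d_model).
embedding_params = cfg.vocab_size * cfg.d_model   # 77,652,480
remaining = cfg.n_params - embedding_params       # about 28.2M
per_layer = remaining / cfg.n_layers              # about 3.5M per layer

print(f"embedding: {embedding_params / 1e6:.1f}M")
print(f"per layer: {per_layer / 1e6:.2f}M")
```

Under this assumption roughly 73% of the parameters sit in the embedding. A second, untied output matrix of the same size would by itself push the total past 105.9M, which suggests, though the card does not confirm it, that input and output embeddings are shared.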

Training

Dataset: FineWeb-Edu 1B tokens | TPU v5e-8 | 50K steps | AdamW lr=3e-4

Weights: Safetensors | F32 | 0.1B params