Prince-111M

Prince is an attention-free language model: no layer in the network uses attention of any kind.

Architecture

x → RMSNorm → [EMA ‖ DeltaRule] → concat → Linear(2D→D) → Focus → + x
  → RMSNorm → FFN (SwiGLU) → + x
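
Read as code, the diagram describes two pre-norm residual sub-layers per block. The sketch below shows only that wiring, in plain Python over a sequence of D-dimensional vectors. The EMA, DeltaRule, Focus, and FFN bodies are passed in as callables, RMSNorm omits its learnable gain for brevity, and all names (`rms_norm`, `linear`, `prince_block`) are hypothetical rather than taken from the Prince code.

```python
import math
from typing import Callable, List

Vec = List[float]
Seq = List[Vec]  # a sequence of D-dimensional token vectors


def rms_norm(x: Vec, eps: float = 1e-6) -> Vec:
    """RMSNorm without a learnable gain (omitted for brevity)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def linear(w: List[Vec], x: Vec) -> Vec:
    """y = W x, with W given as a list of rows (out_dim x in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def prince_block(
    x: Seq,
    ema: Callable[[Seq], Seq],         # hypothetical EMA branch
    delta_rule: Callable[[Seq], Seq],  # hypothetical DeltaRule branch
    w_proj: List[Vec],                 # Linear(2D -> D): D rows of length 2D
    focus: Callable[[Seq], Seq],       # hypothetical selective-SSM mixer
    ffn: Callable[[Vec], Vec],         # hypothetical per-token SwiGLU FFN
) -> Seq:
    # Sub-layer 1: x -> RMSNorm -> [EMA || DeltaRule] -> concat -> Linear -> Focus -> + x
    h = [rms_norm(t) for t in x]
    a, b = ema(h), delta_rule(h)
    mixed = [linear(w_proj, at + bt) for at, bt in zip(a, b)]  # list + list is concat
    x = [[xi + fi for xi, fi in zip(xt, ft)] for xt, ft in zip(x, focus(mixed))]
    # Sub-layer 2: x -> RMSNorm -> FFN (SwiGLU) -> + x
    return [[xi + fi for xi, fi in zip(xt, ffn(rms_norm(xt)))] for xt in x]
```

Passing placeholder mixers (identity for EMA and Focus, zeros for DeltaRule and the FFN) is enough to check the residual paths and the 2D→D projection shapes.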

Component	Role
EMA	Local smoothing: exponential moving average with a learnable per-channel decay (O(D) parameters)
DeltaRule	Associative memory: token retrieval from fast weights updated with the delta rule
Focus	Mamba selective SSM: mixes information across time steps, providing long-range context
SwiGLU FFN	Channel mixing
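
The EMA and DeltaRule rows can be illustrated by their recurrences. This is a minimal sketch under stated assumptions: the card does not give Prince's exact parameterization, so the EMA decay is assumed to be a per-channel sigmoid of a learnable logit, and the DeltaRule branch uses the standard delta-rule fast-weight update S_t = S_{t-1} + β_t (v_t - S_{t-1} k_t) k_tᵀ with readout o_t = S_t q_t. How q, k, v, and β are projected from the input is not shown, and the function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ema_scan(x: List[Vec], decay_logits: Vec) -> List[Vec]:
    """Per-channel EMA: y_t = a * y_{t-1} + (1 - a) * x_t, with a = sigmoid(theta).

    Assumption: decay_logits holds one learnable scalar per channel (O(D) params).
    """
    a = [sigmoid(t) for t in decay_logits]
    y = [0.0] * len(decay_logits)
    out = []
    for xt in x:
        y = [ai * yi + (1.0 - ai) * xi for ai, yi, xi in zip(a, y, xt)]
        out.append(y)
    return out


def delta_rule_scan(q: List[Vec], k: List[Vec], v: List[Vec], beta: Vec) -> List[Vec]:
    """Fast-weight memory S (d_v x d_k) updated with the standard delta rule:

    S_t = S_{t-1} + beta_t * (v_t - S_{t-1} k_t) k_t^T,   o_t = S_t q_t
    """
    d_k, d_v = len(k[0]), len(v[0])
    s: Mat = [[0.0] * d_k for _ in range(d_v)]
    out = []
    for qt, kt, vt, bt in zip(q, k, v, beta):
        pred = [sum(sij * kj for sij, kj in zip(row, kt)) for row in s]  # S_{t-1} k_t
        err = [vi - pi for vi, pi in zip(vt, pred)]  # what the memory currently gets wrong
        s = [[sij + bt * ei * kj for sij, kj in zip(row, kt)] for row, ei in zip(s, err)]
        out.append([sum(sij * qj for sij, qj in zip(row, qt)) for row in s])  # S_t q_t
    return out
```

With β = 1 and a unit-norm key, writing a second value under the same key replaces the first instead of adding to it; that overwrite behavior is what separates the delta rule from a purely additive fast-weight memory.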

Training: FineWeb-Edu (1B tokens) | TPU v5e-8 | 50K steps | AdamW, lr = 3e-4
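
These figures imply a per-step token budget. A back-of-the-envelope check, assuming a single pass over the 1B tokens and pure data parallelism across the 8 chips of a v5e-8 slice (neither is stated on the card):

```python
TOKENS = 1_000_000_000  # FineWeb-Edu token count stated on the card
STEPS = 50_000          # training steps stated on the card
DEVICES = 8             # assumption: v5e-8 = 8 chips, pure data parallelism

tokens_per_step = TOKENS // STEPS               # assumes exactly one epoch
tokens_per_device = tokens_per_step // DEVICES  # per-chip share of each step

print(tokens_per_step, tokens_per_device)
```

Under those assumptions each optimizer step sees 20,000 tokens, or 2,500 per chip.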

Weights format: Safetensors | Model size: 0.1B params | Tensor type: F32
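
The F32 tensor type fixes the weight footprint at 4 bytes per parameter. A rough sizing estimate, assuming exactly 111M parameters (from the model name; the card rounds to 0.1B) and, for training, standard AdamW state kept in F32 (gradients plus first and second moments, no sharding, activations excluded):

```python
PARAMS = 111_000_000  # assumption: taken from the "111M" in the model name
BYTES_PER_F32 = 4     # float32 stores each value in 4 bytes

weight_bytes = PARAMS * BYTES_PER_F32
# AdamW training state: weights + gradients + first and second moments, all F32
adamw_bytes = 4 * weight_bytes

print(f"weights: {weight_bytes / 1e6:.0f} MB ({weight_bytes / 2**20:.1f} MiB)")
print(f"AdamW training state: {adamw_bytes / 1e9:.2f} GB")
```

This works out to about 444 MB of weights and about 1.78 GB of weights plus optimizer state before activations.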
