Prince-111M
Prince is an attention-free language model. No attention of any kind.
Built by The Entropy Space AI.
Architecture
x → RMSNorm → [EMA ‖ DeltaRule] → concat → Linear(2D→D) → Focus → + x
→ RMSNorm → FFN (SwiGLU) → + x
| Component | Role |
|---|---|
| EMA | Local smoothing (learnable per-channel decay, O(d) params) |
| DeltaRule | Associative memory: token retrieval via fast weights |
| Focus | Mamba selective SSM: temporal flow, long-range context |
| SwiGLU FFN | Channel mixing |
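To make the DeltaRule row concrete, here is a minimal sketch of a textbook delta-rule fast-weight memory in JAX. It is not taken from the Prince code: the function name `delta_rule`, the write strength `beta`, and the tensor shapes are illustrative assumptions, and the model's actual parameterization may differ.

```python
import jax
import jax.numpy as jnp


def delta_rule(q, k, v, beta):
    """Sequential delta-rule memory (illustrative, not the Prince implementation).

    q, k: (n, d_k) queries and keys; v: (n, d_v) values; beta: (n,) write strengths.
    Returns the (n, d_v) readout at every position.
    """
    d_k, d_v = k.shape[-1], v.shape[-1]

    def step(S, inputs):
        q_t, k_t, v_t, b_t = inputs
        pred = S @ k_t                            # what the memory currently returns for k_t
        S = S + b_t * jnp.outer(v_t - pred, k_t)  # move the stored value toward v_t
        return S, S @ q_t                         # read out with the query

    S0 = jnp.zeros((d_v, d_k))                    # fixed-size state, independent of n
    _, out = jax.lax.scan(step, S0, (q, k, v, beta))
    return out


# Two orthonormal keys, each written once and then queried with the same key.
k = jnp.eye(2)
v = jnp.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
out = delta_rule(k, k, v, jnp.ones(2))
```

With orthonormal keys and `beta = 1`, each query retrieves exactly the value stored under its key, which is the "token retrieval" behaviour the table describes.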
Properties
- Training: O(nd·log n) parallel associative scan
- Inference: O(d) per token, with a fixed-size state and no KV cache
- Hardware agnostic: pure JAX/Flax NNX
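The two modes listed above can be illustrated with the EMA component alone. The sketch below assumes the recurrence `h_t = decay * h_{t-1} + (1 - decay) * x_t` with a zero initial state; that exact form, and the names `ema_parallel` and `ema_step`, are assumptions rather than details confirmed by the model card. Training evaluates the whole sequence with `jax.lax.associative_scan`, while inference updates a fixed-size state one token at a time.

```python
import jax
import jax.numpy as jnp


def ema_parallel(x, decay):
    """Training-style pass: all n positions at once via an associative scan.

    x: (n, d) inputs; decay: (d,) per-channel decay in (0, 1).
    """
    a = jnp.broadcast_to(decay, x.shape)  # multiplicative term of each step
    b = (1.0 - decay) * x                 # additive term of each step

    def combine(left, right):
        # Compose h -> a_l*h + b_l followed by h -> a_r*h + b_r.
        a_l, b_l = left
        a_r, b_r = right
        return a_l * a_r, a_r * b_l + b_r

    _, h = jax.lax.associative_scan(combine, (a, b), axis=0)
    return h


def ema_step(h_prev, x_t, decay):
    """Inference-style update: O(d) work on a fixed (d,) state, no cache."""
    return decay * h_prev + (1.0 - decay) * x_t


x = jax.random.normal(jax.random.PRNGKey(0), (16, 4))
decay = jnp.array([0.1, 0.5, 0.9, 0.99])
h_par = ema_parallel(x, decay)

# Recompute the same outputs token by token, carrying only the (d,) state.
h = jnp.zeros(4)
h_seq = []
for t in range(x.shape[0]):
    h = ema_step(h, x[t], decay)
    h_seq.append(h)
h_seq = jnp.stack(h_seq)
```

Both paths produce the same outputs up to floating-point error; the sequential path never stores more than one `(d,)` vector per layer, which is why no KV cache is needed.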
Config
| Setting | Value |
|---|---|
| Params | 105.9M |
| Layers | 8 |
| d_model | 512 |
| d_state | 64 |
| Vocab | 151665 |
| Tokenizer | Qwen2.5 |
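A quick back-of-the-envelope check on these numbers shows where the parameters sit. The sketch assumes the input embedding and the output projection share one weight matrix, which the card does not state; the variable names are illustrative.

```python
vocab_size = 151_665
d_model = 512
n_layers = 8
total_params = 105.9e6  # reported total

# One (vocab_size x d_model) matrix, assumed shared between input and output.
embedding_params = vocab_size * d_model
embedding_share = embedding_params / total_params

# Whatever remains is split across the 8 blocks (norms, EMA, DeltaRule, Focus, FFN).
remaining = total_params - embedding_params
per_layer = remaining / n_layers

print(f"embedding: {embedding_params:,} ({embedding_share:.1%})")
print(f"per layer: {per_layer / 1e6:.2f}M")
```

Under that assumption the embedding table holds about 77.7M parameters, roughly 73% of the total, leaving about 3.5M per layer. Two untied matrices of this size would already exceed the reported 105.9M, so weight sharing is at least consistent with the table.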
Training
Dataset: FineWeb-Edu 1B tokens | TPU v5e-8 | 50K steps | AdamW lr=3e-4