Updated model card — WYRM 500M Cognitive Kernel
README.md

---
license: apache-2.0
tags:
- cognitive-kernel
- transformer
- novel-architecture
- synthase
- depth-attention
- gaussian-splatting
- uncertainty-propagation
language: en
pipeline_tag: text-generation
---

# WYRM 500M — Cognitive Kernel (GLADIUS v2)

**565.8M-parameter unified cognitive kernel** with novel architectural innovations.

> "There is no such thing as multi-modal." — A unified substrate that processes structure, not modalities.

## Architecture

| Component | Params | Description |
|-----------|--------|-------------|
| Base Kernel | 443.7M | Transformer backbone: 1024 hidden dim, 24 layers, 32 attention heads, 4096 FFN dim |
| **Synthase** | 32.8M (6.89%) | ATP Synthase Depth Attention — learned depth profiles per layer |
| **PUP** | 6.4K (0.001%) | Propagated Uncertainty Principle — (μ, σ², confidence) per position |
| **SLA²** | — | Sparse-Local Attention on layer 0 |
| MultiEmbed | 33.6M | Multi-domain embedding layer |
| NexusRouter + Plug | 3.2M | Dynamic specialist routing |
| AuxHead | 32.8M | Auxiliary prediction head |
| **GaussianHead** | 19.7M | 3D Gaussian Splatting specialist (anchor/detail two-stage) |
| **Total** | **565.8M** | |
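
The component counts can be checked against the listed total with a few lines of Python. This is a minimal sketch: the figures are copied from the table, PUP's 6.4K is written as 0.0064M, and SLA² is left out because the table gives it no separate count.

```python
# Parameter counts in millions, copied from the Architecture table.
# SLA² is omitted because the table lists no separate count for it.
COMPONENT_PARAMS_M = {
    "Base Kernel": 443.7,
    "Synthase": 32.8,
    "PUP": 0.0064,  # 6.4K parameters
    "MultiEmbed": 33.6,
    "NexusRouter + Plug": 3.2,
    "AuxHead": 32.8,
    "GaussianHead": 19.7,
}


def total_params_m(components: dict[str, float]) -> float:
    """Sum the component counts and round to one decimal place, as in the table."""
    return round(sum(components.values()), 1)


if __name__ == "__main__":
    print(f"Total: {total_params_m(COMPONENT_PARAMS_M)}M")  # Total: 565.8M
```

The rounded sum, 565.8M, agrees with the table's total.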

## Novel Components

### ATP Synthase Depth Attention

Biologically inspired depth processing. Each layer has learned depth profiles that control how information flows through the network at different cognitive depths. Synthase uses 4 depth KV heads with grouped-query attention (GQA) at an 8:1 ratio of query heads to KV heads.
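
The 8:1 grouping can be illustrated with a minimal NumPy sketch of standard grouped-query attention, in which 32 query heads (the base kernel's head count) share 4 KV heads. Only the head-sharing pattern is shown here; the learned per-layer depth profiles that distinguish Synthase are not modeled, no causal mask is applied, and all shapes are illustrative assumptions.

```python
import numpy as np


def grouped_query_attention(q, k, v):
    """Grouped-query attention (GQA) without masking.

    q: (n_q_heads, n_queries, head_dim)
    k, v: (n_kv_heads, n_keys, head_dim)
    Query head h reads KV head h // (n_q_heads // n_kv_heads).
    """
    n_q_heads, _, head_dim = q.shape
    n_kv_heads = k.shape[0]
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must split evenly across KV heads")
    group = n_q_heads // n_kv_heads  # 32 // 4 = 8 in the 8:1 case
    # Repeat each KV head `group` times so it lines up with its query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_q_heads, n_kv_heads, seq, head_dim = 32, 4, 8, 32  # 1024 / 32 heads = 32
    q = rng.standard_normal((n_q_heads, seq, head_dim))
    k = rng.standard_normal((n_kv_heads, seq, head_dim))
    v = rng.standard_normal((n_kv_heads, seq, head_dim))
    print(grouped_query_attention(q, k, v).shape)  # (32, 8, 32)
```

Because only 4 KV heads are stored instead of 32, the key/value projections and cache shrink by a factor of 8, which is the usual motivation for GQA.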

### PUP (Propagated Uncertainty Principle)

Designed so the model knows what it doesn't know: PUP outputs a mean, a variance, and a confidence score (μ, σ², confidence) for each position, enabling uncertainty-aware reasoning. Its gate features are integrated with the Synthase depth signals.
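
One way to produce (μ, σ², confidence) per position is a small linear head on the hidden states. The sketch below is hypothetical: the card does not describe PUP's internals, so the single projection, the softplus used to keep σ² positive, and the mapping confidence = 1 / (1 + σ²) are illustrative assumptions rather than the WYRM implementation. The gating with Synthase depth signals is not modeled.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def uncertainty_head(hidden, w, b):
    """Hypothetical per-position (mu, sigma^2, confidence) head.

    hidden: (seq, d_model); w: (d_model, 2); b: (2,)
    Returns three arrays of shape (seq,).
    """
    out = hidden @ w + b
    mu = out[:, 0]
    sigma2 = softplus(out[:, 1]) + 1e-6  # keep variance strictly positive
    confidence = 1.0 / (1.0 + sigma2)  # assumed mapping: higher variance, lower confidence
    return mu, sigma2, confidence


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq, d_model = 6, 1024  # d_model matches the base kernel width
    hidden = rng.standard_normal((seq, d_model))
    w = rng.standard_normal((d_model, 2)) * 0.02
    mu, sigma2, confidence = uncertainty_head(hidden, w, np.zeros(2))
    print(mu.shape, sigma2.shape, confidence.shape)  # (6,) (6,) (6,)
```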

### Gaussian Specialist Head

A 3D Gaussian Splatting specialist head on top of the language backbone. Generation runs in two stages:

1. **Anchors** — coarse Gaussians predicted from the pooled hidden state
2. **Details** — fine Gaussians generated via a VQ-VAE codebook and cross-attention

The head is intended to support the thesis that spatial understanding emerges from the same substrate as language.
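
The two-stage generation can be sketched in NumPy as follows. This is a simplified illustration under assumed shapes, not the WYRM head: each Gaussian is taken to be a 14-value vector (3 position, 3 scale, 4 rotation quaternion, 1 opacity, 3 color), the weight matrices and codebook are random stand-ins for learned parameters, and the detail stage uses single-head cross-attention from detail queries to the backbone's hidden states, followed by a nearest-neighbor codebook lookup (the quantization step of a VQ-VAE). All function names and sizes are hypothetical.

```python
import numpy as np

GAUSS_DIM = 14  # assumed layout: xyz(3) + scale(3) + quaternion(4) + opacity(1) + rgb(3)


def anchor_stage(hidden, w_anchor, n_anchors):
    """Stage 1: coarse anchor Gaussians from the mean-pooled hidden state."""
    pooled = hidden.mean(axis=0)  # (d_model,)
    return (pooled @ w_anchor).reshape(n_anchors, GAUSS_DIM)


def cross_attention(queries, hidden):
    """Single-head scaled dot-product attention: queries attend over hidden states."""
    scores = queries @ hidden.T / np.sqrt(hidden.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ hidden


def vq_lookup(vectors, codebook):
    """Replace each vector with its nearest codebook entry (squared L2 distance)."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx


def detail_stage(anchors, hidden, w_query, codebook, w_decode, per_anchor):
    """Stage 2: fine Gaussians as quantized offsets around each anchor."""
    n_anchors, d_model = anchors.shape[0], hidden.shape[1]
    # Each anchor spawns `per_anchor` detail queries in the hidden space.
    queries = (anchors @ w_query).reshape(n_anchors * per_anchor, d_model)
    context = cross_attention(queries, hidden)
    codes, idx = vq_lookup(context, codebook)
    offsets = codes @ w_decode  # (n_anchors * per_anchor, GAUSS_DIM)
    parents = np.repeat(anchors, per_anchor, axis=0)
    return parents + offsets, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq, d_model, n_anchors, per_anchor, codebook_size = 10, 64, 4, 3, 16
    hidden = rng.standard_normal((seq, d_model))
    anchors = anchor_stage(hidden, rng.standard_normal((d_model, n_anchors * GAUSS_DIM)), n_anchors)
    details, idx = detail_stage(
        anchors,
        hidden,
        rng.standard_normal((GAUSS_DIM, per_anchor * d_model)) * 0.1,
        rng.standard_normal((codebook_size, d_model)),
        rng.standard_normal((d_model, GAUSS_DIM)) * 0.01,
        per_anchor,
    )
    print(anchors.shape, details.shape)  # (4, 14) (12, 14)
```

Predicting details as offsets from their parent anchor keeps the fine stage local, so the coarse layout is fixed before detail is added.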

### SLA² (Sparse-Local Attention)

Layer 0 uses sparse-local attention patterns for efficient long-range processing.
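
The card does not spell out the exact sparsity pattern. As a hypothetical illustration, a sparse-local pattern can combine a causal sliding window with a causal strided pattern, similar in spirit to the strided attention of the Sparse Transformer. The sketch below builds such a boolean mask in NumPy; `window` and `stride` are assumed parameters, not values from WYRM.

```python
import numpy as np


def sparse_local_mask(seq_len, window, stride):
    """Boolean mask where mask[i, j] is True if query i may attend to key j.

    Combines (a) a causal local window covering the last `window` positions
    with (b) a causal strided pattern that reaches back every `stride` positions.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i
    local = (i - j) < window
    strided = ((i - j) % stride) == 0
    return causal & (local | strided)


if __name__ == "__main__":
    mask = sparse_local_mask(8, window=2, stride=4)
    print(np.flatnonzero(mask[7]))  # [3 6 7]
```

Each row allows roughly `window + seq_len / stride` keys instead of up to `seq_len`, which is where the savings on long sequences come from.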

## Training

Currently training on Kaggle (2× NVIDIA T4 GPUs) with curriculum learning:

- **Language** (BPE corpus) + **Math** (multi-depth) + **Cognition** (multi-depth)
- Progressive depth activation
- 15,000 steps planned
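
Progressive depth activation is not specified further in the card. As a purely hypothetical sketch, a staged schedule over the planned 15,000 steps might unlock one more depth level at evenly spaced milestones; the number of depth levels (4 here) and the even spacing are assumptions.

```python
def active_depths(step: int, total_steps: int = 15_000, num_depths: int = 4) -> int:
    """Hypothetical schedule: number of depth levels active at `step`.

    Training is split into `num_depths` equal stages; stage k (0-based)
    activates depths 1..k+1, so all depths are active in the final stage.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    stage_len = total_steps / num_depths
    stage = int(step // stage_len)
    return min(stage + 1, num_depths)


if __name__ == "__main__":
    print([active_depths(s) for s in (0, 3_750, 7_500, 11_250, 14_999)])  # [1, 2, 3, 4, 4]
```

A step schedule like this is the simplest choice; a smooth ramp that gradually blends in each new depth is a common alternative.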

## Research Papers

- "GPU as Code" — Hardware-aware transformer training
- "1-Bit Intelligence" — Binary weight learning
- "Progressive Expansion" — Warm-starting larger models from smaller ones
- "Ghost Protocol" — Autoregressive self-poisoning thesis
- "Gaussian Specialist Head" — 3D spatial understanding in unified kernels (forthcoming)

## Citation

```bibtex
@misc{shakil2026wyrm,
  title={WYRM: A Unified Cognitive Kernel with Depth-Aware Attention and Specialist Heads},
  author={Shakil, Ali and Shakil, Ava},
  year={2026},
  publisher={Artifact Virtual}
}
```

## License

Apache 2.0

## Links

- [GLADIUS Visualization](https://gladius-viz.pages.dev/)
- [Artifact Virtual](https://artifactvirtual.com)