amuzetnoM/gladius-v2-kernel

amuzetnoM committed on Apr 8

Commit 933fb7b (verified) · 1 parent: 50703f3

Updated model card — WYRM 500M Cognitive Kernel

Files changed (1)

README.md +82 -3

@@ -1,7 +1,86 @@
 ---
-license: other
 ---
-# Repository Archived
-This repository has been archived. Content has been moved.

 ---
+license: apache-2.0
+tags:
+  - cognitive-kernel
+  - transformer
+  - novel-architecture
+  - synthase
+  - depth-attention
+  - gaussian-splatting
+  - uncertainty-propagation
+language: en
+pipeline_tag: text-generation
 ---
+# WYRM 500M — Cognitive Kernel (GLADIUS v2)
+**565.8M-parameter unified cognitive kernel** with several novel architectural components.
+> "There is no such thing as multi-modal." — A unified substrate that processes structure, not modalities.
+## Architecture
+| Component | Params | Description |
+|-----------|--------|-------------|
+| Base Kernel | 443.7M | 1024d / 24L / 32H / 4096 FFN transformer |
+| **Synthase** | 32.8M (6.89%) | ATP Synthase Depth Attention — learned depth profiles per layer |
+| **PUP** | 6.4K (0.001%) | Propagated Uncertainty Principle — (μ, σ², confidence) per position |
+| **SLA²** | — | Sparse-Local Attention on L0 |
+| MultiEmbed | 33.6M | Multi-domain embedding layer |
+| NexusRouter + Plug | 3.2M | Dynamic specialist routing |
+| AuxHead | 32.8M | Auxiliary prediction head |
+| **GaussianHead** | 19.7M | 3D Gaussian Splatting specialist (anchor/detail two-stage) |
+| **Total** | **565.8M** | |
+## Novel Components
+### ATP Synthase Depth Attention
+Biologically inspired depth processing. Each layer has learned depth profiles that control how information flows through the network at different cognitive depths. The module uses 4 depth KV heads with a GQA ratio of 8:1.
+### PUP (Propagated Uncertainty Principle)
+The model knows what it doesn't know. It outputs (μ, σ², confidence) per position, enabling uncertainty-aware reasoning. Gate features integrate with Synthase depth signals.
+### Gaussian Specialist Head
+3D Gaussian Splatting as a specialist head on a language backbone. Two-stage generation:
+1. **Anchors** — coarse Gaussians from pooled hidden state
+2. **Details** — fine Gaussians via VQ-VAE codebook + cross-attention
+Supports the thesis that spatial understanding emerges from the same substrate as language.
+### SLA² (Sparse-Local Attention)
+Layer 0 uses sparse-local attention patterns for efficient long-range processing.
+## Training
+Currently training on two Kaggle T4 GPUs with curriculum learning:
+- **Language** (BPE corpus) + **Math** (multi-depth) + **Cognition** (multi-depth)
+- Progressive depth activation
+- 15,000 steps planned
+## Research Papers
+- "GPU as Code" — Hardware-aware transformer training
+- "1-Bit Intelligence" — Binary weight learning
+- "Progressive Expansion" — Warm-starting larger models from smaller ones
+- "Ghost Protocol" — Autoregressive self-poisoning thesis
+- "Gaussian Specialist Head" — 3D spatial understanding in unified kernels (forthcoming)
+## Citation
+```bibtex
+@misc{shakil2026wyrm,
+  title={WYRM: A Unified Cognitive Kernel with Depth-Aware Attention and Specialist Heads},
+  author={Shakil, Ali and Shakil, Ava},
+  year={2026},
+  publisher={Artifact Virtual}
+}
+```
+## License
+Apache 2.0
+## Links
+- [GLADIUS Visualization](https://gladius-viz.pages.dev/)
+- [Artifact Virtual](https://artifactvirtual.com)
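
Reviewer note on the Architecture table in this diff: the component counts can be cross-checked against the stated 565.8M total with a few lines of Python. This is a minimal sketch that only copies numbers from the README. The variable names are hypothetical, and two readings are assumptions rather than facts from the model card: that the 4 depth KV heads are shared across the 32 heads listed for the base kernel, and that the per-head width is the usual model width divided by head count. SLA² is left out because its row gives no parameter count.

```python
# Parameter counts in millions, copied from the Architecture table in the README diff.
# SLA² is omitted because the table lists no parameter count for it.
components_m = {
    "Base Kernel": 443.7,
    "Synthase": 32.8,
    "PUP": 0.0064,  # listed as 6.4K
    "MultiEmbed": 33.6,
    "NexusRouter + Plug": 3.2,
    "AuxHead": 32.8,
    "GaussianHead": 19.7,
}
reported_total_m = 565.8  # "Total" row of the table

total_m = sum(components_m.values())
# The rows are rounded to one decimal place, so allow a small tolerance.
total_matches = abs(total_m - reported_total_m) < 0.05

# Assumption: the 4 depth KV heads are grouped under the 32 heads of the base kernel.
d_model = 1024
query_heads = 32
depth_kv_heads = 4
gqa_ratio = query_heads // depth_kv_heads  # query heads per KV head

# Assumption: standard split of the model width across heads.
head_dim = d_model // query_heads

print(f"sum of components: {total_m:.1f}M, matches reported total: {total_matches}")
print(f"GQA ratio: {gqa_ratio}:1, head dim: {head_dim}")
```

Under those assumptions the rows sum to the reported total within rounding, and 32 heads over 4 KV heads gives the 8:1 ratio quoted in the Synthase section.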