Commit 933fb7b by amuzetnoM (verified) · 1 Parent(s): 50703f3

Updated model card — WYRM 500M Cognitive Kernel

Files changed (1)
  1. README.md +82 -3
README.md CHANGED
@@ -1,7 +1,86 @@
  ---
- license: other
  ---

- # Repository Archived

- This repository has been archived. Content has been moved.
  ---
+ license: apache-2.0
+ tags:
+ - cognitive-kernel
+ - transformer
+ - novel-architecture
+ - synthase
+ - depth-attention
+ - gaussian-splatting
+ - uncertainty-propagation
+ language: en
+ pipeline_tag: text-generation
  ---
 
+ # WYRM 500M: Cognitive Kernel (GLADIUS v2)

+ **565.8M-parameter unified cognitive kernel** with several novel architectural components.
+
+ > "There is no such thing as multi-modal." A unified substrate that processes structure, not modalities.
+
+ ## Architecture
+
+ | Component | Params | Description |
+ |-----------|--------|-------------|
+ | Base Kernel | 443.7M | Transformer: d_model 1024, 24 layers, 32 heads, FFN width 4096 |
+ | **Synthase** | 32.8M (6.89%) | ATP Synthase Depth Attention: learned depth profiles per layer |
+ | **PUP** | 6.4K (0.001%) | Propagated Uncertainty Principle: (μ, σ², confidence) per position |
+ | **SLA²** | not listed | Sparse-Local Attention on layer 0 (L0) |
+ | MultiEmbed | 33.6M | Multi-domain embedding layer |
+ | NexusRouter + Plug | 3.2M | Dynamic specialist routing |
+ | AuxHead | 32.8M | Auxiliary prediction head |
+ | **GaussianHead** | 19.7M | 3D Gaussian Splatting specialist (anchor/detail two-stage) |
+ | **Total** | **565.8M** | |
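
As a quick sanity check, the component counts in the Architecture table can be summed to confirm the stated total. The sketch below copies the table's numbers by hand; the dictionary and function names are illustrative and not part of the repository. SLA² is left out because the card gives no parameter count for it.

```python
# Parameter counts copied from the Architecture table, in millions.
# SLA² is omitted: the model card lists no count for it.
PARAMS_M = {
    "Base Kernel": 443.7,
    "Synthase": 32.8,
    "PUP": 0.0064,  # 6.4K parameters
    "MultiEmbed": 33.6,
    "NexusRouter + Plug": 3.2,
    "AuxHead": 32.8,
    "GaussianHead": 19.7,
}


def total_params_m(params):
    """Sum of all component sizes, in millions of parameters."""
    return sum(params.values())


def share_of_total(params, name):
    """Size of one component as a percentage of the summed total."""
    return 100.0 * params[name] / total_params_m(params)


if __name__ == "__main__":
    print(f"total: {total_params_m(PARAMS_M):.1f}M")
    for name in PARAMS_M:
        print(f"{name}: {share_of_total(PARAMS_M, name):.3f}% of total")
```

The sum rounds to the 565.8M shown in the table. Measured against that total, Synthase comes to roughly 5.8%, so the 6.89% quoted in the table presumably uses a different denominator, which the card does not state.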
+
+ ## Novel Components
+
+ ### ATP Synthase Depth Attention
+ Biologically inspired depth processing. Each layer has a learned depth profile that controls how information flows through the network at different cognitive depths. The depth attention uses 4 depth KV heads with grouped-query attention (GQA) at an 8:1 ratio.
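
To make the 8:1 ratio concrete, here is a minimal sketch of how grouped-query attention usually maps query heads onto shared KV heads. It assumes the ratio means that the 32 heads from the Architecture table share the 4 depth KV heads (32 / 4 = 8); the card does not spell this out, and the function names are hypothetical. It shows only the head grouping, not the learned depth profiles.

```python
N_QUERY_HEADS = 32  # "32H" in the Architecture table (assumed to apply here)
N_KV_HEADS = 4      # "4 depth KV heads"


def kv_head_for(query_head, n_query_heads=N_QUERY_HEADS, n_kv_heads=N_KV_HEADS):
    """Index of the KV head that a given query head reads from under GQA."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    if not 0 <= query_head < n_query_heads:
        raise IndexError("query head index out of range")
    group_size = n_query_heads // n_kv_heads  # 8 query heads per KV head
    return query_head // group_size


def kv_cache_ratio(n_query_heads=N_QUERY_HEADS, n_kv_heads=N_KV_HEADS):
    """KV-cache size relative to full multi-head attention with the same head dim."""
    return n_kv_heads / n_query_heads


if __name__ == "__main__":
    print([kv_head_for(h) for h in range(N_QUERY_HEADS)])
    print(kv_cache_ratio())
```

Under these assumptions, query heads 0 to 7 read KV head 0, heads 8 to 15 read KV head 1, and so on, and the keys and values stored per token shrink to one eighth of the multi-head case.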
+
+ ### PUP (Propagated Uncertainty Principle)
+ PUP gives the model an explicit estimate of what it does not know. It outputs (μ, σ², confidence) per position, enabling uncertainty-aware reasoning. Its gate features integrate with the Synthase depth signals.
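
The card does not describe how PUP computes its outputs, so the following is only a generic sketch of one common way to produce a (μ, σ², confidence) triple per position: three linear read-outs of the hidden state, with softplus keeping the variance positive and a sigmoid keeping the confidence in (0, 1). Every name and weight here is hypothetical, and the Synthase gating is not modeled.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); the result is never negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(weights, vector):
    return sum(w * v for w, v in zip(weights, vector))


def uncertainty_head(hidden, w_mu, w_var, w_conf, eps=1e-6):
    """Map each position's hidden vector to (mu, variance, confidence).

    hidden: list of per-position vectors; w_*: hypothetical weight vectors.
    """
    outputs = []
    for h in hidden:
        mu = dot(w_mu, h)                      # unconstrained mean
        var = softplus(dot(w_var, h)) + eps    # strictly positive variance
        conf = sigmoid(dot(w_conf, h))         # confidence in (0, 1)
        outputs.append((mu, var, conf))
    return outputs
```

For example, `uncertainty_head([[1.0, 0.0]], [0.5, -1.0], [0.0, 0.0], [0.0, 0.0])` yields one triple with mean 0.5 and confidence 0.5.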
+
+ ### Gaussian Specialist Head
+ 3D Gaussian Splatting as a specialist head on a language backbone. Generation has two stages:
+ 1. **Anchors**: coarse Gaussians from the pooled hidden state
+ 2. **Details**: fine Gaussians via a VQ-VAE codebook and cross-attention
+
+ This design supports the thesis that spatial understanding emerges from the same substrate as language.
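
Two building blocks named in the stages above can be sketched in isolation: pooling the hidden states into one vector, and the nearest-entry lookup that a VQ-VAE codebook performs. Mean pooling is an assumption (the card says only "pooled"), the function names are hypothetical, and the cross-attention and the decoding into actual Gaussian parameters are not shown.

```python
def mean_pool(hidden):
    """Average per-position hidden vectors into a single vector.

    Assumption: the card says "pooled" without naming the pooling operator.
    """
    n = len(hidden)
    dim = len(hidden[0])
    return [sum(h[j] for h in hidden) / n for j in range(dim)]


def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector`.

    This is the standard vector-quantization step: pick the entry with the
    smallest squared Euclidean distance.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sqdist(vector, codebook[i]))


if __name__ == "__main__":
    pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
    print(pooled)
    print(nearest_code([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```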
+
+ ### SLA² (Sparse-Local Attention)
+ Layer 0 uses sparse-local attention patterns for efficient long-range processing.
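
The card does not specify the SLA² pattern. As an illustration of the general idea, the sketch below builds a causal mask that combines a local sliding window with a strided set of globally visible keys, which is one common way to mix local and sparse attention. The function name and the `window` and `stride` parameters are assumptions, not values from the model.

```python
def sparse_local_mask(seq_len, window, stride):
    """Boolean mask where mask[i][j] is True if query i may attend to key j.

    A key is visible when it is not in the future (causal) and is either
    within the last `window` positions (local) or at a multiple of
    `stride` (sparse).
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i
            local = i - j < window
            strided = j % stride == 0
            row.append(causal and (local or strided))
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in sparse_local_mask(8, window=2, stride=4):
        print("".join("x" if allowed else "." for allowed in row))
```

Each query then attends to roughly `window + i / stride` keys instead of all `i + 1` earlier positions, which is where the efficiency for long sequences would come from.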
+
+ ## Training
+
+ Training is in progress on two Kaggle T4 GPUs with curriculum learning:
+ - **Language** (BPE corpus) + **Math** (multi-depth) + **Cognition** (multi-depth)
+ - Progressive depth activation
+ - 15,000 steps planned
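
Progressive depth activation can be pictured as a schedule that unlocks deeper levels as training advances. The sketch below splits the planned 15,000 steps into equal stages; the equal spacing, the function name, and the `max_depth` parameter are all assumptions, since the card does not state how many depth levels exist or when each one is enabled.

```python
TOTAL_STEPS = 15_000  # "15,000 steps planned"


def active_depth(step, max_depth, total_steps=TOTAL_STEPS):
    """Number of depth levels enabled at a given training step.

    Hypothetical schedule: training is divided into `max_depth` equal
    stages, and one more depth level is unlocked at the start of each.
    """
    if max_depth < 1 or total_steps < 1:
        raise ValueError("max_depth and total_steps must be positive")
    step = min(max(step, 0), total_steps - 1)  # clamp to the planned range
    return step * max_depth // total_steps + 1


if __name__ == "__main__":
    for s in (0, 4_999, 5_000, 10_000, 14_999):
        print(s, active_depth(s, max_depth=3))
```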
+
+ ## Research Papers
+
+ - "GPU as Code": hardware-aware transformer training
+ - "1-Bit Intelligence": binary weight learning
+ - "Progressive Expansion": warm-starting larger models from smaller ones
+ - "Ghost Protocol": autoregressive self-poisoning thesis
+ - "Gaussian Specialist Head": 3D spatial understanding in unified kernels (forthcoming)
+
+ ## Citation
+
+ ```bibtex
+ @misc{shakil2026wyrm,
+   title={WYRM: A Unified Cognitive Kernel with Depth-Aware Attention and Specialist Heads},
+   author={Shakil, Ali and Shakil, Ava},
+   year={2026},
+   publisher={Artifact Virtual}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0
+
+ ## Links
+
+ - [GLADIUS Visualization](https://gladius-viz.pages.dev/)
+ - [Artifact Virtual](https://artifactvirtual.com)