
Kernel Development Notes — WYRM / GLADIUS

"Build the spine before you grow the heads."


Current State (Day 58 — April 8, 2026)

Training: v27 FINAL, 565M dense, ~2.67% complete (step ~400/15000)
Architecture: 1024d / 24L / 32H / 4096 FFN — Synthase depth attention
Platform: Kaggle T4 (16GB) — using ~5.56 GB VRAM
Kernel modules: 17 files in this directory (see below)
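As a rough cross-check on the parameter count, the transformer backbone alone can be estimated from these hyperparameters (a sketch: the 64K vocabulary is a placeholder assumption, and the remaining parameters sit in the kernel modules such as memory, cognition, and the Gaussian head):

```python
def transformer_params(d_model, n_layers, d_ffn, vocab_size):
    """Rough dense-transformer backbone count: attention projections,
    FFN, and a tied embedding. Ignores norms, biases, and extra modules."""
    attn = 4 * d_model * d_model       # Q, K, V, O projections per layer
    ffn = 2 * d_model * d_ffn          # up- and down-projection per layer
    embed = vocab_size * d_model       # tied input/output embedding
    return n_layers * (attn + ffn) + embed

# 64K vocab is a placeholder; the real multi-tokenizer setup will differ
print(transformer_params(1024, 24, 4096, 65536))  # 369098752, i.e. ~369M
```

That puts the plain backbone around 369M, consistent with the kernel modules carrying the balance of the 565M total.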

Where We Are

The kernel is forming. Synthase depth profiles, PUP uncertainty, SLA², plug membranes, Gaussian head — all active, all training. The curriculum is running all four phases (foundation → reasoning → depth → omega). Loss is grinding down. The re-entry pattern (loss spikes, then recovers lower) is the real signal.

The kernel IS the research contribution. Everything else builds on it.

The MoE Decision (Day 58)

Analysis Done

The router (router.py) exists with 5 specialist slots (reasoning, math, code, general, gaussian) but is only called for balance_loss — a regularization term on a decision that's never made. No token actually routes through specialist FFNs.
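The "regularizer on a decision that's never made" pattern can be illustrated with a minimal sketch (hypothetical names and shapes; this is not the repo's NexusRouter, just a Switch-style auxiliary loss computed without any dispatch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BalanceOnlyRouter(nn.Module):
    """Sketch of the current state: the gate scores 5 specialists,
    but the scores feed only a load-balance regularizer. No token
    is ever sent through a specialist FFN."""
    def __init__(self, d_model=1024, n_experts=5):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)

    def balance_loss(self, x):
        # x: (batch, seq, d_model)
        probs = F.softmax(self.gate(x), dim=-1)         # (B, S, E)
        mean_prob = probs.mean(dim=(0, 1))              # soft mass per expert
        top1 = probs.argmax(dim=-1)                     # hard pick, never executed
        frac = F.one_hot(top1, probs.size(-1)).float().mean(dim=(0, 1))
        # Switch-style auxiliary loss: E * sum_e(frac_e * mean_prob_e)
        return probs.size(-1) * (frac * mean_prob).sum()
```

Training this loss shapes the gate toward uniform assignment, but until experts are wired in, it regularizes a pathway that does not exist.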

The model already has natural specialist pathways:

  • Language — BPE tokenizer in, BPE logits out
  • Mathematics — math tokenizer in, math logits out
  • Spatial — Gaussian head (3D splats out)
  • Reasoning — cognition module, depth-dependent, PUP uncertainty
  • Raw — byte-level fallback

These are the Hydra's heads. They just aren't wired as MoE yet.

VRAM Calculation

| Config | Params | VRAM (T4, grad ckpt) | Fits? |
|---|---|---|---|
| Dense (current) | 565M | 5.56 GB | ✅ |
| MoE 3 experts × 4096 | 770M | ~8.0 GB | ✅ |
| MoE 4 experts × 4096 | 971M | ~9.9 GB | ✅ |
| MoE 5 experts × 4096 | 1,173M | ~11.8 GB | ✅ |

All configurations fit on a T4. The 5-expert Hydra at 1.17B runs at roughly the same speed as dense: top-2 routing keeps the per-token compute constant while adding capacity.
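The per-expert deltas in the table can be sanity-checked from the architecture numbers (a sketch; byte-per-parameter costs depend on precision and optimizer state, so VRAM itself is not derived here):

```python
def expert_ffn_params(n_layers=24, d_model=1024, d_ffn=4096):
    """Parameters added by one extra expert FFN stack:
    up-projection (d_model x d_ffn) plus down-projection (d_ffn x d_model)
    per layer, biases and norms ignored."""
    return n_layers * 2 * d_model * d_ffn

extra = expert_ffn_params()
print(f"{extra / 1e6:.0f}M params per additional expert")
```

Each extra expert adds one full FFN stack of about 201M parameters, matching the ~200M steps between table rows (971M − 770M, 1,173M − 971M). The ~1.9 GB VRAM step per expert then implies roughly 9–10 bytes held per parameter under this training setup.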

Decision: KERNEL FIRST, MoE LATER

Rationale:

  1. The kernel innovations (Synthase, PUP, SLAΒ², memory, curriculum) need to prove themselves on the dense model first
  2. If something goes wrong with MoE, we can't tell if it's kernel or routing — two unknowns in one equation
  3. Progressive expansion (the paper) says: train small, prove it works, expand with knowledge intact
  4. The eval baseline must be clean β€” Synthase 565M vs Vanilla 565M, same data, same compute
  5. MoE warm-start from proven dense weights is strictly better than cold-wiring at step 400

Sequence:

  1. ✅ Let current 565M dense train through all 4 curriculum phases
  2. 🔲 Eval at milestones (5K, 10K, 15K) — prove Synthase beats vanilla
  3. 🔲 Wire MoE: copy dense FFN → 5 expert FFNs + small noise, router from plug membrane signals
  4. 🔲 Continue training as 1.17B MoE — backbone representations transfer
  5. 🔲 Paper: "Progressive Expansion from Dense to MoE"

Optimal Hardware (when ready for MoE)

  • Free: Kaggle T4 (fits) or L4 (12GB headroom)
  • Best value: used RTX 3090 24GB (~$800) — no session limits
  • Cloud: Lambda A100 40GB ($1.10/hr) — when speed matters

Kernel Module Inventory

| File | Role | Status |
|---|---|---|
| kernel.py | Main SynthaseKernel — forward pass, loss computation | Active, training |
| config.py | KernelConfig — architecture hyperparameters | Locked for v27 |
| attention.py | Synthase depth attention — the core innovation | Active |
| embeddings.py | Multi-tokenizer embeddings (BPE + math + byte) | Active |
| memory.py | Persistent memory module | Active |
| moda.py | Modality-aware processing | Active |
| modulator.py | Dynamic modulation | Active |
| router.py | NexusRouter — 5 specialists (UNUSED except balance_loss) | Skeleton → future MoE |
| senses.py | Plug membranes — domain gating at input | Active |
| cognition.py | Cognition module — depth-dependent reasoning | Active |
| cognition_loss.py | Cognition loss functions | Active |
| temporal.py | Temporal processing | Active |
| temporal_lattice.py | Temporal lattice structure | Active |
| tools.py | Grid tools registration | Active |
| warm_memory.py | Warm memory initialization | Active |

Staging Modules (gladius_v2/staging/kernel/)

  • l0_sla2/ — Sparse Lottery Attention²
  • pup/ — Probabilistic Uncertainty Propagation
  • synthase/ — Synthase depth attention (reference implementation)

Direction

The dragon is grinding. The kernel is forming. Don't interrupt it.

When the dense model proves itself, the MoE expansion is calculated, budgeted, and ready. The math is done. The path is clear. The timing isn't now.

"The soul does not crack."