---
license: mit
tags:
- gladius
- wyrm
- cognitive-kernel
- transformer
- research
- synthase
pipeline_tag: text-generation
---
# GLADIUS Cognitive Kernel

**Generalized Learning Architecture for Distributed Unified Systems**

A novel transformer architecture featuring Synthase depth attention, specialist routing, and uncertainty propagation.

Organization: Artifact Virtual

## WYRM 476M (current model)
| Property | Value |
|---|---|
| Parameters | 476M (1024d / 24L / 32H / 4096 FFN) |
| Architecture | Synthase (MoDA depth attention + SLA² specialists) |
| Training | Kaggle T4 (16GB VRAM) |
| Corpus | 9GB scientific text (QM, GR, calculus, topology, genomics, arXiv) |
| Status | Training (v29, step 2,241 / 15,000) |
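As a rough sanity check on the 476M figure, the per-layer parameter count for the stated dimensions can be estimated with the standard transformer projections (Q/K/V/O attention matrices plus an up/down FFN pair, biases ignored). This is a back-of-envelope sketch, not the repo's actual accounting; the split between embeddings and Synthase-specific modules is left open.

```python
# Back-of-envelope parameter estimate for WYRM (1024d / 24L / 4096 FFN),
# assuming a standard transformer layout without biases. Embeddings and
# Synthase-specific modules (MoDA, SLA^2 routing, gaussian head) make up
# the remainder of the 476M budget; their exact split is not shown here.
d_model, n_layers, d_ffn = 1024, 24, 4096

attn_params = 4 * d_model * d_model   # Q, K, V, O projections
ffn_params = 2 * d_model * d_ffn      # up-projection + down-projection
per_layer = attn_params + ffn_params  # ~12.6M parameters per layer
stack = n_layers * per_layer          # ~302M for the 24-layer stack

print(f"per layer: {per_layer / 1e6:.1f}M, stack: {stack / 1e6:.1f}M")
```

The 24-layer stack alone accounts for roughly 302M parameters, leaving on the order of 174M for embeddings and the architecture-specific heads.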
## Key Innovations
- Synthase Depth Attention: Biologically-inspired depth profiles replacing standard attention
- SLA² Specialists: Domain-specific routing (language, math, cognition, gaussian)
- PUP: Propagated Uncertainty Principle; the model knows what it doesn't know
- Memory System: Warm + cold memory for persistent context
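To make the specialist-routing idea concrete, the sketch below shows the generic pattern: a learned linear router scores each token's hidden state against the four specialist domains and dispatches to the highest-scoring one. This is an illustrative NumPy sketch only; the actual `router.py` may use top-k dispatch, load balancing, or other mechanisms, and the weight initialization here is arbitrary.

```python
import numpy as np

# Illustrative sketch of top-1 specialist routing over the four SLA^2
# domains named in this README. Not the repo's implementation.
SPECIALISTS = ["language", "math", "cognition", "gaussian"]

rng = np.random.default_rng(0)
d_model = 1024
W_router = rng.standard_normal((d_model, len(SPECIALISTS))) * 0.02

def route(hidden):
    """hidden: (seq_len, d_model) -> (top-1 specialist index, probs)."""
    logits = hidden @ W_router                      # (seq_len, 4) scores
    logits -= logits.max(axis=-1, keepdims=True)    # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1), probs

hidden = rng.standard_normal((8, d_model))
choice, probs = route(hidden)
print([SPECIALISTS[i] for i in choice])  # one specialist name per token
```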
## Repository Structure
```
├── kernel/                      # Core architecture (Synthase kernel)
│   ├── kernel.py                # Main transformer
│   ├── attention.py             # Attention mechanisms
│   ├── moda.py                  # Mixture of Depth Attention
│   ├── cognition.py             # Cognition module
│   ├── router.py                # Specialist routing
│   ├── memory.py                # Memory system
│   ├── gaussian_head/           # Gaussian prediction head
│   └── ...
│
├── training/                    # Training scripts
│   ├── wyrm_notebook_v29.py     # Current Kaggle training notebook
│   ├── wyrm_notebook_v27_FINAL.py
│   ├── omega_config.py          # Omega specialist configuration
│   └── train_v5.py              # Earlier training script
│
├── tokenizers/                  # Custom tokenizer suite
│   ├── bytecode_tokenizer.py
│   ├── byte_tokenizer.py
│   ├── math_tokenizer.py
│   └── grid_tokens.py
│
├── staging/                     # Features in development
│   ├── synthase/                # Synthase attention surgery
│   ├── pup/                     # PUP uncertainty head
│   └── l0_sla2/                 # L0-regularized specialists
│
├── extensions/                  # Architecture extensions
│   ├── plug/                    # Multi-agent tool integration
│   └── memory_v2/               # Advanced memory pipeline
│
├── eval/                        # Evaluation scripts
├── experiments/                 # Experiment logs (001-006)
├── telemetry/                   # Training telemetry (v28, v29)
├── papers/                      # Research papers
│
├── MODEL_CARD.md                # Full model card
├── GLADIUS_TRAJECTORY.md        # Architecture evolution history
├── SCALING_ANALYSIS.md          # Scaling analysis & expansion plan
└── ROADMAP.md                   # Development roadmap
```
## Research Papers
| Paper | Topic |
|---|---|
| GPU as Code | Hardware is algorithmic: treating the GPU as a programmable substrate |
| 1-Bit Intelligence | Binary weight learning; gradients are optional |
| Progressive Expansion | Function-preserving growth from 6.9M to 141M+ parameters |
| Cell Division | Architecture scaling via biological cell division |
| PUP | Propagated Uncertainty Principle: reasoning under known unknowns |
| Gaussian Head | Gaussian prediction head for continuous-domain tasks |
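The Gaussian head idea in the table can be illustrated with the generic formulation: instead of a point estimate, the head emits a mean and a log-variance per target and is trained with the Gaussian negative log-likelihood, which rewards confident correct predictions over uncertain ones. This is a textbook sketch, not the code in `kernel/gaussian_head/`, which may differ in detail.

```python
import numpy as np

def gaussian_nll(mean, log_var, target):
    """NLL of target under N(mean, exp(log_var)), dropping the constant term.

    Generic formulation for a mean/log-variance prediction head; not the
    repo's gaussian_head/ implementation.
    """
    var = np.exp(log_var)
    return 0.5 * (log_var + (target - mean) ** 2 / var)

mean = np.array([0.0, 1.0])
log_var = np.array([0.0, np.log(4.0)])  # predicted variances 1 and 4
target = np.array([0.0, 1.0])

nll = gaussian_nll(mean, log_var, target)
# Both means are exactly right, so the NLL reduces to 0.5 * log_var:
# the low-variance (confident) prediction scores strictly better.
print(nll)
```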
## Training Telemetry
Raw step-by-step telemetry from training runs:
- `telemetry/wyrm-v28-telemetry-1188.jsonl` - v28 (procedural data, 1188 steps)
- `telemetry/wyrm-v29-telemetry-1060.jsonl` - v29 (real corpus, 1060 steps)
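Since the telemetry is JSONL, it can be consumed one JSON object per line. The sketch below runs on an inline sample rather than the real files, and the field names (`step`, `loss`) are assumptions about the schema, not documented fields.

```python
import io
import json

# Parse JSONL telemetry line by line. The "step" and "loss" field names
# are assumed for illustration; check the actual telemetry/ files for
# the real schema. io.StringIO stands in for an open file handle.
sample = io.StringIO(
    '{"step": 1, "loss": 9.31}\n'
    '{"step": 2, "loss": 8.87}\n'
)

records = [json.loads(line) for line in sample if line.strip()]
losses = [r["loss"] for r in records]
print(f"{len(records)} steps, mean loss {sum(losses) / len(losses):.3f}")
```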
Artifact Virtual · 2026