---
license: mit
tags:
  - gladius
  - wyrm
  - cognitive-kernel
  - transformer
  - research
  - synthase
pipeline_tag: text-generation
---

# GLADIUS: Cognitive Kernel

**Generalized Learning Architecture for Distributed Unified Systems**

A novel transformer architecture featuring Synthase depth attention, specialist routing, and uncertainty propagation.

Organization: Artifact Virtual


## WYRM 476M: Current Model

| Property     | Value |
|--------------|-------|
| Parameters   | 476M (1024d / 24L / 32H / 4096 FFN) |
| Architecture | Synthase (MoDA depth attention + SLA² specialists) |
| Training     | Kaggle T4 (16GB VRAM) |
| Corpus       | 9GB scientific text (QM, GR, calculus, topology, genomics, arXiv) |
| Status       | Training: v29 @ step 2,241 / 15,000 |
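To make the shape column concrete, here is a back-of-the-envelope parameter count using the standard transformer formula. This is a sketch only: it ignores biases, layer norms, the Synthase/MoDA/SLA² modules, and the vocabulary size, none of which are specified in this README.

```python
def transformer_params(d_model, n_layers, d_ffn, vocab_size, tied_embeddings=True):
    """Rough parameter count for a plain transformer stack.

    Lower-bound sketch: biases, layer norms, and any GLADIUS-specific
    modules (Synthase, specialists, gaussian head) are not counted.
    """
    attn = 4 * d_model * d_model           # Q, K, V, and output projections
    ffn = 2 * d_model * d_ffn              # up- and down-projection
    per_layer = attn + ffn
    embed = vocab_size * d_model * (1 if tied_embeddings else 2)
    return n_layers * per_layer + embed

# WYRM's listed shape: 1024d / 24L / 4096 FFN.
core = transformer_params(1024, 24, 4096, vocab_size=0)
print(f"{core / 1e6:.0f}M in transformer blocks")  # → 302M
```

With these dimensions, the 24 transformer blocks alone account for roughly 302M parameters; the remainder of the 476M total would sit in embeddings and the architecture-specific heads.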

## Key Innovations

  • Synthase Depth Attention: Biologically-inspired depth profiles replacing standard attention
  • SLAΒ² Specialists: Domain-specific routing (language, math, cognition, gaussian)
  • PUP: Propagated Uncertainty Principle β€” the model knows what it doesn't know
  • Memory System: Warm + cold memory for persistent context

## Repository Structure

```
├── kernel/                  # Core architecture (Synthase kernel)
│   ├── kernel.py            # Main transformer
│   ├── attention.py         # Attention mechanisms
│   ├── moda.py              # Mixture of Depth Attention
│   ├── cognition.py         # Cognition module
│   ├── router.py            # Specialist routing
│   ├── memory.py            # Memory system
│   ├── gaussian_head/       # Gaussian prediction head
│   └── ...
│
├── training/                # Training scripts
│   ├── wyrm_notebook_v29.py # Current Kaggle training notebook
│   ├── wyrm_notebook_v27_FINAL.py
│   ├── omega_config.py      # Omega specialist configuration
│   └── train_v5.py          # Earlier training script
│
├── tokenizers/              # Custom tokenizer suite
│   ├── bytecode_tokenizer.py
│   ├── byte_tokenizer.py
│   ├── math_tokenizer.py
│   └── grid_tokens.py
│
├── staging/                 # Features in development
│   ├── synthase/            # Synthase attention surgery
│   ├── pup/                 # PUP uncertainty head
│   └── l0_sla2/             # L0-regularized specialists
│
├── extensions/              # Architecture extensions
│   ├── plug/                # Multi-agent tool integration
│   └── memory_v2/           # Advanced memory pipeline
│
├── eval/                    # Evaluation scripts
├── experiments/             # Experiment logs (001-006)
├── telemetry/               # Training telemetry (v28, v29)
├── papers/                  # Research papers
│
├── MODEL_CARD.md            # Full model card
├── GLADIUS_TRAJECTORY.md    # Architecture evolution history
├── SCALING_ANALYSIS.md      # Scaling analysis & expansion plan
└── ROADMAP.md               # Development roadmap
```

## Research Papers

| Paper | Topic |
|-------|-------|
| GPU as Code | Hardware is algorithmic: treating the GPU as a programmable substrate |
| 1-Bit Intelligence | Binary weight learning: gradients are optional |
| Progressive Expansion | Function-preserving growth from 6.9M to 141M+ parameters |
| Cell Division | Architecture scaling via biological cell division |
| PUP | Propagated Uncertainty Principle: reasoning under known unknowns |
| Gaussian Head | Gaussian prediction head for continuous-domain tasks |

## Training Telemetry

Raw step-by-step telemetry from training runs:

  • telemetry/wyrm-v28-telemetry-1188.jsonl β€” v28 (procedural data, 1188 steps)
  • telemetry/wyrm-v29-telemetry-1060.jsonl β€” v29 (real corpus, 1060 steps)

Artifact Virtual, 2026