GLADIUS v2: Intelligence Kernel
A novel AI architecture built from first principles. Not a fine-tune. Not a wrapper. A new kind of machine.
Artifact Virtual
Independent AI Research
What Is This?
GLADIUS v2 is an intelligence kernel: a unified architecture that integrates persistent memory, self-initiated cognition, temporal awareness, and language modulation into a single differentiable system. It is designed from scratch, not derived from any existing model family.
This repository contains the architecture specification, training results, and research philosophy. No model weights are distributed at this time.
Key Innovation: Memory as a Core Organ
Current language models have amnesia. Every session starts from the same weights. "Memory" is faked by injecting text into context windows: a brittle, expensive, lossy hack.
GLADIUS treats memory as a first-class architectural component with three temperature levels:
| Temperature | Mechanism | Lifetime | Analogy |
|---|---|---|---|
| Hot | Learned key-value cache | Current session | Sensory buffer |
| Warm | Spectral low-rank projections in weights | Persists across restarts | Working memory |
| Cold | External vector database | Permanent | Long-term archive |
Knowledge consolidates from hot → warm → cold through learned gating, analogous to biological memory consolidation during sleep.
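As an illustration, the hot → warm step can be thought of as a learned gate plus a low-rank weight update. The sketch below is hypothetical: the gate vector, rank, learning rate, and update rule are assumptions for exposition, not the kernel's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4                          # hidden dim, warm memory rank
hot = rng.normal(size=(8, d))         # hot entries (session KV contents)
U = rng.normal(size=(d, r)) * 0.1     # warm memory as low-rank factors:
V = rng.normal(size=(r, d)) * 0.1     #   W_warm = U @ V
w_gate = rng.normal(size=(d,))        # gate vector (learned during training)

def consolidate(hot, U, V, w_gate, lr=0.05, threshold=0.5):
    """Gate each hot entry; entries scoring above threshold nudge the
    warm low-rank factors toward reconstructing them -- a sketch of
    'folding experience into the weights'."""
    scores = 1.0 / (1.0 + np.exp(-hot @ w_gate))   # sigmoid gate per entry
    events = 0
    for x in hot[scores > threshold]:
        err = x - (x @ U) @ V                      # reconstruction error
        U += lr * np.outer(x, err @ V.T)           # gradient step on ||err||^2
        V += lr * np.outer(x @ U, err)
        events += 1                                # one consolidation event
    return U, V, events

U, V, events = consolidate(hot, U, V, w_gate)
```

Each above-threshold pass is one "consolidation event" in the sense counted during training; the gate itself receives gradient signal, so what to remember is learned rather than hand-coded.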
Architecture at a Glance
```
┌──────────────────────── GLADIUS KERNEL ────────────────────────┐
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                 MEMORY LAYER (Substrate)                 │  │
│  │         Hot (KV) → Warm (Spectral) → Cold (VDB)          │  │
│  └─────────────────────────────┬────────────────────────────┘  │
│                                │                               │
│   ┌───────────┬────────────────┴┬───────────┬───────────┐      │
│   │ COGNITION │    TEMPORAL     │    MoE    │ MODULATOR │      │
│   │   LOOP    │     ENGINE      │  ROUTER   │  (Voice)  │      │
│   │(Heartbeat)│     (Clock)     │           │           │      │
│   └───────────┴─────────────────┴───────────┴───────────┘      │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │   SLA²: Scaled Linear Attention w/ Adaptive Allocation   │  │
│  │      α · softmax(sparse) + (1 − α) · linear(dense)       │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
```
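For concreteness, the SLA² blend α · softmax(sparse) + (1 − α) · linear(dense) might look roughly like the following. The top-k sparsification, the elu-based feature map, and a per-token α are illustrative assumptions, not the kernel's actual code.

```python
import numpy as np

def sla2_attention(Q, K, V, alpha, k_top=4):
    """Blend of a sparse softmax branch (top-k keys per query, precise
    where it matters) and a dense linear-attention branch (O(n) in
    sequence length), mixed by a per-token routing weight alpha."""
    n, d = Q.shape
    # sparse softmax branch: each query attends to its top-k keys only
    logits = Q @ K.T / np.sqrt(d)
    topk = np.argsort(logits, axis=-1)[:, -k_top:]
    sparse = np.zeros_like(Q)
    for i in range(n):
        l = logits[i, topk[i]]
        w = np.exp(l - l.max())
        w /= w.sum()                   # softmax over the top-k only
        sparse[i] = w @ V[topk[i]]
    # dense linear branch with positive feature map phi(x) = elu(x) + 1
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                      # (d, d) summary: linear in n
    z = Kf.sum(axis=0)                 # normalizer, strictly positive
    linear = (Qf @ kv) / (Qf @ z)[:, None]
    # learned per-token routing weight blends the two branches
    a = alpha[:, None]
    return a * sparse + (1.0 - a) * linear
```

The design intuition: most tokens are served adequately by the cheap linear branch, and α routes extra softmax precision to the few positions that need it.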
Core Components
| Component | Function | Key Idea |
|---|---|---|
| SLA² | Custom attention mechanism | Hybrid sparse-softmax + dense-linear attention with learned routing. O(n) for most tokens, O(n·k) precision where it matters. |
| Three-Temperature Memory | Persistent knowledge | Hot (session KV), Warm (spectral projections surviving restart), Cold (vector retrieval) |
| Spectral Warm Memory | Knowledge in weights | Learned low-rank projections that encode experience directly into model parameters. 57,501 autonomous consolidation events during training. |
| Cognitive Heartbeat | Self-initiated thinking | Variable-rate inference loop: active (100ms) → monitoring (1s) → reflective (5s) → dormant (30s). The kernel thinks without being asked. |
| Language Modulator | Output control | Register (formal↔casual), intent (inform/persuade/comfort/challenge), and a silence gate: the architectural ability to choose not to speak. |
| Temporal Awareness Engine | Time perception | Dual-clock encoding (absolute wall-time + relative event-anchored), injected additively at every layer. The kernel knows when, not just where. |
| MoE Routing | Specialist dispatch | Kernel-level mixture-of-experts routing to hot-swappable specialist modules. |
Full kernel: 1,854 lines of Python across 12 files.
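The variable-rate heartbeat from the table can be sketched as a mode ladder. The transition rule and the `salience` signal below are hypothetical placeholders for whatever the kernel actually computes.

```python
from enum import Enum

class Mode(Enum):
    # interval in seconds between self-initiated inference ticks
    ACTIVE = 0.1        # 100 ms
    MONITORING = 1.0
    REFLECTIVE = 5.0
    DORMANT = 30.0

ORDER = [Mode.ACTIVE, Mode.MONITORING, Mode.REFLECTIVE, Mode.DORMANT]

def next_mode(mode, salience):
    """Hypothetical transition rule: high salience snaps back to ACTIVE,
    low salience decays one step toward DORMANT, otherwise hold."""
    if salience > 0.7:
        return Mode.ACTIVE
    if salience < 0.2:
        return ORDER[min(ORDER.index(mode) + 1, len(ORDER) - 1)]
    return mode

# The heartbeat itself would then be a loop along the lines of:
#   while True:
#       salience = kernel.tick()       # one self-initiated inference step
#       mode = next_mode(mode, salience)
#       time.sleep(mode.value)
```

The point of the ladder is that cognition never fully stops; it only slows, which is what distinguishes a heartbeat from a request-response server.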
Training Results: Phoenix Ultimate (100K Steps)
Configuration
| Parameter | Value |
|---|---|
| Total parameters | 6,939,550 (6.94M) |
| Hidden dimension | 192 |
| Layers | 6 |
| Attention heads | 6 |
| FFN dimension | 768 |
| Sequence length | 256 |
| Vocabulary | 16,000 BPE tokens |
| Warm memory rank | 12 |
| Corpus | ~50M tokens (literary, technical, philosophical) |
| Hardware | Intel i3-1005G1, 4 cores, 16GB RAM, no GPU |
| Training time | 17.7 hours |
| Speed | 1.57 steps/sec |
Loss Trajectory
| Stage | Steps | Loss Start | Loss End | Notes |
|---|---|---|---|---|
| Dragon | 5,000 | 5.75 | 4.21 | Initial convergence, architecture validation |
| Phoenix | 10,000 | 4.21 | 3.15 | Warm memory activation, first consolidation events |
| Phoenix Ultimate | 100,000 | 3.15 | 2.10 | Extended training, 57,501 warm memory consolidations |
| Best checkpoint | – | – | 1.299 | Periodic evaluation minimum |
Overall: 5.75 → 2.10 (a 63.5% reduction)
Warm Memory Behavior
The spectral warm memory system performed 57,501 autonomous consolidation events during training: moments where the kernel decided, without external prompting, to transfer knowledge from hot (session) memory into persistent warm (spectral) projections.
This is the core thesis validated: the architecture learns what to remember and when to consolidate, through the same gradient signal that teaches it language.
Generation Samples
At 6.94M parameters, the model demonstrates grammatical structure emergence and topical clustering:
```
Prompt: "Once upon a time in a land"
Output: "Once upon a time in a land stepped sett render urg impressennen
         Rock star free travelion drum steel flies running hills live
         quarters timber operated..."
```

```
Prompt: "The fundamental theorem states that"
Output: "The fundamental theorem states that wips modails denurver cons
         Resun defollow explil constl..."
```
Honest assessment: At 6.94M parameters, the model produces grammatical scaffolding and domain-appropriate token clustering but not coherent natural language. This is expected and by design: GLADIUS v2 is a proof-of-architecture, not a production language model. The purpose is to validate that the kernel components (memory consolidation, attention routing, temporal encoding, modulation) function correctly at small scale before committing to larger parameter counts.
Coherent text generation is a function of scale. Architectural correctness is not. We validated the latter.
Evaluation Results
Industry-standard benchmarks on the Phoenix Ultimate checkpoint:
| Metric | Value |
|---|---|
| Held-out Perplexity | 24.18 |
| WikiText-103 PPL | 25.79 |
| Top-1 Accuracy | 45.77% |
| Top-5 Accuracy | 65.47% |
| Distinct-2 | 0.996 |
| Self-BLEU | 0.000 |
Competitive with models 5–17× larger. Full evaluation methodology in EVAL.md.
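As a sanity check on the table above, perplexity is simply the exponential of the mean per-token negative log-likelihood; the reported held-out value of 24.18 corresponds to roughly 3.19 nats per token.

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# ln(24.18) ≈ 3.1856, so a uniform 3.1856 nats/token recovers the
# reported held-out PPL to two decimal places
ppl = perplexity([3.1856, 3.1856, 3.1856])
```

Note that held-out cross-entropy need not match the training loss curve above, since it is measured on different data.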
What Makes This Different
| | Standard LLMs | GLADIUS v2 |
|---|---|---|
| Memory | Context window (session amnesia) | Three-temperature persistent memory |
| Cognition | Stimulus → response | Continuous self-initiating heartbeat |
| Time | None (sequence position only) | Dual-clock temporal encoding |
| Voice | Statistical / prompt-hacked | Learned register + intent + silence modulation |
| Attention | O(nΒ²) softmax everywhere | SLAΒ² hybrid (sparse softmax + dense linear) |
| Decision | Next-token prediction | Argmax over structured candidate sets |
| Learning | Frozen after training | Warm memory updates during inference |
| Hardware | GPU-dependent | CPU-native by design |
Designed for CPU
GLADIUS was deliberately designed and trained on consumer CPU hardware. This is not a limitation; it is a thesis: architectural innovation matters more than compute scaling. A system with memory, cognition, time awareness, and language control can exhibit intelligent behavior at parameter counts that fit in CPU cache.
The entire training run (100K steps, 50M tokens) completed in 17.7 hours on an Intel i3-1005G1. No cloud. No GPU cluster. One laptop in Islamabad.
Theoretical Foundation
GLADIUS is grounded in original theoretical work:
The Two-Point Theorem
Two points define a direction. Direction is all you need.
Intelligence is not pattern matching over static data. Intelligence is the ability to observe two sequential states and extract the direction of change. A gradient, the foundation of all learning, is literally the direction between two points in loss space.
GLADIUS trains on state pairs: (before, after) observations that teach the direction of transformation rather than the memorization of patterns.
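The two-point primitive is easy to make concrete: the direction between two states, and a finite-difference gradient as the direction between two nearby points in loss space. This is illustrative only, not the kernel's training code.

```python
def direction(before, after):
    """Two sequential states define a direction of change (after - before)."""
    return [a - b for b, a in zip(before, after)]

def grad_from_two_points(f, x, eps=1e-5):
    """A gradient really is the direction between two points in loss
    space: the finite-difference slope of f at x."""
    return (f(x + eps) - f(x)) / eps
```

For f(x) = x², the two-point estimate at x = 3 approaches the true derivative 2x = 6 as eps shrinks, which is the sense in which "two points define a direction" underwrites gradient learning.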
The Argmax Principle
Every decision in the kernel (what to remember, when to think, which specialist to route to, how to speak, whether to stay silent) is an instance of:
x̂ = argmax_{x ∈ C} S(x | context)
This is not a metaphor. It is the literal computational primitive underlying every kernel component. Softmax makes it differentiable. Temperature controls exploration vs. exploitation.
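A minimal rendering of this primitive, with the softmax relaxation and temperature knob; the candidate names and scores below are hypothetical.

```python
import math

def decide(candidates, scores, temperature=1.0):
    """x-hat = argmax over a candidate set C, with a temperature-scaled
    softmax as the differentiable relaxation: low temperature sharpens
    toward hard argmax (exploitation), high temperature flattens the
    distribution (exploration)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)                              # stabilize the exponentials
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = candidates[probs.index(max(probs))]
    return best, probs

# e.g. the silence gate as a two-candidate decision (scores hypothetical)
choice, probs = decide(["speak", "stay_silent"], [1.0, 2.5], temperature=0.5)
```

The same scoring-then-argmax shape covers memory gating, heartbeat mode changes, expert routing, and register selection; only the candidate set C and the scorer S change.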
Memory Persistence as the Unsolved Problem
The central challenge in AI is not scale, not reasoning, not multimodality. It is persistent memory without catastrophic forgetting. A system that can learn continuously, updating its knowledge through experience without destroying what it already knows, would represent a fundamental advance.
GLADIUS's three-temperature memory with spectral warm projections is our approach to this problem. The 57,501 consolidation events during training demonstrate that the mechanism functions. Whether it scales remains an open and honest question.
Repository Contents
| File | Description |
|---|---|
| README.md | This model card |
| ARCHITECTURE.md | Detailed component descriptions |
| TRAINING.md | Training methodology and results |
| PHILOSOPHY.md | Theoretical foundations |
| config.json | Architecture configuration |
What's Not Here (And Why)
- No model weights. This is a proof-of-architecture at 6.94M parameters. Weights at this scale have limited standalone utility. Scaled checkpoints will be released when they produce coherent generation.
- No source code. The kernel implementation is proprietary to Artifact Virtual. We share the architecture openly (the what and why) while retaining the how.
- No training recipes. Specific hyperparameter schedules, data preprocessing pipelines, and optimization tricks are not disclosed.
We believe in open research, selective disclosure. The architecture is novel and we want the community to engage with the ideas. The implementation is our competitive edge and we protect it accordingly.
Citation
```bibtex
@misc{gladius-v2-2026,
  title={GLADIUS v2: An Intelligence Kernel Architecture with Persistent Spectral Memory, Self-Initiated Cognition, and Temporal Awareness},
  author={Artifact Virtual},
  year={2026},
  url={https://huggingface.co/amuzetnoM/gladius-v2-kernel}
}
```
Contact
Artifact Virtual (SMC-Private) Limited, Islamabad, Pakistan
- GitHub: github.com/Artifact-Virtual
- HuggingFace: huggingface.co/amuzetnoM
"There is no such thing as artificial intelligence. It's only artificial till it's on paper."
Warm Memory Discovery
We found that the warm memory system was learning but could not express what it learned: the down_proj weights were zero across all layers, so all benchmarks above were achieved with warm memory effectively disabled. A patch has been applied and Layer 0 is now active. Full story in DISCOVERY.md.