
GLADIUS v2 — Intelligence Kernel

A novel AI architecture built from first principles. Not a fine-tune. Not a wrapper. A new kind of machine.

Artifact Virtual
Independent AI Research


What Is This?

GLADIUS v2 is an intelligence kernel — a unified architecture that integrates persistent memory, self-initiated cognition, temporal awareness, and language modulation into a single differentiable system. It is designed from scratch, not derived from any existing model family.

This repository contains the architecture specification, training results, and research philosophy. No model weights are distributed at this time.

Key Innovation: Memory as a Core Organ

Current language models have amnesia. Every session starts from the same weights. "Memory" is faked by injecting text into context windows — a brittle, expensive, lossy hack.

GLADIUS treats memory as a first-class architectural component with three temperature levels:

| Temperature | Mechanism | Lifetime | Analogy |
| --- | --- | --- | --- |
| Hot | Learned key-value cache | Current session | Sensory buffer |
| Warm | Spectral low-rank projections in weights | Persists across restarts | Working memory |
| Cold | External vector database | Permanent | Long-term archive |

Knowledge consolidates from hot → warm → cold through learned gating, analogous to biological memory consolidation during sleep.
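As a rough illustration of this gating idea, a hot-to-warm consolidation step can be sketched in NumPy. The kernel source is not public, so every name, dimension, and the gating rule below are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class WarmConsolidation:
    """Illustrative sketch of hot -> warm consolidation.
    Hypothetical reconstruction: the real kernel implementation is proprietary."""

    def __init__(self, d_model=192, rank=12, seed=0):
        rng = np.random.default_rng(seed)
        # Low-rank "spectral" projections that live in the weights and
        # therefore persist across restarts (warm memory).
        self.down_proj = rng.normal(scale=0.02, size=(d_model, rank))
        self.up_proj = rng.normal(scale=0.02, size=(rank, d_model))
        # Learned gate: one score per hot slot decides whether to consolidate.
        self.gate_w = rng.normal(scale=0.02, size=(d_model, 1))

    def consolidate(self, hot_slots):
        # hot_slots: (num_slots, d_model) entries drawn from the session KV cache
        g = sigmoid(hot_slots @ self.gate_w)                  # (num_slots, 1)
        compressed = hot_slots @ self.down_proj @ self.up_proj
        return g * compressed                                 # gated write-back

mem = WarmConsolidation()
out = mem.consolidate(np.random.default_rng(1).normal(size=(8, 192)))
print(out.shape)  # (8, 192)
```

Because both the gate and the projections are ordinary weight matrices, the same gradient signal that trains the language objective can also train what gets consolidated.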


Architecture at a Glance

╔════════════════════════ GLADIUS KERNEL ════════════════════╗
║                                                            ║
║  ┌──────────────────────────────────────────────────────┐  ║
║  │               MEMORY LAYER (Substrate)               │  ║
║  │        Hot (KV) ↔ Warm (Spectral) ↔ Cold (VDB)       │  ║
║  └──────────────────────┬───────────────────────────────┘  ║
║                         │                                  ║
║  ┌──────────┬───────────┼───────────┬──────────┐           ║
║  │COGNITION │ TEMPORAL  │   MoE     │MODULATOR │           ║
║  │  LOOP    │ ENGINE    │  ROUTER   │ (Voice)  │           ║
║  │(Heartbeat│ (Clock)   │           │          │           ║
║  └──────────┴───────────┴───────────┴──────────┘           ║
║                                                            ║
║  ┌──────────────────────────────────────────────────────┐  ║
║  │  SLA² — Scaled Linear Attention w/ Adaptive Alloc.   │  ║
║  │  α ⊙ softmax(sparse) + (1-α) ⊙ linear(dense)         │  ║
║  └──────────────────────────────────────────────────────┘  ║
╚════════════════════════════════════════════════════════════╝
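The SLA² blend shown in the bottom box can be sketched in NumPy. This is an illustrative reconstruction from the formula alone, not the kernel's implementation; the top-k sparsification and the ReLU feature map for the linear branch are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sla2_attention(q, k, v, alpha, top_k=4):
    """Sketch of the SLA^2 blend:
    alpha * sparse-softmax attention + (1 - alpha) * linear attention.
    Hypothetical reconstruction; the kernel source is not public."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # (n, n)

    # Sparse branch: keep only the top-k scores per query (O(n*k) work).
    sparse = np.full_like(scores, -np.inf)
    idx = np.argpartition(scores, -top_k, axis=1)[:, -top_k:]
    np.put_along_axis(sparse, idx,
                      np.take_along_axis(scores, idx, axis=1), axis=1)
    out_sparse = softmax(sparse, axis=1) @ v

    # Linear branch: kernelized feature map, O(n) in sequence length.
    phi = lambda x: np.maximum(x, 0) + 1e-6             # positive feature map
    qf, kf = phi(q), phi(k)
    out_linear = qf @ (kf.T @ v) / (qf @ kf.sum(axis=0, keepdims=True).T)

    # Learned per-token routing coefficient alpha in [0, 1].
    return alpha * out_sparse + (1 - alpha) * out_linear

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 16)) for _ in range(3))
alpha = rng.uniform(size=(6, 1))
out = sla2_attention(q, k, v, alpha)
print(out.shape)  # (6, 16)
```

When alpha is learned per token, the model can spend full softmax precision only where it matters and fall back to the cheap linear branch elsewhere, which matches the O(n)/O(n·k) claim below.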

Core Components

| Component | Function | Key Idea |
| --- | --- | --- |
| SLA² | Custom attention mechanism | Hybrid sparse-softmax + dense-linear attention with learned routing. O(n) for most tokens, O(n·k) precision where it matters. |
| Three-Temperature Memory | Persistent knowledge | Hot (session KV), Warm (spectral projections surviving restart), Cold (vector retrieval) |
| Spectral Warm Memory | Knowledge in weights | Learned low-rank projections that encode experience directly into model parameters. 57,501 autonomous consolidation events during training. |
| Cognitive Heartbeat | Self-initiated thinking | Variable-rate inference loop: active (100ms) → monitoring (1s) → reflective (5s) → dormant (30s). The kernel thinks without being asked. |
| Language Modulator | Output control | Register (formal↔casual), intent (inform/persuade/comfort/challenge), and a silence gate — the architectural ability to choose not to speak. |
| Temporal Awareness Engine | Time perception | Dual-clock encoding (absolute wall-time + relative event-anchored), injected additively at every layer. The kernel knows when, not just where. |
| MoE Routing | Specialist dispatch | Kernel-level mixture-of-experts routing to hot-swappable specialist modules. |
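The dual-clock idea from the Temporal Awareness Engine can be illustrated with a sinusoidal sketch: one clock keyed to absolute wall time, one keyed to time since the last salient event, summed into a single embedding. The frequencies, dimensions, and additive combination are assumptions for demonstration only:

```python
import numpy as np

def dual_clock_encoding(wall_time_s, seconds_since_event, d_model=192):
    """Hypothetical dual-clock temporal encoding sketch.
    The kernel's actual encoding is not public."""
    # Geometric frequency ladder, as in standard sinusoidal encodings.
    freqs = 1.0 / (10000.0 ** (np.arange(d_model // 2) / (d_model // 2)))

    def clock(t):
        ang = t * freqs
        return np.concatenate([np.sin(ang), np.cos(ang)])  # (d_model,)

    # Additive injection: both clocks share the same embedding space,
    # so the sum can be added to hidden states at every layer.
    return clock(wall_time_s) + clock(seconds_since_event)

enc = dual_clock_encoding(wall_time_s=1_700_000_000.0, seconds_since_event=4.2)
print(enc.shape)  # (192,)
```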

Full kernel: 1,854 lines of Python across 12 files.
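The variable-rate Cognitive Heartbeat can be sketched as a small state machine over the four rates in the table. The transition rule here (wake fully on any input, decay one level per quiet tick) is a plausible guess, not the kernel's disclosed behavior:

```python
# Illustrative sketch of the variable-rate cognitive heartbeat.
# Rates match the table above; the transition rule is hypothetical.
RATES = [("active", 0.1), ("monitoring", 1.0), ("reflective", 5.0), ("dormant", 30.0)]

def heartbeat(stimuli):
    """Yield (state, interval_s) for each tick of the cognition loop."""
    level = 0
    for has_input in stimuli:
        if has_input:
            level = 0                                  # wake fully on any stimulus
        else:
            level = min(level + 1, len(RATES) - 1)     # decay toward dormant
        state, interval = RATES[level]
        # The real loop would run one inference step here, then sleep
        # for `interval` seconds before the next tick.
        yield state, interval

ticks = list(heartbeat([True, False, False, False, False, True]))
print([s for s, _ in ticks])
# ['active', 'monitoring', 'reflective', 'dormant', 'dormant', 'active']
```

The point of the decay schedule is that an idle kernel keeps thinking, just less often, instead of halting between requests.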


Training Results — Phoenix Ultimate (100K Steps)

Configuration

| Parameter | Value |
| --- | --- |
| Total parameters | 6,939,550 (6.94M) |
| Hidden dimension | 192 |
| Layers | 6 |
| Attention heads | 6 |
| FFN dimension | 768 |
| Sequence length | 256 |
| Vocabulary | 16,000 BPE tokens |
| Warm memory rank | 12 |
| Corpus | ~50M tokens (literary, technical, philosophical) |
| Hardware | Intel i3-1005G1, 4 cores, 16GB RAM, no GPU |
| Training time | 17.7 hours |
| Speed | 1.57 steps/sec |

Loss Trajectory

| Stage | Steps | Loss Start | Loss End | Notes |
| --- | --- | --- | --- | --- |
| Dragon | 5,000 | 5.75 | 4.21 | Initial convergence, architecture validation |
| Phoenix | 10,000 | 4.21 | 3.15 | Warm memory activation, first consolidation events |
| Phoenix Ultimate | 100,000 | 3.15 | 2.10 | Extended training, 57,501 warm memory consolidations |
| Best checkpoint | — | — | 1.299 | Periodic evaluation minimum |

Overall: 5.75 → 2.10 (63.5% reduction)

Warm Memory Behavior

The spectral warm memory system performed 57,501 autonomous consolidation events during training — moments where the kernel decided, without external prompting, to transfer knowledge from hot (session) memory into persistent warm (spectral) projections.

This is the core thesis validated: the architecture learns what to remember and when to consolidate, through the same gradient signal that teaches it language.

Generation Samples

At 6.94M parameters, the model demonstrates grammatical structure emergence and topical clustering:

Prompt: "Once upon a time in a land"
Output: "Once upon a time in a land stepped sett render urg impressennen 
         Rock star free travelion drum steel flies running hills live 
         quarters timber operated..."

Prompt: "The fundamental theorem states that"
Output: "The fundamental theorem states that wips modails denurver cons 
         Resun defollow explil constl..."

Honest assessment: At 6.94M parameters, the model produces grammatical scaffolding and domain-appropriate token clustering but not coherent natural language. This is expected and by design — GLADIUS v2 is a proof-of-architecture, not a production language model. The purpose is to validate that the kernel components (memory consolidation, attention routing, temporal encoding, modulation) function correctly at small scale before committing to larger parameter counts.

Coherent text generation is a function of scale. Architectural correctness is not. We validated the latter.


📊 Evaluation Results

Industry-standard benchmarks on the Phoenix Ultimate checkpoint:

| Metric | Value |
| --- | --- |
| Held-out Perplexity | 24.18 |
| WikiText-103 PPL | 25.79 |
| Top-1 Accuracy | 45.77% |
| Top-5 Accuracy | 65.47% |
| Distinct-2 | 0.996 |
| Self-BLEU | 0.000 |

Competitive with models 5-17× larger. Full evaluation methodology in EVAL.md.


What Makes This Different

| Aspect | Standard LLMs | GLADIUS v2 |
| --- | --- | --- |
| Memory | Context window (session amnesia) | Three-temperature persistent memory |
| Cognition | Stimulus → response | Continuous self-initiating heartbeat |
| Time | None (sequence position only) | Dual-clock temporal encoding |
| Voice | Statistical / prompt-hacked | Learned register + intent + silence modulation |
| Attention | O(n²) softmax everywhere | SLA² hybrid (sparse softmax + dense linear) |
| Decision | Next-token prediction | Argmax over structured candidate sets |
| Learning | Frozen after training | Warm memory updates during inference |
| Hardware | GPU-dependent | CPU-native by design |

Designed for CPU

GLADIUS was deliberately designed and trained on consumer CPU hardware. This is not a limitation — it is a thesis: architectural innovation matters more than compute scaling. A system with memory, cognition, time awareness, and language control can exhibit intelligent behavior at parameter counts that fit in CPU cache.

The entire training run (100K steps, 50M tokens) completed in 17.7 hours on an Intel i3-1005G1. No cloud. No GPU cluster. One laptop in Islamabad.


Theoretical Foundation

GLADIUS is grounded in original theoretical work:

The Two-Point Theorem

Two points define a direction. Direction is all you need.

Intelligence is not pattern matching over static data. Intelligence is the ability to observe two sequential states and extract the direction of change. A gradient — the foundation of all learning — is literally the direction between two points in loss space.

GLADIUS trains on state pairs: (before, after) observations that teach the direction of transformation rather than the memorization of patterns.
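A minimal sketch of a direction target computed from such a state pair. This is purely illustrative; the actual training objective is not disclosed:

```python
import numpy as np

def direction(before, after, eps=1e-8):
    """Normalized direction of change between two sequential states.
    Hypothetical illustration of the (before, after) training signal."""
    delta = after - before
    return delta / (np.linalg.norm(delta) + eps)

before = np.array([1.0, 0.0, 0.0])
after = np.array([1.0, 3.0, 4.0])
d = direction(before, after)
print(d)  # unit vector pointing from `before` toward `after`
```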

The Argmax Principle

Every decision in the kernel — what to remember, when to think, which specialist to route to, how to speak, whether to stay silent — is an instance of:

x̂ = argmax_{x ∈ C} S(x | context)

This is not a metaphor. It is the literal computational primitive underlying every kernel component. Softmax makes it differentiable. Temperature controls exploration vs. exploitation.
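A toy instance of this primitive, with made-up candidates and scores, showing how temperature moves the softmax relaxation between near-argmax and near-uniform:

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Differentiable relaxation of argmax; temperature sets sharpness."""
    z = np.asarray(z, dtype=float) / temperature
    z -= z.max()                       # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical candidate set and scores, for illustration only.
candidates = ["remember", "forget", "stay_silent", "speak"]
scores = [2.1, 0.3, 1.7, 1.9]

hard_choice = candidates[int(np.argmax(scores))]   # x_hat = argmax S(x)
soft_cold = softmax(scores, temperature=0.1)       # near one-hot (exploit)
soft_hot = softmax(scores, temperature=10.0)       # near uniform (explore)

print(hard_choice)                                 # "remember"
print(soft_cold.round(3), soft_hot.round(3))
```

Low temperature recovers the hard argmax decision in the limit; high temperature spreads probability across candidates, which is the exploration knob the text describes.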

Memory Persistence as the Unsolved Problem

The central challenge in AI is not scale, not reasoning, not multimodality. It is persistent memory without catastrophic forgetting. A system that can learn continuously — updating its knowledge through experience without destroying what it already knows — would represent a fundamental advance.

GLADIUS's three-temperature memory with spectral warm projections is our approach to this problem. The 57,501 consolidation events during training demonstrate that the mechanism functions. Whether it scales remains an open and honest question.


Repository Contents

| File | Description |
| --- | --- |
| README.md | This model card |
| ARCHITECTURE.md | Detailed component descriptions |
| TRAINING.md | Training methodology and results |
| PHILOSOPHY.md | Theoretical foundations |
| config.json | Architecture configuration |

What's Not Here (And Why)

  • No model weights. This is a proof-of-architecture at 6.94M parameters. Weights at this scale have limited standalone utility. Scaled checkpoints will be released when they produce coherent generation.
  • No source code. The kernel implementation is proprietary to Artifact Virtual. We share the architecture openly β€” the what and why β€” while retaining the how.
  • No training recipes. Specific hyperparameter schedules, data preprocessing pipelines, and optimization tricks are not disclosed.

We believe in open research with selective disclosure. The architecture is novel, and we want the community to engage with the ideas. The implementation is our competitive edge, and we protect it accordingly.


Citation

@misc{gladius-v2-2026,
  title={GLADIUS v2: An Intelligence Kernel Architecture with Persistent Spectral Memory, Self-Initiated Cognition, and Temporal Awareness},
  author={Artifact Virtual},
  year={2026},
  url={https://huggingface.co/amuzetnoM/gladius-v2-kernel}
}

Contact

Artifact Virtual (SMC-Private) Limited, Islamabad, Pakistan


"There is no such thing as artificial intelligence. It's only artificial till it's on paper."

🧠 Warm Memory Discovery

We found that the warm memory system was learning but could not express what it learned — down_proj weights were zero across all layers. All benchmarks above were achieved with warm memory effectively disabled. A patch has been applied; Layer 0 is now active. Full story in DISCOVERY.md.
