---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
tags:
  - snn
  - spiking-neural-network
  - neuromorphic
  - language-model
  - from-scratch
  - energy-efficient
  - mixture-of-experts
  - brain-inspired
---

# ⚡ Nord v4.2: Brain-Inspired Spiking Neural Network Language Model (140M)

The first SNN language model with spike-driven MoE, zonal specialization, and a memory cortex, trained from scratch.

## What's New in v4.2

Nord v4.2 is a complete architectural rebuild of v3. The key breakthrough: the model self-organizes into functionally distinct brain zones during training. Sensory zones learn low firing rates and executive zones learn high firing rates, with no explicit supervision.

| | v3 (previous) | v4.2 (current) |
|---|---|---|
| Parameters | 144M | 140M |
| Sparsity | 97% (but spikes broken at scale) | 91% (spikes working) |
| MoE | None | Spike-driven, 4 experts, top-2 |
| Memory | None | 128-neuron cortex, τ=0.99 |
| Zonal architecture | No | Yes (self-organizing) |
| Loss at 39K steps | ~4.9 | 4.3 |
| Training speed | Slower convergence | 35% faster to the same loss |

## Model Description

Nord v4.2 is a 140M-parameter Spiking Neural Network (SNN) for text generation. It uses biologically inspired Leaky Integrate-and-Fire (LIF) neurons with membrane potentials, firing thresholds, and binary spikes. Unlike transformers, where 100% of neurons activate per token, Nord activates only 3-9%, with different brain-inspired zones specializing in different functions.

Trained entirely from scratch: no transformer teacher, no distillation, no ANN-to-SNN conversion.
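The LIF dynamics can be sketched in a few lines. This is a minimal illustration only: the decay constant, threshold, and soft-reset rule below are generic textbook choices, not Nord's actual hyperparameters.

```python
# Minimal Leaky Integrate-and-Fire (LIF) neuron sketch.
# decay, threshold, and the soft reset are illustrative assumptions.

def lif_step(v, current, decay=0.9, threshold=1.0):
    """One timestep: leak, integrate the input current, emit a binary spike."""
    v = decay * v + current                # leaky integration of membrane potential
    spike = 1.0 if v >= threshold else 0.0 # binary spike when threshold is crossed
    v = v - threshold * spike              # soft reset: subtract threshold after firing
    return v, spike

def run(currents, decay=0.9, threshold=1.0):
    """Run one neuron over a sequence of input currents, collect its spikes."""
    v, spikes = 0.0, []
    for i in currents:
        v, s = lif_step(v, i, decay, threshold)
        spikes.append(s)
    return spikes

# Sub-threshold inputs must accumulate before the neuron fires
print(run([0.6, 0.6, 0.6, 0.0, 0.6]))  # → [0.0, 1.0, 0.0, 0.0, 1.0]
```

Because most timesteps produce no spike, downstream computation can skip silent neurons, which is where the sparsity numbers in this card come from.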

## Key Features

| Feature | Details |
|---|---|
| Parameters | 139.9M |
| Architecture | Original brain-inspired zonal SNN |
| Zones | Sensory → Association (MoE) → Memory → Executive |
| MoE | 4 spike-driven experts, top-2 routing |
| Memory | 128 persistent neurons, gated temporal attention |
| Sparsity | 89-95% (dynamic, input-dependent) |
| Timesteps | 10 (8 fast + 2 slow) |
| Training method | Surrogate gradients + spike homeostasis |
| Training data | ~2.2M samples, general English corpus |
| Training cost | ~$15 USD |
| Online learning | STDP available during inference |
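Top-2 spike-driven routing ranks experts by the activity of their assigned neuron clusters and mixes the two most active. A minimal sketch, assuming rate-proportional mixing weights (the function name and weighting scheme are illustrative, not Nord's exact router):

```python
# Sketch of spike-driven top-2 expert routing.
# Assumption: each expert is scored by the spike rate of its neuron
# cluster, and the top-2 weights are normalized over the selected pair.

def route_top2(cluster_spike_rates):
    """Pick the two most active expert clusters; normalize their weights."""
    ranked = sorted(range(len(cluster_spike_rates)),
                    key=lambda i: cluster_spike_rates[i], reverse=True)
    top2 = ranked[:2]
    total = sum(cluster_spike_rates[i] for i in top2) or 1.0
    return [(i, cluster_spike_rates[i] / total) for i in top2]

# Four experts; experts 2 and 0 fire the most, so they handle this token.
print(route_top2([0.12, 0.03, 0.18, 0.07]))
```

Because routing keys off spikes that are computed anyway, the router adds no separate dense gating pass.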

## Architecture

```
┌─────────────────────────────────────────────┐
│  Temporal Spike Encoder                     │
│  Token → 8 fast + 2 slow timestep currents  │
├─────────────────────────────────────────────┤
│  Sensory Zone (2 blocks)      rates: 8-10%  │
│  Standard FFN + LIF, feature extraction     │
├─────────────────────────────────────────────┤
│  Association Zone (2 blocks)  rates: 10-14% │
│  Spike-Driven MoE (4 experts, top-2) + LIF  │
├─────────────────────────────────────────────┤
│  Memory Cortex                rates: 0.5-1% │
│  128 neurons, τ=0.99, gated temporal attn   │
├─────────────────────────────────────────────┤
│  Executive Zone (2 blocks)    rates: 11-26% │
│  Standard FFN + LIF, decision & output      │
├─────────────────────────────────────────────┤
│  Readout (EMA over membrane potential)      │
│  → LM Head → vocabulary logits              │
└─────────────────────────────────────────────┘
```

### Key Components

- **Associative LIF Neurons**: learnable membrane time constants, voltage thresholds, synaptic currents, and cascade amplification across 64 neural clusters
- **ATan Surrogate Gradient**: differentiable spike function for backpropagation
- **Spike-Driven MoE**: expert routing based on cluster spike-rate activity, not a dense gating network
- **Memory Cortex**: persistent slow memory with multi-head temporal attention readout
- **Adaptive Spike Regulator**: asymmetric homeostasis that penalizes too-low firing 3x more than too-high, with an anti-death floor at 1%
- **RoPE**: rotary position embeddings for sequence position encoding
- **Synaptic Resonance Attention**: temporal mixing over spike patterns (not naive flattening)
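The ATan surrogate trick keeps the forward pass a hard Heaviside step while backpropagating through the smooth derivative of a scaled arctangent. A plain-Python sketch (no autograd; `alpha`, the width parameter, is an assumed value, not Nord's setting):

```python
import math

# Surrogate-gradient sketch for a binary spike function.
# Forward: hard Heaviside step. Backward: derivative of
# (1/pi)*atan(pi/2 * alpha * x) + 1/2, a smooth stand-in for the step.

def spike_forward(v, threshold=1.0):
    """Non-differentiable forward pass: binary spike."""
    return 1.0 if v >= threshold else 0.0

def spike_backward(v, threshold=1.0, alpha=2.0):
    """Surrogate dS/dv used during backpropagation."""
    x = v - threshold
    return alpha / (2.0 * (1.0 + (math.pi / 2.0 * alpha * x) ** 2))
```

The surrogate peaks at the threshold (`spike_backward(1.0)` equals `alpha / 2`) and decays away from it, so gradient flows mainly through neurons near their firing point.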

## Model Configuration

```
d_model: 496
n_heads: 8
n_layers: 6 (2 sensory + 2 association + 2 executive)
d_ff: 1024
n_experts: 4
top_k_experts: 2
memory_size: 128
T_fast: 8, T_slow: 2
max_seq_len: 512
vocab_size: 128,256
tokenizer: Llama-3.2 (meta-llama/Llama-3.2-1B)
```
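The memory cortex's τ = 0.99 implies a slow exponential moving average: each neuron retains information over roughly 100 timesteps. A minimal sketch of such an update (the `write_gate` and the exact update rule are illustrative assumptions, not Nord's code):

```python
# Sketch of a persistent "memory cortex" state update with tau = 0.99.
# Each memory neuron keeps an EMA of incoming activity; write_gate is
# an assumed stand-in for the gated temporal attention in the model.

def memory_update(memory, activity, tau=0.99, write_gate=1.0):
    """EMA update: old state decays by tau, gated new activity is mixed in."""
    return [tau * m + (1.0 - tau) * write_gate * a
            for m, a in zip(memory, activity)]

mem = [0.0] * 4                          # 4 neurons for illustration (Nord: 128)
for _ in range(100):
    mem = memory_update(mem, [1.0, 1.0, 0.0, 0.0])
# After 100 steps of constant input, memory approaches 1 - 0.99**100 ≈ 0.63
print(mem)
```

The slow decay is also why this zone can run at 0.5-1% firing rates: its state persists between tokens instead of being recomputed.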

## Emergent Zonal Specialization

The most significant finding: the model self-organizes into functionally distinct zones during standard training, with no manual assignment and no hardcoded rates.

| Zone | Spike Rate | Biological Analog |
|---|---|---|
| Sensory | 8-10% | Primary sensory cortex |
| Association | 10-14% | Parietal/temporal cortex |
| Memory Cortex | 0.5-1% | Hippocampus (selective) |
| Executive [0] | 11-15% | Premotor cortex |
| Executive [1] | 22-26% | Prefrontal cortex |

This mirrors biological cortical organization, where the prefrontal cortex has higher baseline activity than the sensory cortex.

## Training

- **Dataset**: ~2.2M text samples, general English corpus
- **Hardware**: NVIDIA A5000 24 GB (rented on Vast.ai)
- **Optimizer**: AdamW (lr 3e-4 → 1e-5 cosine decay, weight_decay 0.01)
- **Batch size**: 2 × grad_accum 16 (effective 32)
- **Sequence length**: 512
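The cosine decay from 3e-4 to 1e-5 can be sketched as follows (warmup is omitted, and the total step count is an assumption for illustration):

```python
import math

# Cosine learning-rate anneal from lr_max at step 0 to lr_min at total_steps.
# Warmup is omitted; 40_000 total steps is an illustrative assumption.

def cosine_lr(step, total_steps, lr_max=3e-4, lr_min=1e-5):
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 40_000))       # starts at 3e-4
print(cosine_lr(40_000, 40_000))  # ends at 1e-5
```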

### Loss Progression

| Step | Loss | Sparsity | LR | Event |
|---|---|---|---|---|
| 0 | 8.9 | 68% | warmup | Start |
| 1,500 | 6.2 | 69% | 3.0e-04 | Rapid descent |
| 10,000 | 4.95 | 99% | 3.0e-04 | v4.1 plateau, spikes dying |
| 14,000 | 7.6 → 5.2 | 75% | 3.0e-04 | v4.2 fixes, spike revival |
| 20,000 | 4.70 | 91% | 3.0e-04 | Surpassed v4.1 |
| 30,000 | 4.50 | 91% | 1.2e-04 | Cosine decay |
| 39,000 | 4.30 | 91% | 6.0e-05 | Current best |

## Parameter Breakdown

| Component | Parameters |
|---|---|
| Sensory Zone | 4.0M (2 blocks) |
| Association Zone | 4.1M (2 blocks, MoE) |
| Memory Cortex | 0.2M |
| Executive Zone | 4.0M (2 blocks) |
| Encoder + Readout + LM Head | ~127.6M |
| **Total** | **139.9M** |
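The vocabulary dominates the count. A quick sanity check: with d_model = 496 and vocab_size = 128,256, an untied token embedding plus LM head alone contribute about 127.2M parameters, consistent with the ~127.6M row once the encoder and readout are added (untied weights are an assumption here):

```python
# Rough sanity check on the parameter breakdown above.
# Assumption: untied token embedding and LM head, each vocab_size x d_model.

d_model = 496
vocab_size = 128_256

embed = vocab_size * d_model    # token embedding matrix
lm_head = vocab_size * d_model  # output projection to vocabulary logits
print((embed + lm_head) / 1e6)  # ≈ 127.2M of the ~127.6M row
```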

## Usage

```python
import torch
from transformers import AutoTokenizer

from nord_core_v4 import NordConfig, NordModel

# Load the checkpoint and rebuild the model
ckpt = torch.load("nord_v4_latest.pt", map_location="cuda")
cfg = NordConfig(**ckpt["config"])
model = NordModel(cfg).cuda()

# Filter out persistent state buffers (their size varies with batch size)
state = {k: v for k, v in ckpt["model_state_dict"].items()
         if "_v_mem_state" not in k and "_i_syn_state" not in k}
model.load_state_dict(state, strict=False)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
```

Or use the interactive chat:

```shell
python chat_v4.py
# Commands: /stats, /memory, /expert, /stdp on|off, /reset, /quit
```

## Generation Examples

**Step 3,600 (loss 5.5):** no coherence

> "Queen was being too late. The lake is not to be found in a variety of birds and stynesan trees."

**Step 29,000 (loss 4.5):** topic understanding, broken logic

> "The internet is equipped with computers that harness data from television and radio vehicles. Its central and large uses can help business use and share information on devices and systems."

**Step 39,000 (loss 4.3):** thematic coherence, real entities

> "A cybersecurity campaign that uses a computer science machine learning robot to guide players, and has refined algorithms. The popular game research software made by OpenAI security researchers..."

## Spike Dynamics

| Context | Sparsity | Interpretation |
|---|---|---|
| Simple tokens | 95-96% | Confident: minimal firing |
| Complex tokens | 89-91% | More neurons recruited |
| Training average | 91% | Healthy spike activity |

Sparsity is dynamic and input-dependent: the model recruits more neurons for harder inputs, much like a biological brain.
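The Adaptive Spike Regulator listed under Key Components is what keeps firing in this healthy band. A minimal sketch of an asymmetric homeostasis penalty: the 3x under-firing weight and 1% floor come from the component description, while the quadratic form, target rate, and floor weighting are assumptions for illustration:

```python
# Sketch of asymmetric spike-rate homeostasis.
# Under-firing is penalized 3x harder than over-firing, and rates below
# the 1% "anti-death" floor get an extra penalty so neurons never go silent.
# Quadratic form, target, and floor weight are illustrative assumptions.

def homeostasis_loss(rate, target=0.09, floor=0.01, under_weight=3.0):
    err = rate - target
    loss = under_weight * err * err if err < 0 else err * err
    if rate < floor:                 # anti-death floor: keep neurons alive
        loss += (floor - rate) * 10.0
    return loss
```

The asymmetry biases the model toward staying active rather than collapsing to 100% sparsity, which is exactly the failure mode reported for v3.5 below.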

## Comparison with Other SNN Language Models

| Model | Params | From Scratch? | MoE | Zonal | Sparsity |
|---|---|---|---|---|---|
| **Nord v4.2** | 140M | ✅ | ✅ | ✅ | 91% |
| Nord v3 | 144M | ✅ | ❌ | ❌ | 97% |
| SpikeGPT | 216M | ✅ | ❌ | ❌ | ~90% |
| SpikeLLM | 7-70B | ❌ | ❌ | ❌ | varies |
| SpikeBERT | ~110M | ❌ | ❌ | ❌ | varies |

## Version History

| Version | Key Change | Result |
|---|---|---|
| v3 | First SNN LLM | 97% sparsity, 51K Reddit views |
| v3.5 | Scale to 500M | Failed: sparsity stuck at 100% |
| v4.1 | MoE + Zonal + Memory | Fixed spikes, loss 4.95 |
| v4.2 | Adaptive regulator + Executive fix | Loss 4.3, stable 91% sparsity |

## Limitations

- Text quality is not competitive with GPT-2 at the same parameter count (loss 4.3 vs ~3.0)
- Coherence degrades after 2-3 sentences at 140M scale
- Multilingual leakage in long generations (a dataset artifact)
- Scaling beyond 140M is untested for v4.2
- No formal benchmark evaluation yet
- Hallucination is present

## Scaling Hypothesis

If zonal specialization persists at scale, an 86B SNN could potentially:

- Match 86B transformer quality
- Run inference with the compute of a 3-4B dense model (96% sparsity)
- Deploy on neuromorphic hardware (Intel Loihi) with orders-of-magnitude energy savings

This is unproven. The roadmap: 140M → 500M → 1-2B, testing at each scale.

## Citation

```bibtex
@software{nord2026,
  title={Nord v4.2: Brain-Inspired Spiking Neural Network Language Model with Spike-Driven MoE and Zonal Specialization},
  author={Zemondsa},
  year={2026},
  url={https://github.com/zemondsa/nord-ai}
}
```

## About

Built solo by an 18-year-old Ukrainian student studying electronics in Norway. No PhD, no team, no funding: just a rented A5000 and curiosity.