---
language:
- en
pipeline_tag: text-generation
tags:
- snn
- spiking-neural-network
- neuromorphic
- language-model
- from-scratch
- energy-efficient
- mixture-of-experts
- brain-inspired
---

# ⚡ Nord v4.2 – Brain-Inspired Spiking Neural Network Language Model (140M)

**The first SNN language model with spike-driven MoE, zonal specialization, and memory cortex – trained from scratch.**

## What's New in v4.2

Nord v4.2 is a complete architectural rebuild from v3. The key breakthrough: **the model self-organizes into functionally distinct brain zones during training** – sensory zones learn low firing rates, executive zones learn high firing rates, with no explicit supervision.

| | v3 (previous) | v4.2 (current) |
|---|---|---|
| **Parameters** | 144M | 140M |
| **Sparsity** | 97% (but spikes broken at scale) | 91% (spikes working) |
| **MoE** | None | Spike-driven, 4 experts top-2 |
| **Memory** | None | 128-neuron cortex, τ=0.99 |
| **Zonal architecture** | No | Yes (self-organizing) |
| **Loss at 39K steps** | ~4.9 | **4.3** |
| **Training speed** | Slower convergence | 35% faster to same loss |

## Model Description

Nord v4.2 is a 140M-parameter Spiking Neural Network (SNN) for text generation. It uses biologically inspired Leaky Integrate-and-Fire neurons with membrane potentials, firing thresholds, and binary spikes. Unlike transformers, where 100% of neurons activate per token, Nord activates only **3-9%** – with different brain-inspired zones specializing in different functions.

Trained **entirely from scratch** – no transformer teacher, no distillation, no ANN-to-SNN conversion.
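
The LIF dynamics described above can be sketched in a few lines. This is a hypothetical minimal illustration, not the actual `NordModel` neuron (which adds learnable time constants, synaptic currents, and cluster-level cascades); the `beta` leak, the threshold, and the soft reset are illustrative choices.

```python
# Minimal leaky integrate-and-fire (LIF) sketch. Illustrative values only:
# `beta` (membrane leak) and `threshold` are not taken from the Nord source.
def lif_step(v, current, beta=0.9, threshold=1.0):
    """One timestep: integrate current, spike if threshold crossed, soft-reset."""
    v = beta * v + current              # leaky integration of input current
    spike = 1 if v >= threshold else 0  # binary spike
    if spike:
        v = v - threshold               # soft reset after firing
    return v, spike

# Drive one neuron with a constant current for 10 timesteps (Nord uses T=10).
v, spikes = 0.0, []
for _ in range(10):
    v, s = lif_step(v, current=0.3)
    spikes.append(s)
print(spikes)   # prints [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Most timesteps produce no spike – this per-timestep silence is exactly where the sparsity numbers above come from.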

## Key Features

| Feature | Details |
|---------|---------|
| Parameters | 139.9M |
| Architecture | Original brain-inspired zonal SNN |
| Zones | Sensory → Association (MoE) → Memory → Executive |
| MoE | 4 spike-driven experts, top-2 routing |
| Memory | 128 persistent neurons, gated temporal attention |
| Sparsity | 89-95% (dynamic, input-dependent) |
| Timesteps | 10 (8 fast + 2 slow) |
| Training method | Surrogate gradients + spike homeostasis |
| Training data | ~2.2M samples, general English corpus |
| Training cost | ~$15 USD |
| Online learning | STDP available during inference |

## Architecture

```
┌──────────────────────────────────────────────────┐
│ Temporal Spike Encoder                           │
│ Token → 8 fast + 2 slow timestep currents        │
├──────────────────────────────────────────────────┤
│ Sensory Zone (2 blocks)          rates: 8-10%    │
│ Standard FFN + LIF, feature extraction           │
├──────────────────────────────────────────────────┤
│ Association Zone (2 blocks)      rates: 10-14%   │
│ Spike-Driven MoE (4 experts, top-2) + LIF        │
├──────────────────────────────────────────────────┤
│ Memory Cortex                    rates: 0.5-1%   │
│ 128 neurons, τ=0.99, gated temporal attn         │
├──────────────────────────────────────────────────┤
│ Executive Zone (2 blocks)        rates: 11-26%   │
│ Standard FFN + LIF, decision & output            │
├──────────────────────────────────────────────────┤
│ Readout (EMA over membrane potential)            │
│ → LM Head → vocabulary logits                    │
└──────────────────────────────────────────────────┘
```

### Key Components

- **Associative LIF Neurons** – Learnable membrane time constants, voltage thresholds, synaptic currents, cascade amplification across 64 neural clusters
- **ATan Surrogate Gradient** – Differentiable spike function for backpropagation
- **Spike-Driven MoE** – Expert routing based on cluster spike-rate activity, not dense networks
- **Memory Cortex** – Persistent slow memory with multi-head temporal attention readout
- **Adaptive Spike Regulator** – Asymmetric homeostasis: penalizes too-low firing 3x more than too-high, anti-death floor at 1%
- **RoPE** – Rotary position embeddings for sequence position encoding
- **Synaptic Resonance Attention** – Temporal mixing over spike patterns (not naive flattening)
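
The ATan surrogate listed above swaps the Heaviside step's zero-almost-everywhere true derivative for a smooth stand-in during backprop. A minimal sketch of the standard ATan form; the `alpha` sharpness value here is an assumption, not a value from the Nord source:

```python
import math

# Forward pass: hard Heaviside step (non-differentiable at the threshold).
def spike_forward(v, threshold=1.0):
    return 1.0 if v >= threshold else 0.0

# Backward pass: ATan surrogate derivative used instead of the true gradient.
# `alpha` (sharpness) is an illustrative value, not from the Nord source.
def spike_backward(v, threshold=1.0, alpha=2.0):
    x = v - threshold
    return alpha / (2 * (1 + (math.pi / 2 * alpha * x) ** 2))

# The surrogate peaks at the threshold and decays smoothly on both sides,
# so gradients flow mainly through neurons near their firing point.
print(spike_backward(1.0))   # at threshold: alpha / 2
print(spike_backward(3.0))   # far above threshold: near zero
```

In a full framework this pair would live in a custom autograd function; only the backward rule differs from an ordinary step activation.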

### Model Configuration

```
d_model: 496
n_heads: 8
n_layers: 6 (2 sensory + 2 association + 2 executive)
d_ff: 1024
n_experts: 4
top_k_experts: 2
memory_size: 128
T_fast: 8, T_slow: 2
max_seq_len: 512
vocab_size: 128,256
tokenizer: Llama-3.2 (meta-llama/Llama-3.2-1B)
```
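
Given `n_experts: 4` and `top_k_experts: 2` above, spike-driven routing can be sketched as follows. This is a hypothetical illustration of routing on cluster spike rates – the cluster-to-expert mapping and the averaging scheme are assumptions, not the actual Nord implementation.

```python
# Hypothetical sketch: route tokens to the top-2 of 4 experts using cluster
# spike-rate activity instead of a dense learned gate. Names are illustrative.
def route_top2(cluster_spike_rates, n_experts=4):
    """Average 64 cluster rates into per-expert scores, pick the top-2."""
    per_expert = len(cluster_spike_rates) // n_experts
    scores = [
        sum(cluster_spike_rates[e * per_expert:(e + 1) * per_expert]) / per_expert
        for e in range(n_experts)
    ]
    top2 = sorted(range(n_experts), key=lambda e: scores[e], reverse=True)[:2]
    total = scores[top2[0]] + scores[top2[1]]
    weights = [scores[e] / total for e in top2]   # normalized mixing weights
    return top2, weights

# 64 clusters, mostly quiet; clusters 16-31 (mapped to expert 1) fire hardest.
rates = [0.02] * 16 + [0.30] * 16 + [0.10] * 16 + [0.05] * 16
experts, weights = route_top2(rates)
print(experts)   # [1, 2]
```

Because the router reads spike rates that are already computed, the gating itself adds almost no dense computation.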

## Emergent Zonal Specialization

The most significant finding: **the model self-organizes into functionally distinct zones** during standard training. No manual assignment, no hardcoded rates.

```
Zone            Spike Rate   Biological Analog
─────────────────────────────────────────────────────
Sensory         8-10%        Primary sensory cortex
Association     10-14%       Parietal/temporal cortex
Memory Cortex   0.5-1%       Hippocampus (selective)
Executive [0]   11-15%       Premotor cortex
Executive [1]   22-26%       Prefrontal cortex
─────────────────────────────────────────────────────
```

This mirrors biological cortical organization, where prefrontal cortex has higher baseline activity than sensory cortex.

## Training

- **Dataset:** ~2.2M text samples, general English corpus
- **Hardware:** NVIDIA A5000 24GB (rented on Vast.ai)
- **Optimizer:** AdamW (lr=3e-4 → 1e-5 cosine decay, weight_decay=0.01)
- **Batch size:** 2 × grad_accum=16 (effective 32)
- **Sequence length:** 512
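
The stated cosine schedule (3e-4 decaying to 1e-5) can be sketched as below; `total_steps` is a hypothetical horizon chosen for illustration, since the full training length beyond the 39K-step checkpoint is not stated.

```python
import math

# Cosine learning-rate decay from lr_max to lr_min, as stated in the
# optimizer settings. `total_steps` is a hypothetical value for illustration.
def cosine_lr(step, total_steps, lr_max=3e-4, lr_min=1e-5):
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 50_000))       # lr_max at the start
print(cosine_lr(50_000, 50_000))  # lr_min at the end
```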

### Loss Progression

| Step | Loss | Sparsity | LR | Event |
|------|------|----------|-----|-------|
| 0 | 8.9 | 68% | warmup | Start |
| 1,500 | 6.2 | 69% | 3.0e-04 | Rapid descent |
| 10,000 | 4.95 | 99% | 3.0e-04 | v4.1 plateau, spikes dying |
| 14,000 | 7.6→5.2 | 75% | 3.0e-04 | v4.2 fixes, spike revival |
| 20,000 | 4.70 | 91% | 3.0e-04 | Surpassed v4.1 |
| 30,000 | 4.50 | 91% | 1.2e-04 | Cosine decay |
| 39,000 | 4.30 | 91% | 6.0e-05 | Current best |

### Parameter Breakdown

| Component | Parameters |
|-----------|-----------|
| Sensory Zone | 4.0M (2 blocks) |
| Association Zone | 4.1M (2 blocks, MoE) |
| Memory Cortex | 0.2M |
| Executive Zone | 4.0M (2 blocks) |
| Encoder + Readout + LM Head | ~127.6M |
| **Total** | **139.9M** |

## Usage

```python
import torch
from nord_core_v4 import NordConfig, NordModel
from transformers import AutoTokenizer

# Load the checkpoint and rebuild the model
ckpt = torch.load("nord_v4_latest.pt", map_location="cuda")
cfg = NordConfig(**ckpt["config"])
model = NordModel(cfg).cuda()

# Filter persistent state buffers (size varies with batch)
state = {k: v for k, v in ckpt["model_state_dict"].items()
         if "_v_mem_state" not in k and "_i_syn_state" not in k}
model.load_state_dict(state, strict=False)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
```

Or use the interactive chat:

```bash
python chat_v4.py
# Commands: /stats, /memory, /expert, /stdp on|off, /reset, /quit
```

## Generation Examples

**Step 3,600 (loss 5.5)** – no coherence:
> "Queen was being too late. The lake is not to be found in a variety of birds and stynesan trees."

**Step 29,000 (loss 4.5)** – topic understanding, broken logic:
> "The internet is equipped with computers that harness data from television and radio vehicles. Its central and large uses can help business use and share information on devices and systems."

**Step 39,000 (loss 4.3)** – thematic coherence, real entities:
> "A cybersecurity campaign that uses a computer science machine learning robot to guide players, and has refined algorithms. The popular game research software made by OpenAI security researchers..."

## Spike Dynamics

| Context | Sparsity | Interpretation |
|---------|----------|----------------|
| Simple tokens | 95-96% | Confident → minimal firing |
| Complex tokens | 89-91% | More neurons recruited |
| Training average | 91% | Healthy spike activity |

Sparsity is **dynamic and input-dependent** – the model recruits more neurons for harder inputs, just like a biological brain.
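
The sparsity figures in the table are simply the fraction of neurons that stay silent, averaged over timesteps. A toy sketch with made-up spike trains:

```python
# Sparsity = fraction of (neuron, timestep) slots with no spike.
# The spike trains below are made up for illustration.
def sparsity(spike_trains):
    """spike_trains: list of per-timestep binary spike lists."""
    total = sum(len(t) for t in spike_trains)
    fired = sum(sum(t) for t in spike_trains)
    return 1.0 - fired / total

easy = [[0, 0, 0, 1], [0, 0, 0, 0]]   # confident token: 1 of 8 slots fired
hard = [[1, 0, 1, 1], [0, 1, 0, 1]]   # harder token: 5 of 8 slots fired
print(sparsity(easy), sparsity(hard))  # 0.875 0.375
```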

## Comparison with Other SNN Language Models

| Model | Params | From Scratch? | MoE | Zonal | Sparsity |
|-------|--------|:---:|:---:|:---:|---|
| **Nord v4.2** | 140M | ✅ | ✅ | ✅ | 91% |
| Nord v3 | 144M | ✅ | ❌ | ❌ | 97% |
| SpikeGPT | 216M | ✅ | ❌ | ❌ | ~90% |
| SpikeLLM | 7-70B | ❌ | ❌ | ❌ | varies |
| SpikeBERT | ~110M | ❌ | ❌ | ❌ | varies |

## Version History

| Version | Key Change | Result |
|---------|-----------|--------|
| v3 | First SNN LLM | 97% sparsity, 51K Reddit views |
| v3.5 | Scale to 500M | Failed – sparsity stuck at 100% |
| v4.1 | MoE + Zonal + Memory | Fixed spikes, loss 4.95 |
| **v4.2** | **Adaptive regulator + Executive fix** | **Loss 4.3, stable 91% sparsity** |

## Limitations

- Text quality not competitive with GPT-2 at the same parameter count (loss 4.3 vs ~3.0)
- Coherence degrades after 2-3 sentences at 140M scale
- Multilingual leakage in long generations (dataset artifact)
- Scaling beyond 140M untested for v4.2
- No formal benchmark evaluation yet
- Hallucination present

## Scaling Hypothesis

If zonal specialization persists at scale, an 86B SNN could potentially:
- Match 86B transformer quality
- Run inference with the compute of a 3-4B dense model (96% sparsity)
- Deploy on neuromorphic hardware (Intel Loihi) with orders-of-magnitude energy savings

This is unproven. The roadmap: 140M → 500M → 1-2B, testing at each scale.
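
The 3-4B figure is back-of-envelope arithmetic, treating active compute as directly proportional to the fraction of neurons that fire:

```python
# Back-of-envelope: at 96% sparsity, only 4% of an 86B model fires per token.
# This equates "active fraction" with "dense-equivalent compute", which is
# the hypothesis being stated, not an established result.
total_params = 86e9
sparsity = 0.96
active = total_params * (1 - sparsity)
print(f"~{active / 1e9:.2f}B active")   # ~3.44B, i.e. in the stated 3-4B range
```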

## Citation

```bibtex
@software{nord2026,
  title={Nord v4.2: Brain-Inspired Spiking Neural Network Language Model with Spike-Driven MoE and Zonal Specialization},
  author={Zemondsa},
  year={2026},
  url={https://github.com/zemondsa/nord-ai}
}
```

## About

Built solo by an 18-year-old Ukrainian student studying electronics in Norway. No PhD, no team, no funding – just a rented A5000 and curiosity.