# MLE — Morpho-Logic Engine
> **A novel energy-based reasoning AI architecture, CPU-native, gradient-free, built on hyperdimensional computing.**
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests: 7/7](https://img.shields.io/badge/tests-7%2F7%20passing-brightgreen.svg)]()
```
███╗   ███╗██╗     ███████╗
████╗ ████║██║     ██╔════╝
██╔████╔██║██║     █████╗      Morpho-Logic Engine v0.1.0
██║╚██╔╝██║██║     ██╔══╝      Energy-Based Reasoning AI
██║ ╚═╝ ██║███████╗███████╗
╚═╝     ╚═╝╚══════╝╚══════╝
```
---
## 🧠 What is MLE?
MLE is a **new class of reasoning engine** that replaces neural network backpropagation with energy-based dynamics operating on hyperdimensional binary vectors. It draws from:
- **Kanerva's Sparse Distributed Memory** — memory indexed by proximity in Hamming space
- **Holographic Reduced Representations** (Plate 1995) — circular convolution for semantic binding
- **Modern Hopfield Networks** (Ramsauer et al. 2020) — energy-based pattern completion
- **Binary Spatter Codes** — ultra-fast XOR binding for CPU-native computation
The result is a system that can reason about concepts, solve analogies, compose meanings, and traverse knowledge graphs — all **without GPU, without gradients, without training** — using pure bitwise operations optimized for CPU SIMD instructions.
---
## 🏗️ Architecture
```
┌─────────────┐    ┌──────────────┐    ┌──────────────┐    ┌─────────────┐
│   Query     │───▶│   Routing    │───▶│   Binding    │───▶│   Energy    │
│   Encoder   │    │  (JIT Beam)  │    │  (Compose)   │    │   (Relax)   │
│             │    │   Top-500    │    │  XOR / FFT   │    │  Hopfield + │
│  str→4096b  │    │  LSH+Expand  │    │  Conv Circ.  │    │  Bit-flip   │
└─────────────┘    └──────────────┘    └──────────────┘    └─────────────┘
       │                                                          │
       │           ┌──────────────┐    ┌──────────────┐           │
       └───────────│   Response   │◀───│    Decode    │◀──────────┘
                   │              │    │  NN + Role   │
                   └──────────────┘    └──────────────┘
```
### 5 Modules
| Module | File | Description |
|--------|------|-------------|
| **`memory`** | `sparse_address_table.py` | Sparse Address Table: 4096-bit binary vectors, SoA layout, LSH index (32 tables × 8-bit signatures), multi-probe search, activation tracking |
| **`routing`** | `recursive_jit_router.py` | Recursive JIT Routing: LSH init → beam-500 refinement → neighbor expansion → convergence check. Multi-hop chaining for chain-of-thought |
| **`binding`** | `semantic_binding.py` | Dual binding: **Binary (XOR)** O(N/64) exact recovery + **HRR (FFT)** O(N log N) approximate. Triple encoding, analogy queries, sequence binding via permutation |
| **`energy`** | `energy_model.py` | Composite energy: compatibility + binding coherence + sparsity + smoothness. **Hopfield** (continuous, attention-based) + **Binary relaxation** (simulated annealing). Hybrid mode for best results |
| **`inference`** | `reasoning_engine.py` | Full pipeline: encode → route → bind → relax → decode. Association, analogy, composition, structured queries. Multi-step reasoning with convergence detection |
### SIMD-Optimized Core (`utils/simd_ops.py`)
All core operations are backed by a **GCC-compiled C library** with `-march=native` for automatic SIMD vectorization:
```c
// GCC auto-vectorizes this to AVX-512 VPOPCNTQ where available; otherwise it emits scalar POPCNT
int hamming_single(const uint64_t *a, const uint64_t *b, int n) {
int cnt = 0;
for (int i = 0; i < n; i++)
cnt += __builtin_popcountll(a[i] ^ b[i]);
return cnt;
}
```
| Operation | Throughput | Notes |
|-----------|-----------|-------|
| Hamming distance (single) | ~100M ops/s | 64 × `POPCNT` per pair |
| Hamming batch (100K vectors) | 25-41M vecs/s | Vectorized XOR + popcount |
| Top-500 selection | O(N log K) | Max-heap in C |
| Binary bind (XOR) | ~95K ops/s | 64 × `XOR` per op |
| HRR bind (FFT) | ~10K ops/s | `numpy.fft.rfft` |
**Fallback**: Pure NumPy LUT-based popcount when GCC isn't available — portable across all platforms.
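For reference, the LUT technique in a few lines (a minimal sketch; the shipped version in `mle/utils/simd_ops.py` may differ in details):
```python
import numpy as np

# 256-entry lookup table: popcount of every possible byte value.
_LUT = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(axis=1).astype(np.uint8)

def hamming_batch(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Hamming distance from one (64,) uint64 query to every row of an (N, 64) corpus."""
    xor_bytes = (query[None, :] ^ corpus).view(np.uint8)  # (N, 512) bytes of XOR result
    return _LUT[xor_bytes].sum(axis=1, dtype=np.int32)    # byte-wise table lookup + reduce
```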
---
## 📐 Core Concepts
### 4096-bit Binary Vectors
Every concept, relation, and memory address is a **4096-bit binary vector** stored as 64 × `uint64` words (512 bytes):
```python
# Each vector: 4096 bits = 512 bytes = 64 × uint64
# Storage layout: Structure of Arrays (SoA) for cache locality
#   addresses: (N, 64) uint64 — contiguous, cache-aligned
#   contents:  (N, 64) uint64 — separate for SIMD batch ops
```
**Key property**: Random vectors have Hamming distance ≈ 2048 (50%). Semantic similarity is encoded as deviation from this baseline. Vectors with distance << 2048 are "similar"; vectors at ~2048 are orthogonal.
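A quick sanity check of that baseline, using only NumPy:
```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.integers(0, 2**64, size=(2, 64), dtype=np.uint64)  # two random 4096-bit vectors
dist = sum(bin(int(w)).count("1") for w in a ^ b)             # portable per-word popcount
print(dist)  # ~2048: independent random vectors sit at the 50% baseline
```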
### Sparse Address Table
Memory entries are indexed by binary vectors. Access uses **Hamming distance** as the proximity metric:
```python
import numpy as np
from mle import SparseAddressTable

# Illustrative 4096-bit vectors: 64 uint64 words each
rng = np.random.default_rng(0)
address_vec, content_vec, query_vec = rng.integers(0, 2**64, size=(3, 64), dtype=np.uint64)
sat = SparseAddressTable(capacity=100_000)
sat.store(address_vec, content_vec, metadata={'name': 'cat'})
results = sat.query_nearest(query_vec, k=10)  # [(index, distance), ...]
```
**LSH Index**: 32 hash tables with 8-bit random-bit-sampling signatures. Multi-probe search (1-bit and 2-bit flips) for high recall. Sub-linear search time for large memories.
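The signatures are plain random bit sampling; here is a sketch of a single table (the names `sample_positions` and `lsh_signature` are illustrative, not the library's API):
```python
import numpy as np

rng = np.random.default_rng(42)
sample_positions = rng.choice(4096, size=8, replace=False)  # 8 fixed random bit positions

def lsh_signature(vec_words: np.ndarray) -> int:
    """8-bit bucket key: the values of 8 sampled bits of a (64,) uint64 vector."""
    bits = np.unpackbits(vec_words.view(np.uint8))      # (4096,) array of 0/1
    return int(np.packbits(bits[sample_positions])[0])  # pack the 8 sampled bits into a byte

# The full index keeps 32 such tables; multi-probe search also checks buckets
# whose key differs from the query's signature in 1 or 2 bits.
```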
### Binding Operations
**Binary (XOR)**: `bind(A, B) = A ⊕ B`. Self-inverse (exact recovery), quasi-orthogonal to inputs, O(N/64).
**HRR (FFT)**: `bind(A, B) = IFFT(FFT(A) · FFT(B))`. Circular convolution, approximate recovery, similarity-preserving, O(N log N).
```python
import numpy as np
from mle.binding import BinaryBinding

# Illustrative setup: random 4096-bit concept vectors (64 uint64 words each)
rng = np.random.default_rng(0)
king_vec, is_a_vec, man_vec, queen_vec = rng.integers(0, 2**64, size=(4, 64), dtype=np.uint64)

# Encode: "king IS_A man"
triple = BinaryBinding.encode_triple(king_vec, is_a_vec, man_vec)
# Decode: recover man from the triple
decoded = BinaryBinding.unbind(BinaryBinding.unbind(triple, king_vec), is_a_vec)
# decoded == man_vec (exact with XOR!)

# Analogy: king:man :: queen:?
query = BinaryBinding.create_analogy_query(king_vec, man_vec, queen_vec)
# query ≈ woman_vec (find nearest in codebook)
```
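For comparison, HRR binding can be sketched directly with NumPy FFTs (random ±1 vectors here; the shipped `HRRBinding` may normalize differently):
```python
import numpy as np

def hrr_bind(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Circular convolution via FFT: IFFT(FFT(a) * FFT(b))."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def hrr_unbind(c: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Circular correlation: approximately recovers b from c = bind(a, b)."""
    return np.fft.irfft(np.conj(np.fft.rfft(a)) * np.fft.rfft(c), n=len(a))

rng = np.random.default_rng(0)
a, b = rng.choice([-1.0, 1.0], size=(2, 4096))
b_hat = hrr_unbind(hrr_bind(a, b), a)
print(np.corrcoef(b, b_hat)[0, 1])  # high correlation, but not exact recovery
```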
### Energy-Based Reasoning
**No backpropagation. No gradients stored.** Reasoning is energy minimization:
```
E(state) = α·E_compat + β·E_binding + γ·E_sparse + δ·E_smooth
```
| Component | Formula | Purpose |
|-----------|---------|---------|
| Compatibility | -Σ wᵢ · sim(state, contextᵢ) | State agrees with activated memories |
| Binding coherence | Σ hamming(unbind(bᵢ, rᵢ), fᵢ) / N | Stored relations remain intact |
| Sparsity | ‖activations‖₁ | Focused, not diffuse, activation |
| Smoothness | hamming(current, previous) / N | Stable reasoning trajectory |
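Putting the table together, a toy evaluation of the composite energy (a sketch under assumptions: `sim` taken as 1 − 2 · normalized Hamming, unbinding as XOR with the role vector, and all names and default weights illustrative):
```python
import numpy as np

def nham(u: np.ndarray, v: np.ndarray) -> float:
    """Normalized Hamming distance in [0, 1] between (64,) uint64 word arrays."""
    return sum(bin(int(w)).count("1") for w in u ^ v) / 4096

def energy(state, prev, contexts, weights, bindings, activations,
           alpha=1.0, beta=1.0, gamma=0.1, delta=0.1):
    e_compat  = -sum(w * (1 - 2 * nham(state, c)) for w, c in zip(weights, contexts))
    e_binding = sum(nham(b ^ r, f) for (b, r, f) in bindings)  # XOR unbind vs. stored filler
    e_sparse  = float(np.abs(activations).sum())               # L1 norm of activations
    e_smooth  = nham(state, prev)                              # drift from the previous state
    return alpha * e_compat + beta * e_binding + gamma * e_sparse + delta * e_smooth
```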
**Two-phase minimization**:
1. **Hopfield update**: `ξ_new = X @ softmax(β · X^T @ ξ)` — fast coarse convergence via attention over patterns
2. **Binary relaxation**: bit-flip search with simulated annealing — fine discrete refinement
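Phase 1 is a one-liner in NumPy; this sketch assumes `X` stores patterns as columns of a `(d, M)` float matrix:
```python
import numpy as np

def hopfield_step(xi: np.ndarray, X: np.ndarray, beta: float = 8.0) -> np.ndarray:
    """One modern-Hopfield update: xi_new = X @ softmax(beta * X.T @ xi)."""
    scores = beta * (X.T @ xi)    # similarity of the current state to each stored pattern
    scores -= scores.max()        # shift for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum()            # softmax: attention weights over patterns
    return X @ attn               # attention-weighted superposition of patterns
```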
---
## 🚀 Quick Start
```python
from mle import MorphoLogicEngine
# Initialize
engine = MorphoLogicEngine(beam_width=500, energy_mode='hybrid')
# Build knowledge
engine.add_concept("cat")
engine.add_concept("dog")
engine.add_concept("animal")
engine.add_relation("cat", "is_a", "animal")
engine.add_relation("dog", "is_a", "animal")
# Reason
result = engine.reason("cat", max_steps=3)
print(result['response']['nearest_concepts'])
# → [('cat', 0.99), ('animal', 0.75), ...]
# Associations
assocs = engine.associate("cat", top_k=5)
# → [('cat_is_a_animal', 0.74), ('dog', 0.52), ...]
# Analogy: king:man :: queen:?
analogy = engine.solve_analogy("king", "man", "queen")
print(analogy['codebook_ranking'][:3])
# Composition: water + animal β†’ ?
comp = engine.compose("water", "animal")
print(comp['response']['nearest_concepts'][:3])
```
---
## 📊 Benchmarks
Measured on a 2-vCPU machine (cpu-basic), single-threaded:
### SIMD Throughput
| Corpus Size | Batch Hamming | Top-500 |
|-------------|--------------|---------|
| 1,000 | 0.04ms (28M/s) | 0.06ms |
| 10,000 | 0.29ms (35M/s) | 0.32ms |
| 100,000 | 4.56ms (22M/s) | 4.79ms |
### Routing Latency
| Memory Size | Avg Latency | P99 | Candidates |
|-------------|-------------|-----|------------|
| 1,000 | 3.8ms | 5.4ms | 953 |
| 10,000 | 2.5ms | 3.2ms | 3,335 |
| 50,000 | 2.7ms | 3.5ms | 2,679 |
### Memory Efficiency
- **1,024 bytes/entry** (512 address + 512 content)
- **1,000 entries = 1 MB**
- **100,000 entries = 100 MB**
### Binding Performance
- Binary (XOR): **95,000 ops/sec**
- HRR (FFT): **10,500 ops/sec**
---
## 🧪 Tests
```bash
pip install numpy scipy
python -m mle.tests.test_full_system
```
**7/7 test groups passing:**
- ✅ SIMD Operations (correctness + performance)
- ✅ Memory & LSH (storage, retrieval, 100% cluster recall)
- ✅ Routing (beam width, convergence, scalability)
- ✅ Binding (XOR exact recovery, HRR approximate recovery, triple encoding)
- ✅ Energy Convergence (monotonic decrease, Hopfield attention concentration)
- ✅ Reasoning (association, query, analogy, composition, structured queries)
- ✅ Integration (500+ concept KB, batch queries, memory efficiency)
---
## 🎯 Demo
```bash
python -m mle.demo
```
Runs a full demonstration with 40+ concepts and 42 relations, exercising concept queries, associations, analogies, compositions, structured queries, and multi-step reasoning.
---
## 📁 Project Structure
```
mle/
├── __init__.py                  # Package init, public API
├── demo.py                      # Interactive demonstration
├── utils/
│   ├── __init__.py
│   └── simd_ops.py              # SIMD C library + NumPy fallback
├── memory/
│   ├── __init__.py
│   └── sparse_address_table.py  # SparseAddressTable + HammingLSH
├── routing/
│   ├── __init__.py
│   └── recursive_jit_router.py  # RecursiveJITRouter
├── binding/
│   ├── __init__.py
│   └── semantic_binding.py      # HRRBinding + BinaryBinding + BindingEngine
├── energy/
│   ├── __init__.py
│   └── energy_model.py          # EnergyFunction + Relaxation + Hopfield
├── inference/
│   ├── __init__.py
│   └── reasoning_engine.py      # ReasoningEngine (full pipeline)
└── tests/
    ├── __init__.py
    └── test_full_system.py      # Comprehensive test suite
```
---
## 🔬 Theoretical Foundations
| Paper | Contribution to MLE |
|-------|-------------------|
| Kanerva (1988) "Sparse Distributed Memory" | Binary vector addressing, Hamming distance proximity |
| Plate (1995) "Holographic Reduced Representations" | Circular convolution binding, FFT implementation |
| Gayler (2003) "Vector Symbolic Architectures" | XOR binding (BSC), majority-vote bundling |
| Ramsauer et al. (2020) "Hopfield Networks Is All You Need" | Modern Hopfield energy, exponential capacity, attention ≡ update rule |
| Frady et al. (2021) "SDM and Transformers" | SDM Hamming threshold ≈ transformer attention |
| Thomas et al. (2023) "Efficient HDC with Static Optimization" | Optimal BSC dimensions, analytical thresholds |
| Langford et al. (2024) "Linear Codes for HDC" | GF(2) factorization, 100% XOR recovery |
---
## 🛤️ Roadmap
- [ ] **Persistent storage**: Serialize memory to disk (mmap for instant loading)
- [ ] **Learned embeddings**: Pre-encode concepts from text corpora (word2vec → binary projection)
- [ ] **Multi-threaded SIMD**: Parallel batch Hamming with OpenMP
- [ ] **Graph walk reasoning**: Follow relation chains for multi-hop inference
- [ ] **Incremental learning**: Hebbian-style weight updates from experience
- [ ] **Benchmark suite**: Standardized reasoning tasks (bAbI, CLUTRR, etc.)
---
## 📜 License
MIT
---
## 🙏 Acknowledgments
Inspired by the vision of frugal, explainable AI that reasons rather than retrieves. Built on decades of research in hyperdimensional computing, energy-based models, and sparse distributed memory.