	# MLE Competitive Model

	A high-dimensional sparse distributed memory system with energy-based dynamics and knowledge-based question-answering capabilities.

	## Architecture

	- Vector size: 4096 bits (sparse, ~5% active ≈ 205 bits)
	- Memory: Sparse Address Table with deterministic hash encoding
	- Knowledge base: 1848 facts, 1106 words, 53 sequences
	- Encoding: MD5-based deterministic sparse vectors
	- Inference: Symbolic reasoning with Hamming similarity
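
The encoding step listed above can be sketched as follows. This is a minimal illustration of one way to derive a deterministic sparse vector from MD5, not the model's actual encoder: the function name `encode_word`, the `word:counter` salting, and the 16-bit chunking are all assumptions made for this example.

```python
import hashlib

import numpy as np

DIM = 4096    # vector size from the Architecture list
ACTIVE = 205  # roughly 5% of 4096


def encode_word(word: str, dim: int = DIM, active: int = ACTIVE) -> np.ndarray:
    """Map a word to a deterministic sparse binary vector (hypothetical scheme)."""
    positions = set()
    counter = 0
    # Hash "word:counter" repeatedly. Each 16-byte MD5 digest yields eight
    # 16-bit chunks, and each chunk modulo dim selects one bit position.
    while len(positions) < active:
        digest = hashlib.md5(f"{word}:{counter}".encode("utf-8")).digest()
        for i in range(0, len(digest), 2):
            positions.add(int.from_bytes(digest[i:i + 2], "big") % dim)
            if len(positions) == active:
                break
        counter += 1
    vec = np.zeros(dim, dtype=np.uint8)
    vec[sorted(positions)] = 1
    return vec
```

Because the positions depend only on the input string, encoding the same word twice gives the same vector, so no lookup table is needed to reproduce it.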

	## Benchmarks - Ultimate Model (v3)

	| Task | Accuracy | Details |
	|------|----------|---------|
	| Question Answering | 62.8% | 49/78 correct |
	| Analogy Solving | 67.9% | 19/28 correct |
	| Sequence Completion | 93.3% | 14/15 correct |
	| Word Retrieval (noise 15%) | 100% | 100/100 correct |
	| Generalization | 100% | 20/20 correct |
	| Robustness (noise 50%) | 100% | Up to 50% bit corruption |
	| Overall | 84.8% | Weighted average |
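
To make the noisy word-retrieval row concrete, here is a small sketch of such a test on synthetic data. It does not reproduce the benchmark: the vectors are random stand-ins rather than the model's word vectors, and the noise model (flipping a fraction of all 4096 bit positions) is an assumption, since the README does not define how noise is applied. The helper names `random_sparse`, `corrupt`, and `nearest` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, ACTIVE, N_WORDS = 4096, 205, 100


def random_sparse(n, dim=DIM, active=ACTIVE):
    """Create n random binary vectors with exactly `active` bits set."""
    vecs = np.zeros((n, dim), dtype=np.uint8)
    for row in vecs:
        row[rng.choice(dim, size=active, replace=False)] = 1
    return vecs


def corrupt(vec, fraction):
    """Flip the given fraction of all bit positions, chosen at random (assumed noise model)."""
    noisy = vec.copy()
    flip = rng.choice(vec.size, size=int(fraction * vec.size), replace=False)
    noisy[flip] ^= 1
    return noisy


def nearest(query, memory):
    """Return the index of the stored vector with the smallest Hamming distance."""
    distances = np.count_nonzero(memory != query, axis=1)
    return int(np.argmin(distances))


memory = random_sparse(N_WORDS)
hits = sum(nearest(corrupt(memory[i], 0.15), memory) == i for i in range(N_WORDS))
recall = hits / N_WORDS
print(f"recall at 15% bit flips: {recall:.2f}")
```

Other noise models, such as corrupting only the active bits, behave differently at high noise levels, so results from this sketch should not be compared directly with the table.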

	## Corpus Coverage

	- Animals: 185 animals across 5 categories
	- Plants: 175 plants across 6 categories
	- Locations: 150+ cities across 15 countries and 5 continents
	- Vehicles: 10 vehicles with parts and actions
	- Professions: 20 professions with workplace associations
	- Food: 55 food and drink items
	- Body: 36 body parts with functions
	- Colors: 29 colors with associations
	- Emotions: 20 emotions with opposites
	- Time: Full temporal sequences (days, months, seasons, ordinals)
	- Tools: 20 tools with functions
	- Music: 17 instruments and 12 genres
	- Sports: 22 sports
	- Weather: 14 weather types
	- Clothing: 20 clothing items
	- Technology: 19 devices
	- Celestial: 20 celestial bodies

	## Files

	- `mle_ultimate_model.npz` - Word vectors (1106 × 4096 binary)
	- `mle_ultimate_facts.json` - Complete knowledge base (1848 facts)
	- `mle_ultimate_results.json` - Benchmark results
	- `mle_ultimate_config.json` - Model configuration

	## Usage

	```python
	import numpy as np

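	# Load the model archive. allow_pickle=True is required only if an array
	# (for example the word list) was saved with object dtype; load files
	# from trusted sources only, because unpickling can execute code.
	data = np.load("mle_ultimate_model.npz", allow_pickle=True)
	words = data['words']      # word list
	vectors = data['vectors']  # word vectors (1106 × 4096 binary per the Files list)

	# Each vector is a 4096-bit sparse binary vector
	# with ~205 bits active (about 5% density)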
	```

	## Supported Query Patterns

	- "what is X" → category (is_a)
	- "what has X" → possessor (has)
	- "what can X" → agent (can)
	- "where is X" → location (in/lives_in)
	- "what color is X" → color (is/has_color)
	- "what is opposite of X" → antonym (opposite)
	- "what is before X" → predecessor (before)
	- "what is X made of" → material (made_of)
	- "what is X used for" → function (used_for)
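
The patterns above can be read as a mapping from question templates to relations. The sketch below shows one way to implement that mapping. It is illustrative only: the `(subject, relation, object)` triple layout, the sample facts, and the `answer` function are assumptions, and the real schema of `mle_ultimate_facts.json` may differ.

```python
import re

# Hypothetical triples; the real schema of mle_ultimate_facts.json may differ.
FACTS = [
    ("dog", "is_a", "mammal"),
    ("bird", "can", "fly"),
    ("car", "has", "wheel"),
    ("paris", "in", "france"),
    ("banana", "has_color", "yellow"),
    ("hot", "opposite", "cold"),
    ("monday", "before", "tuesday"),
    ("table", "made_of", "wood"),
    ("hammer", "used_for", "nailing"),
]

# (regex, relations, slot that X fills). Specific patterns come first so the
# generic "what is X" does not shadow them.
PATTERNS = [
    (r"what is opposite of (.+)", ("opposite",), "subject"),
    (r"what is before (.+)", ("before",), "object"),
    (r"what is (.+) made of", ("made_of",), "subject"),
    (r"what is (.+) used for", ("used_for",), "subject"),
    (r"what color is (.+)", ("is", "has_color"), "subject"),
    (r"where is (.+)", ("in", "lives_in"), "subject"),
    (r"what has (.+)", ("has",), "object"),
    (r"what can (.+)", ("can",), "object"),
    (r"what is (.+)", ("is_a",), "subject"),
]


def answer(question: str):
    """Return the other end of the first matching fact, or None."""
    text = question.lower().strip().rstrip("?").strip()
    for pattern, relations, slot in PATTERNS:
        match = re.fullmatch(pattern, text)
        if not match:
            continue
        x = match.group(1)
        for subj, rel, obj in FACTS:
            if rel not in relations:
                continue
            if slot == "subject" and subj == x:
                return obj
            if slot == "object" and obj == x:
                return subj
        return None  # pattern matched, but no fact was found
    return None
```

With the sample facts, `answer("What is before Tuesday?")` returns `"monday"` and `answer("what is table made of")` returns `"wood"`. Pattern order matters here: if the generic "what is X" rule ran first, it would capture "opposite of hot" as X and find nothing.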

	## Model Versions

	| Version | Facts | Words | Overall |
	|---------|-------|-------|---------|
	| v1 (mle_best) | 283 | 334 | 97.2% |
	| v2 (mle_advanced) | 933 | 705 | 68.2% |
	| v3 (mle_ultimate) | 1848 | 1106 | 84.8% |

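	v3 offers the largest knowledge coverage (1848 facts, 1106 words) with strong robustness to noise. v1 reports the highest overall score (97.2%), but on a much smaller knowledge base.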
