| | --- |
| | base_model: tiiuae/Falcon-H1-0.5B-Base |
| | tags: |
| | - dpo |
| | - neuromorphic |
| | - bnn |
| | - hybrid-intelligence |
| | - falcon |
| | - reasoning |
| | license: apache-2.0 |
| | language: |
| | - en |
| | - ar |
| | pipeline_tag: text-generation |
| | --- |
| | |
| | # Hybrid Intelligence 0.5B |
| |
|
| |  |
| |
|
This is the **first public checkpoint of a hybrid intelligence system** from Merlin Research.

Hybrid intelligence means the system is neither purely statistical (LLM) nor purely symbolic: it couples a language model with a neuromorphic Biological Neural Network (BNN) that observes, evaluates, and selects the LLM's outputs in real time. The two components evolve together: the LLM generates, the BNN judges, and both improve from the same stream of experience.
|
## Architecture: Two Systems, One Loop

The LLM (Falcon H1 0.5B) generates multiple candidate answers. The BNN encodes uncertainty signals as neuromorphic spike trains and selects the best candidate. The correctness of that selection feeds back as a training signal for both the BNN and (via DPO) the LLM itself.
|
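The loop above can be sketched in a few lines of NumPy. The candidate generator, the uncertainty features, and the linear selector below are toy stand-ins for illustration, not the repository's actual interfaces:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_candidates(prompt: str, n: int = 4):
    """Stand-in for the LLM: return (answer, uncertainty-feature) pairs.
    In the real system the features come from Falcon H1 0.5B's decoding."""
    return [(f"candidate-{i}", rng.random(3)) for i in range(n)]

def bnn_select(candidates, weights):
    """Stand-in for the BNN: score each candidate's features, pick the best."""
    scores = [feats @ weights for _, feats in candidates]
    return int(np.argmax(scores))

weights = rng.random(3)                      # toy selector readout
candidates = generate_candidates("2 + 2 = ?")
chosen = bnn_select(candidates, weights)
print(candidates[chosen][0])

# Feedback: if the chosen answer grades as correct, the selector (BNN)
# is reinforced and a DPO preference pair (LLM) is recorded.
```
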
## The BNN Component

The BNN is inspired by biological neural circuits. It uses **Leaky Integrate-and-Fire (LIF) neurons** with 4 time scales (decay constants: 0.70, 0.80, 0.85, 0.95) and generates spikes via **Poisson statistics**, the same model used to describe real neuron firing in cortex. This gives the selector a temporal memory of the generation process, not just a snapshot.

It runs entirely in **pure NumPy**: no GPU, no special hardware. Total weights: ~8 KB.
|
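A minimal NumPy sketch of the neuron model described above. The four decay constants are taken from this card; the input drive, threshold, and reset scheme are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# The four time scales named above; everything else here is illustrative.
DECAYS = np.array([0.70, 0.80, 0.85, 0.95])

def lif_step(v, current, decay, threshold=1.0):
    """One Leaky Integrate-and-Fire update: leak, integrate, fire, reset."""
    v = decay * v + current        # leaky integration of the input current
    spikes = v >= threshold        # neurons at threshold emit a spike
    v = np.where(spikes, 0.0, v)   # fired neurons reset to rest
    return v, spikes

def poisson_train(rate, steps):
    """Poisson spike train: an input spike each step with probability `rate`."""
    return rng.random(steps) < rate

v = np.zeros(4)                    # one LIF neuron per time scale
inputs = poisson_train(rate=0.3, steps=200)
counts = np.zeros(4, dtype=int)
for t in range(200):
    v, spiked = lif_step(v, 0.4 * inputs[t], DECAYS)
    counts += spiked
print(counts)  # longer time scales integrate more history before firing
```
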
## Key Discovery: Calibration Inversion

> **A small LLM is systematically more confident on wrong answers than on right ones.**

We measured first-token entropy across thousands of hybrid loop iterations. Correct answers show *higher* entropy and *lower* probability margin than wrong ones (t=2.28 and t=-3.41 respectively). The LLM "hesitates" more when it is actually correct.

This is the core insight the BNN learned to exploit. Rather than trusting the model's confidence, the hybrid system uses neuromorphic signals to see past the model's miscalibration and identify the genuinely better answer.
|
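The two statistics cited above can be computed from raw first-token logits as follows. This is a self-contained sketch assuming nothing beyond NumPy; the t-tests themselves run over the logged nightly iterations and are not shown:

```python
import numpy as np

def first_token_stats(logits):
    """Entropy (nats) and top-2 probability margin of one distribution."""
    z = logits - logits.max()                  # stabilised softmax
    p = np.exp(z) / np.exp(z).sum()
    entropy = float(-(p * np.log(p + 1e-12)).sum())
    top2 = np.sort(p)[-2:]
    margin = float(top2[1] - top2[0])          # p(top-1) - p(top-2)
    return entropy, margin

# A flat, "hesitant" distribution vs. a peaked, confident one:
h_flat, m_flat = first_token_stats(np.array([2.0, 1.8, 1.5, 1.0]))
h_peak, m_peak = first_token_stats(np.array([5.0, 1.0, 0.5, 0.0]))
assert h_flat > h_peak and m_flat < m_peak
# Per the measurement above, correct answers tend to look like the
# flat case: higher entropy, lower margin.
```
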
## How the System Was Built: 30,000 Experiments

Merlin runs **6 autonomous researchers** every night (01:00–07:00):

| Process | Role |
|---|---|
| `hybrid` | Main hybrid loop: generates, encodes, selects, evaluates |
| `bnn_trainer` | Retrains the BNN every 5 min from accumulated experience |
| `candidate_pool` | Generates diverse candidates (4 sampling strategies) |
| `neuro_coupling` | BNN-guided token-by-token temperature adjustment |
| `ml` | Collects DPO preference pairs for LLM fine-tuning |
| `meta_analyzer` | Updates evolutionary mutation weights before each session |
|
Encoder parameters (pulse width, burst count, frequency, entropy scale) are found by **evolutionary search**: propose a mutation, run 100 benchmark questions, keep it if the improvement is ≥ 0.5 pp. This process ran for ~**30,000 experiments** and produced 38+ confirmed improvements before this checkpoint.
|
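The accept-if-improved loop can be sketched as follows. The parameter names match the card; the initial values and the fitness function are toy stand-ins for the 100-question benchmark:

```python
import random

random.seed(0)

# Parameter names from the card; values and fitness are illustrative.
params = {"pulse_width": 2.0, "burst_count": 3,
          "frequency": 40.0, "entropy_scale": 1.0}

def benchmark_accuracy(p):
    """Toy fitness in (0, 1] standing in for accuracy on 100 questions."""
    return 1.0 / (1.0 + (p["pulse_width"] - 3.0) ** 2
                      + (p["entropy_scale"] - 0.5) ** 2)

best = benchmark_accuracy(params)
for _ in range(200):
    mutant = dict(params)
    key = random.choice(["pulse_width", "entropy_scale"])
    mutant[key] += random.gauss(0.0, 0.3)   # propose one mutation
    score = benchmark_accuracy(mutant)      # "run the benchmark"
    if score - best >= 0.005:               # keep only gains >= 0.5 pp
        params, best = mutant, score
print(round(best, 3))
```

Because mutations are only accepted on improvement, the score never regresses, mirroring the "38+ confirmed improvements" accounting above.
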
## Results

| System | Accuracy |
|---|---|
| Raw Falcon H1 0.5B (baseline) | 21.0% |
| Hybrid Intelligence (BNN + LLM) | ~26–28% |

An improvement of **+5–7 percentage points**. The gap comes entirely from the hybrid loop; the BNN selector adds no user-perceivable latency (~1 ms overhead).
|
## DPO Fine-Tuning

The LLM component was fine-tuned with DPO on **4,234 preference pairs** collected autonomously by the `ml` researcher over multiple nights.

- LoRA: r=16, α=32, target modules: q_proj + v_proj
- β=0.1, 3 epochs, cosine schedule
|
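The hyperparameters above map onto a TRL + PEFT setup roughly as follows. This is a configuration sketch, not the exact training script: dataset loading is omitted and the output path is a placeholder.

```python
from peft import LoraConfig
from trl import DPOConfig

# LoRA adapter matching the card: r=16, alpha=32, q_proj + v_proj.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# DPO settings from the card: beta=0.1, 3 epochs, cosine schedule.
args = DPOConfig(
    beta=0.1,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    output_dir="falcon-h1-0.5b-dpo",  # placeholder path
)

# trainer = DPOTrainer(model, args=args, train_dataset=pairs,
#                      processing_class=tokenizer, peft_config=lora)
# trainer.train()
```
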
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "MerlinSafety/falcon-h1-0.5b-dpo",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "MerlinSafety/falcon-h1-0.5b-dpo",
    trust_remote_code=True,
)

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))
```
|
## Status & Roadmap

This is **Checkpoint #1**. The hybrid loop continues to run and improve.

- [ ] Stronger base model (Qwen2.5-Math-1.5B or any Qwen3.5)
- [ ] Scale the DPO dataset to 10,000+ pairs
- [ ] Online BNN adaptation during inference
- [ ] Multi-model candidate pool
- [ ] We hope to collaborate with [Cortical Labs](https://corticallabs.com) on running the hybrid loop on biological neurons (CL1) as a true wetware selector

---
*Merlin Research – building hybrid intelligence, one checkpoint at a time.*