---
base_model: tiiuae/Falcon-H1-0.5B-Base
tags:
- dpo
- neuromorphic
- bnn
- hybrid-intelligence
- falcon
- reasoning
license: apache-2.0
language:
- en
- ar
pipeline_tag: text-generation
---

# Hybrid Intelligence 0.5B

![hybrid](https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/9UTL-QW7qIXWt4jjWtGWA.jpeg)

This is the **first public checkpoint of a hybrid intelligence system** from Merlin Research.

Hybrid intelligence means the system is neither purely statistical (an LLM) nor purely symbolic: it couples a language model with a neuromorphic Biological Neural Network (BNN) that observes, evaluates, and selects the LLM's outputs in real time. The two components evolve together: the LLM generates, the BNN judges, and both improve from the same stream of experience.

## Architecture: Two Systems, One Loop

![loop](https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/HnbeHFJcCTsf8bugOXs4h.jpeg)

The LLM (Falcon H1 0.5B) generates multiple candidate answers. The BNN encodes uncertainty signals as neuromorphic spike trains and selects the best candidate. The correctness of that selection feeds back as a training signal for both the BNN and (via DPO) the LLM itself.

## The BNN Component

The BNN is inspired by biological neural circuits. It uses **Leaky Integrate-and-Fire (LIF) neurons** with 4 time scales (decay constants: 0.70, 0.80, 0.85, 0.95) and generates spikes via **Poisson statistics**, the same model used to describe real neuron firing in cortex. This gives the selector a temporal memory of the generation process, not just a snapshot.

![branch](https://cdn-uploads.huggingface.co/production/uploads/67329d3f69fded92d56ab41a/w1TMtYSq9eRJK9SXzJIw-.jpeg)

The BNN runs entirely in **pure NumPy**: no GPU, no special hardware. Total weights: ~8 KB.
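The actual BNN code is not published with this card. As a minimal sketch of the two ingredients named above — LIF membranes at the four stated decay constants, and Poisson-style spike generation — the selector's temporal memory might look like the following. Only the decay constants and the Poisson encoding come from the card; the weights, threshold, rate mapping, and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# The four leak time scales stated in the card (membrane decay per step).
DECAYS = np.array([0.70, 0.80, 0.85, 0.95])

def poisson_spikes(rate, n_steps, n_units):
    """Spike trains where each unit fires independently with the given rate per step
    (a Bernoulli approximation of Poisson firing)."""
    return rng.random((n_steps, n_units)) < rate

def lif_run(spikes, weights, threshold=1.0):
    """Leaky Integrate-and-Fire readout: one membrane per time scale.

    spikes : (n_steps, n_in) boolean input spike trains
    weights: (n_in,) synaptic weights (assumed values here)
    Returns the output spike count per time scale.
    """
    v = np.zeros(len(DECAYS))                 # membrane potentials
    out_counts = np.zeros(len(DECAYS), dtype=int)
    for t in range(spikes.shape[0]):
        drive = spikes[t].astype(float) @ weights
        v = DECAYS * v + drive                # leak, then integrate input current
        fired = v >= threshold
        out_counts += fired                   # spike where the threshold is crossed
        v[fired] = 0.0                        # reset fired membranes
    return out_counts

# Encode a scalar uncertainty signal (e.g. entropy) as a firing rate, then
# read out spike counts across the four temporal memories.
uncertainty = 0.6
trains = poisson_spikes(rate=uncertainty * 0.5, n_steps=64, n_units=16)
counts = lif_run(trains, weights=np.full(16, 0.2))
print(counts)
```

Because the four membranes leak at different rates, the same input history leaves a different trace in each one — a cheap multi-time-scale summary of how the generation unfolded, rather than a snapshot.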
## Key Discovery: Calibration Inversion

> **A small LLM is systematically more confident on wrong answers than on right ones.**

We measured first-token entropy across thousands of hybrid loop iterations. Correct answers show *higher* entropy and *lower* probability margin than wrong ones (t=2.28 and t=−3.41, respectively). The LLM "hesitates" more when it is actually correct.

This is the core insight the BNN learned to exploit. Rather than trusting the model's confidence, the hybrid system uses neuromorphic signals to see past the model's miscalibration and identify the genuinely better answer.

## How the System Was Built: 30,000 Experiments

Merlin runs **6 autonomous researchers** every night (01:00–07:00):

| Process | Role |
|---|---|
| `hybrid` | Main hybrid loop: generates, encodes, selects, evaluates |
| `bnn_trainer` | Retrains the BNN every 5 min from accumulated experience |
| `candidate_pool` | Generates diverse candidates (4 sampling strategies) |
| `neuro_coupling` | BNN-guided token-by-token temperature adjustment |
| `ml` | Collects DPO preference pairs for LLM fine-tuning |
| `meta_analyzer` | Updates evolutionary mutation weights before each session |

Encoder parameters (pulse width, burst count, frequency, entropy scale) are found by **evolutionary search**: propose a mutation, run 100 benchmark questions, keep it if the improvement is ≥ 0.5 pp. This process ran for ~**30,000 experiments** and produced 38+ confirmed improvements before this checkpoint.

## Results

| System | Accuracy |
|---|---|
| Raw Falcon H1 0.5B (baseline) | 21.0% |
| Hybrid Intelligence (BNN + LLM) | ~26–28% |

**+5–7 percentage points** of improvement. The gap comes entirely from the hybrid loop; the BNN selector adds no latency perceptible to the user (~1 ms overhead).

## DPO Fine-Tuning

The LLM component was fine-tuned with DPO on **4,234 preference pairs** collected autonomously by the `ml` researcher over multiple nights.
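The card reports the pair count but not the pair format or the training objective. As a sketch — the record fields and all numeric values below are assumptions — one collected pair, and the standard DPO loss it would feed, might look like:

```python
import math

# Hypothetical shape of one record collected by the `ml` researcher:
# the prompt, the candidate the BNN selected (chosen), and a rejected candidate.
pair = {
    "prompt": "Question: What is 7 * 8?\nAnswer:",
    "chosen": " 56",
    "rejected": " 54",
}

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, from sequence log-probabilities.

    pi_*  : log p under the policy being trained
    ref_* : log p under the frozen reference model
    beta  : KL-strength coefficient (0.1, matching this checkpoint's config)
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# Toy log-probs: the policy already prefers the chosen answer slightly more
# than the reference does, so the loss falls below log 2 (~0.693).
loss = dpo_loss(pi_chosen=-2.0, pi_rejected=-3.5, ref_chosen=-2.5, ref_rejected=-3.0)
print(round(loss, 4))  # 0.6444
```

Minimizing this loss pushes the policy to widen its chosen-vs-rejected log-probability gap relative to the reference model, which is how the BNN's selections gradually reshape the LLM itself.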
Training configuration:

- LoRA: r=16, α=32, target modules: q_proj + v_proj
- β=0.1, 3 epochs, cosine schedule

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "MerlinSafety/falcon-h1-0.5b-dpo",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "MerlinSafety/falcon-h1-0.5b-dpo",
    trust_remote_code=True,
)

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))  # print only the completion
```

## Status & Roadmap

This is **Checkpoint #1**. The hybrid loop continues to run and improve.

- [ ] Stronger base model (Qwen2.5-Math-1.5B or any Qwen3.5)
- [ ] Scale the DPO dataset to 10,000+ pairs
- [ ] Online BNN adaptation during inference
- [ ] Multi-model candidate pool
- [ ] We hope to collaborate with [Cortical Labs](https://corticallabs.com): running the hybrid loop on biological neurons (CL1) as a true wetware selector

---

*Merlin Research — building hybrid intelligence, one checkpoint at a time.*