Hybrid Intelligence 0.5B


This is the first public checkpoint of a hybrid intelligence system from Merlin Research.

Hybrid intelligence means the system is neither purely statistical (an LLM) nor purely symbolic: it couples a language model with a neuromorphic Biological Neural Network (BNN) that observes, evaluates, and selects the LLM's outputs in real time. The two components evolve together: the LLM generates, the BNN judges, and both improve from the same stream of experience.

Architecture: Two Systems, One Loop


The LLM (Falcon H1 0.5B) generates multiple candidate answers. The BNN encodes uncertainty signals as neuromorphic spike trains and selects the best candidate. The correctness of that selection feeds back as training signal for both the BNN and (via DPO) the LLM itself.
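One iteration of the loop can be sketched as follows. All function names here are illustrative stand-ins, not the actual Merlin API:

```python
def hybrid_step(prompt, generate, bnn_score, check_answer):
    """One hybrid-loop iteration: the LLM proposes, the BNN selects,
    and the outcome becomes a training signal for both components."""
    candidates = generate(prompt, n=4)                   # LLM: sample candidates
    scores = [bnn_score(prompt, c) for c in candidates]  # BNN: spike-based scoring
    chosen = candidates[scores.index(max(scores))]       # select the best candidate
    reward = check_answer(prompt, chosen)                # correctness feedback
    # In the real system, `reward` would feed BNN retraining and DPO pair collection.
    return chosen, reward

# Toy usage with stub components
chosen, reward = hybrid_step(
    "2+2?",
    generate=lambda p, n: ["3", "4", "5", "22"],
    bnn_score=lambda p, c: 1.0 if c == "4" else 0.0,
    check_answer=lambda p, a: 1.0 if a == "4" else 0.0,
)
```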

The BNN Component

The BNN is inspired by biological neural circuits. It uses Leaky Integrate-and-Fire (LIF) neurons with four time scales (decay constants: 0.70, 0.80, 0.85, 0.95) and generates spikes via Poisson statistics, the same model used to describe real neuron firing in cortex. This gives the selector a temporal memory of the generation process, not just a snapshot.


Runs entirely in pure NumPy: no GPU, no special hardware. Total weights: ~8 KB.
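A minimal NumPy sketch of the encoding idea: a scalar uncertainty signal drives a Poisson-style spike train, which LIF neurons integrate at the four quoted decay time scales. Function and parameter names are illustrative, not the Merlin implementation:

```python
import numpy as np

# Four LIF decay constants (time scales) quoted in the model card
DECAYS = np.array([0.70, 0.80, 0.85, 0.95])

def encode_uncertainty(signal, steps=20, seed=0):
    """Encode a scalar uncertainty signal in [0, 1] as a Poisson-style
    spike train, then integrate it with LIF neurons at four time scales."""
    rng = np.random.default_rng(seed)
    spikes = (rng.random(steps) < signal).astype(float)  # Bernoulli spikes per step
    v = np.zeros_like(DECAYS)                 # membrane potentials, one per time scale
    trace = np.empty((steps, len(DECAYS)))
    for t, s in enumerate(spikes):
        v = DECAYS * v + s                    # leaky integration of spike input
        trace[t] = v
    return trace

trace = encode_uncertainty(0.8)
```

The slower-decaying neurons (larger constants) retain a longer memory of past spikes, which is what gives the selector its temporal view of the generation process.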

Key Discovery: Calibration Inversion

A small LLM is systematically more confident on wrong answers than on right ones.

We measured first-token entropy across thousands of hybrid loop iterations. Correct answers show higher entropy and lower probability margin than wrong ones (t=2.28 and t=−3.41, respectively). The LLM "hesitates" more when it is actually correct.

This is the core insight the BNN learned to exploit. Rather than trusting the model's confidence, the hybrid system uses neuromorphic signals to see past the model's miscalibration and identify the genuinely better answer.
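The two signals can be computed directly from a first-token logit vector. A self-contained sketch (the hybrid system measures these on the LLM's actual vocabulary-sized distribution):

```python
import math

def first_token_stats(logits):
    """Entropy and top-2 probability margin of a first-token distribution."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]   # stable softmax
    z = sum(exps)
    probs = sorted((e / z for e in exps), reverse=True)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    margin = probs[0] - probs[1]               # gap between top two candidates
    return entropy, margin

# A flatter distribution: higher entropy, lower margin ("hesitation")
h1, m1 = first_token_stats([2.0, 1.8, 1.5])
# A peaked distribution: lower entropy, higher margin (confidence)
h2, m2 = first_token_stats([5.0, 1.0, 0.5])
```

Under the calibration-inversion finding, the first pattern is the one that more often accompanies a correct answer.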

How the System Was Built: 30,000 Experiments

Merlin runs 6 autonomous researchers every night (01:00–07:00):

Process          Role
hybrid           Main hybrid loop: generates, encodes, selects, evaluates
bnn_trainer      Retrains the BNN every 5 min from accumulated experience
candidate_pool   Generates diverse candidates (4 sampling strategies)
neuro_coupling   BNN-guided token-by-token temperature adjustment
ml               Collects DPO preference pairs for LLM fine-tuning
meta_analyzer    Updates evolutionary mutation weights before each session

Encoder parameters (pulse width, burst count, frequency, entropy scale) are found by evolutionary search: propose a mutation, run 100 benchmark questions, and keep the mutation only if the improvement is ≥ 0.5 pp. This process ran for ~30,000 experiments and produced 38+ confirmed improvements before this checkpoint.
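The accept-if-improved rule amounts to a greedy hill climb. A toy sketch with a single stand-in parameter and a pretend benchmark (names and the toy objective are illustrative):

```python
import random

def evolutionary_search(params, benchmark, mutate, trials=200, min_gain=0.005):
    """Greedy evolutionary search: propose a mutation, evaluate it, and
    keep it only if accuracy improves by at least min_gain (0.5 pp)."""
    best = benchmark(params)
    for _ in range(trials):
        candidate = mutate(params)
        score = benchmark(candidate)      # e.g. accuracy on 100 questions
        if score - best >= min_gain:      # accept only clear improvements
            params, best = candidate, score
    return params, best

# Toy stand-in: one encoder parameter with a known optimum at 0.7
random.seed(42)
benchmark = lambda p: max(0.0, 1.0 - abs(p - 0.7))   # pretend accuracy in [0, 1]
mutate = lambda p: p + random.uniform(-0.05, 0.05)
params, best = evolutionary_search(0.2, benchmark, mutate)
```

The 0.5 pp acceptance threshold trades exploration speed for robustness: small noisy gains on a 100-question benchmark are discarded rather than accumulated.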

Results

System                             Accuracy
Raw Falcon H1 0.5B (baseline)      21.0%
Hybrid Intelligence (BNN + LLM)    ~26–28%

+5–7 percentage points improvement. The gap comes entirely from the hybrid loop; the BNN selector adds no user-perceivable latency (~1 ms overhead).

DPO Fine-Tuning

The LLM component was fine-tuned with DPO on 4,234 preference pairs collected autonomously by the ml researcher over multiple nights.

  • LoRA: r=16, α=32, target modules: q_proj + v_proj
  • β=0.1, 3 epochs, cosine schedule
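These hyperparameters correspond roughly to the following peft/trl configuration (a sketch; exact argument names assume recent peft and trl releases, and are not taken from the Merlin training code):

```python
from peft import LoraConfig
from trl import DPOConfig

# LoRA adapter on the attention query/value projections (r=16, alpha=32)
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

# DPO settings matching the card: beta=0.1, 3 epochs, cosine LR schedule
dpo_config = DPOConfig(beta=0.1, num_train_epochs=3, lr_scheduler_type="cosine")
```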

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "MerlinSafety/falcon-h1-0.5b-dpo",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "MerlinSafety/falcon-h1-0.5b-dpo",
    trust_remote_code=True,
)

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Status & Roadmap

This is Checkpoint #1. The hybrid loop continues to run and improve. Planned next steps:

  • Stronger base model (Qwen2.5-Math-1.5B or any Qwen3.5)
  • Scale DPO dataset to 10,000+ pairs
  • Online BNN adaptation during inference
  • Multi-model candidate pool
  • We hope to collaborate with Cortical Labs on running the hybrid loop on biological neurons (CL1) as a true wetware selector

Merlin Research: building hybrid intelligence, one checkpoint at a time.
