| | --- |
| | base_model: tiiuae/Falcon-H1-0.5B-Base |
| | tags: |
| | - dpo |
| | - neuromorphic |
| | - bnn |
| | - hybrid-intelligence |
| | - falcon |
| | - reasoning |
| | license: apache-2.0 |
| | language: |
| | - en |
| | - ar |
| | pipeline_tag: text-generation |
| | --- |
| | |
| | # Hybrid Intelligence 0.5B |
| |
|
| |  |
| |
|
This is the **first public checkpoint of a hybrid intelligence system** from Merlin Research.

Hybrid intelligence means the system is neither purely statistical (LLM) nor purely symbolic: it couples a language model with a neuromorphic Biological Neural Network (BNN) that observes, evaluates, and selects the LLM's outputs in real time. The two components evolve together: the LLM generates, the BNN judges, and both improve from the same stream of experience.
|
## Architecture: Two Systems, One Loop

The LLM (Falcon H1 0.5B) generates multiple candidate answers. The BNN encodes uncertainty signals as neuromorphic spike trains and selects the best candidate. The correctness of that selection feeds back as a training signal for both the BNN and (via DPO) the LLM itself.
|
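The loop above can be sketched in a few lines of NumPy. The candidate generator, the uncertainty features, and the linear selector below are toy stand-ins for illustration, not the repository's actual interfaces:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_candidates(prompt: str, n: int = 4):
    """Stand-in for the LLM: return (answer, uncertainty-feature) pairs.
    In the real system the features come from Falcon H1 0.5B's decoding."""
    return [(f"candidate-{i}", rng.random(3)) for i in range(n)]

def bnn_select(candidates, weights):
    """Stand-in for the BNN: score each candidate's features, pick the best."""
    scores = [feats @ weights for _, feats in candidates]
    return int(np.argmax(scores))

weights = rng.random(3)                      # toy selector readout
candidates = generate_candidates("2 + 2 = ?")
chosen = bnn_select(candidates, weights)
print(candidates[chosen][0])

# Feedback: if the chosen answer grades as correct, the selector (BNN)
# is reinforced and a DPO preference pair (LLM) is recorded.
```
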
## The BNN Component

The BNN is inspired by biological neural circuits. It uses **Leaky Integrate-and-Fire (LIF) neurons** with 4 time scales (decay constants: 0.70, 0.80, 0.85, 0.95) and generates spikes via **Poisson statistics**, the same model used to describe real neuron firing in cortex. This gives the selector a temporal memory of the generation process, not just a snapshot.

It runs entirely in **pure NumPy**: no GPU, no special hardware. Total weights: ~8 KB.
|
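A minimal NumPy sketch of the neuron model described above. The four decay constants are taken from this card; the input drive, threshold, and reset scheme are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# The four time scales named above; everything else here is illustrative.
DECAYS = np.array([0.70, 0.80, 0.85, 0.95])

def lif_step(v, current, decay, threshold=1.0):
    """One Leaky Integrate-and-Fire update: leak, integrate, fire, reset."""
    v = decay * v + current        # leaky integration of the input current
    spikes = v >= threshold        # neurons at threshold emit a spike
    v = np.where(spikes, 0.0, v)   # fired neurons reset to rest
    return v, spikes

def poisson_train(rate, steps):
    """Poisson spike train: an input spike each step with probability `rate`."""
    return rng.random(steps) < rate

v = np.zeros(4)                    # one LIF neuron per time scale
inputs = poisson_train(rate=0.3, steps=200)
counts = np.zeros(4, dtype=int)
for t in range(200):
    v, spiked = lif_step(v, 0.4 * inputs[t], DECAYS)
    counts += spiked
print(counts)  # longer time scales integrate more history before firing
```
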
## Key Discovery: Calibration Inversion

> **A small LLM is systematically more confident on wrong answers than on right ones.**

We measured first-token entropy across thousands of hybrid loop iterations. Correct answers show *higher* entropy and *lower* probability margin than wrong ones (t=2.28 and t=-3.41 respectively). The LLM "hesitates" more when it is actually correct.

This is the core insight the BNN learned to exploit. Rather than trusting the model's confidence, the hybrid system uses neuromorphic signals to see past the model's miscalibration and identify the genuinely better answer.
|
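The two statistics cited above can be computed from raw first-token logits as follows. This is a self-contained sketch assuming nothing beyond NumPy; the t-tests themselves run over the logged nightly iterations and are not shown:

```python
import numpy as np

def first_token_stats(logits):
    """Entropy (nats) and top-2 probability margin of one distribution."""
    z = logits - logits.max()                  # stabilised softmax
    p = np.exp(z) / np.exp(z).sum()
    entropy = float(-(p * np.log(p + 1e-12)).sum())
    top2 = np.sort(p)[-2:]
    margin = float(top2[1] - top2[0])          # p(top-1) - p(top-2)
    return entropy, margin

# A flat, "hesitant" distribution vs. a peaked, confident one:
h_flat, m_flat = first_token_stats(np.array([2.0, 1.8, 1.5, 1.0]))
h_peak, m_peak = first_token_stats(np.array([5.0, 1.0, 0.5, 0.0]))
assert h_flat > h_peak and m_flat < m_peak
# Per the measurement above, correct answers tend to look like the
# flat case: higher entropy, lower margin.
```
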
## How the System Was Built: 30,000 Experiments

Merlin runs **6 autonomous researchers** every night (01:00–07:00):

| Process | Role |
|---|---|
| `hybrid` | Main hybrid loop: generates, encodes, selects, evaluates |
| `bnn_trainer` | Retrains the BNN every 5 min from accumulated experience |
| `candidate_pool` | Generates diverse candidates (4 sampling strategies) |
| `neuro_coupling` | BNN-guided token-by-token temperature adjustment |
| `ml` | Collects DPO preference pairs for LLM fine-tuning |
| `meta_analyzer` | Updates evolutionary mutation weights before each session |
|
Encoder parameters (pulse width, burst count, frequency, entropy scale) are found by **evolutionary search**: propose a mutation, run 100 benchmark questions, keep it if the improvement is ≥ 0.5 pp. This process ran for ~**30,000 experiments** and produced 38+ confirmed improvements before this checkpoint.
|
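The accept-if-improved loop can be sketched as follows. The parameter names match the card; the initial values and the fitness function are toy stand-ins for the 100-question benchmark:

```python
import random

random.seed(0)

# Parameter names from the card; values and fitness are illustrative.
params = {"pulse_width": 2.0, "burst_count": 3,
          "frequency": 40.0, "entropy_scale": 1.0}

def benchmark_accuracy(p):
    """Toy fitness in (0, 1] standing in for accuracy on 100 questions."""
    return 1.0 / (1.0 + (p["pulse_width"] - 3.0) ** 2
                      + (p["entropy_scale"] - 0.5) ** 2)

best = benchmark_accuracy(params)
for _ in range(200):
    mutant = dict(params)
    key = random.choice(["pulse_width", "entropy_scale"])
    mutant[key] += random.gauss(0.0, 0.3)   # propose one mutation
    score = benchmark_accuracy(mutant)      # "run the benchmark"
    if score - best >= 0.005:               # keep only gains >= 0.5 pp
        params, best = mutant, score
print(round(best, 3))
```

Because mutations are only accepted on improvement, the score never regresses, mirroring the "38+ confirmed improvements" accounting above.
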
## Results

| System | Accuracy |
|---|---|
| Raw Falcon H1 0.5B (baseline) | 21.0% |
| Hybrid Intelligence (BNN + LLM) | ~26–28% |

An improvement of **+5–7 percentage points**. The gap comes entirely from the hybrid loop; the BNN selector adds no user-perceivable latency (~1 ms overhead).
|
## DPO Fine-Tuning

The LLM component was fine-tuned with DPO on **4,234 preference pairs** collected autonomously by the `ml` researcher over multiple nights.

- LoRA: r=16, α=32, target modules: q_proj + v_proj
- β=0.1, 3 epochs, cosine schedule
|
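The hyperparameters above map onto a TRL + PEFT setup roughly as follows. This is a configuration sketch, not the exact training script: dataset loading is omitted and the output path is a placeholder.

```python
from peft import LoraConfig
from trl import DPOConfig

# LoRA adapter matching the card: r=16, alpha=32, q_proj + v_proj.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# DPO settings from the card: beta=0.1, 3 epochs, cosine schedule.
args = DPOConfig(
    beta=0.1,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    output_dir="falcon-h1-0.5b-dpo",  # placeholder path
)

# trainer = DPOTrainer(model, args=args, train_dataset=pairs,
#                      processing_class=tokenizer, peft_config=lora)
# trainer.train()
```
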
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "MerlinSafety/falcon-h1-0.5b-dpo",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "MerlinSafety/falcon-h1-0.5b-dpo",
    trust_remote_code=True,
)

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))
```
|
## Status & Roadmap

This is **Checkpoint #1**. The hybrid loop continues to run and improve.

- [ ] Stronger base model (Qwen2.5-Math-1.5B or any Qwen3.5)
- [ ] Scale the DPO dataset to 10,000+ pairs
- [ ] Online BNN adaptation during inference
- [ ] Multi-model candidate pool
- [ ] We hope to collaborate with [Cortical Labs](https://corticallabs.com) on running the hybrid loop on biological neurons (CL1) as a true wetware selector

---
*Merlin Research – building hybrid intelligence, one checkpoint at a time.*