HeuristixAI: Dual-Path Disagreement Resolution Model
HAI-DualPath-0.5B is a QLoRA fine-tuned adapter for Qwen2.5-0.5B-Instruct, trained to generate two competing answers, explicitly identify the disagreement between them, and resolve it into a final answer.
This is Project 2 of the HeuristixAI research series.
Project 1: HAI-ReflectMini-0.5B
by HeuristixAI · Research Paper
Model Description
Most small language models produce a single answer directly. This model is trained to reason through competing hypotheses before committing: it generates Answer A, then Answer B, identifies what specifically conflicts between them, and resolves the conflict into a final answer.
To our knowledge, this structured disagreement-resolution schema has not previously been demonstrated at sub-1B parameter scale.
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen2.5-0.5B-Instruct |
| Method | QLoRA (4-bit quantization) |
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.05 |
| Epochs | 3 |
| Learning Rate | 2e-4 |
| Context Length | 768 tokens |
| Peak VRAM | 2.33 GB |
| Training Time | ~55 minutes |
| Hardware | NVIDIA GTX 1650 (4GB) |
| Final Train Loss | 1.733 |
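The hyperparameters above can be expressed as a `peft` configuration. This is an illustrative sketch of a setup consistent with the table, not the actual training script (which is not included in this card):

```python
# Sketch of a LoRA configuration matching the table above; illustrative only.
# Target modules are an assumption (typical for Qwen2.5), not stated in the card.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                   # LoRA rank (from the table)
    lora_alpha=16,                         # LoRA alpha
    lora_dropout=0.05,                     # LoRA dropout
    target_modules=["q_proj", "v_proj"],   # assumed; not specified in the card
    task_type="CAUSAL_LM",
)
```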
Dataset
160 structured samples across two domains:
- Logic / Math (80 samples): arithmetic traps, logical syllogisms, probability puzzles, rate/work/time problems
- Common Sense (80 samples): causal reasoning, social judgment, science intuition, decision making
Each sample contains five fields: `prompt`, `answer_a`, `answer_b`, `disagreement`, `resolution`.
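A training sample in this schema might look like the following. The content shown is a hypothetical example written for illustration, not an actual sample from the dataset:

```python
# Hypothetical training sample illustrating the five-field schema;
# the actual dataset contents are not published in this card.
sample = {
    "prompt": "Is 0.1 + 0.2 exactly equal to 0.3 in binary floating point?",
    "answer_a": "Yes: 0.1 + 0.2 = 0.3, so they are equal.",
    "answer_b": "No: neither 0.1 nor 0.2 is exactly representable in binary, "
                "so the sum differs from 0.3 by a small rounding error.",
    "disagreement": "A treats the arithmetic as exact decimal; "
                    "B accounts for binary floating-point rounding.",
    "resolution": "B is correct: in IEEE 754 doubles, 0.1 + 0.2 != 0.3.",
}
```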
Output Format
Given a prompt, the model responds in this structure:
**Answer A:** [first reasoning path]
**Answer B:** [competing reasoning path]
**Disagreement:** [specific conflict between A and B]
**Resolution:** [final adjudicated answer with justification]
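Because the four sections use fixed bold markers, a model response can be split back into fields with a small regex. The helper below is a sketch for downstream processing, not part of the released code:

```python
import re

# Extract the four labeled sections from a dual-path response.
SECTION_RE = re.compile(
    r"\*\*(Answer A|Answer B|Disagreement|Resolution):\*\*\s*(.*?)"
    r"(?=\*\*(?:Answer A|Answer B|Disagreement|Resolution):\*\*|\Z)",
    re.DOTALL,
)

def parse_dual_path(text: str) -> dict:
    """Return a {section_name: content} dict for each labeled section found."""
    return {name: body.strip() for name, body in SECTION_RE.findall(text)}

response = (
    "**Answer A:** The ball costs $0.10.\n"
    "**Answer B:** The ball costs $0.05.\n"
    "**Disagreement:** A subtracts $1 from the total; B solves the system of equations.\n"
    "**Resolution:** B is correct: ball = $0.05, bat = $1.05, difference = $1.00."
)
sections = parse_dual_path(response)
print(sections["Resolution"])
```

A check like `set(sections) == {"Answer A", "Answer B", "Disagreement", "Resolution"}` is one way to measure format adherence automatically.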
Evaluation
Evaluated on 20 held-out prompts not present in the training data.
| Metric | Result |
|---|---|
| Dual-path format adherence | 20 / 20 (100%) |
| Disagreement field present | 20 / 20 (100%) |
Ablation finding: a model trained without the Disagreement field reaches a lower training loss (1.421 vs. 1.488) but produces weaker resolutions, suggesting that explicit disagreement identification acts as a useful intermediate reasoning scaffold.
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_path = "heuristixai/HAI-DualPath-0.5B"

# Load the base model in 4-bit NF4, matching the training configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_path)

prompt = "A bat and a ball cost $1.10 total. The bat costs $1 more than the ball. How much does the ball cost?"
formatted = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=400, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
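The manual f-string above follows Qwen's ChatML template; it can be factored into a small helper for reuse. This is a convenience sketch, not part of the released code (note that `tokenizer.apply_chat_template` would produce similar output but may also inject the model's default system message):

```python
# Build a ChatML-formatted prompt, mirroring the manual f-string in the
# usage example above. Convenience sketch only.
def format_chatml(user_message: str) -> str:
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

formatted = format_chatml("How much does the ball cost?")
print(formatted)
```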
Limitations
- Base model is 0.5B parameters, so factual accuracy is limited on complex scientific or mathematical problems
- The model reliably produces the correct reasoning structure but may reach incorrect conclusions on problems requiring deep domain knowledge
- Trained on only 160 samples; a larger dataset would improve factual reliability
Citation
If you use this model in your research, please cite:
```bibtex
@misc{heuristixai2026dualpathqwen,
  title={Dual-Path Disagreement Resolution in Small Language Models},
  author={HeuristixAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/heuristixai/HAI-DualPath-0.5B}
}
```
HeuristixAI Research Series
| Project | Model | Method |
|---|---|---|
| Project 1 | HAI-ReflectMini-0.5B | Self-reflective critique via LoRA |
| Project 2 | HAI-DualPath-0.5B | Dual-path disagreement resolution via QLoRA |