HeuristixAI: Dual-Path Disagreement Resolution Model
HAI-DualPath-0.5B is a QLoRA fine-tuned adapter for Qwen2.5-0.5B-Instruct, trained to generate two competing answers, explicitly identify the disagreement between them, and resolve it into a final answer.
This is Project 2 of the HeuristixAI research series.
Project 1: HAI-ReflectMini-0.5B
by HeuristixAI · Research Paper
Model Description
Most small language models produce a single answer directly. This model is trained to reason through competing hypotheses before committing: it generates Answer A, then Answer B, identifies what specifically conflicts between them, and resolves the conflict into a final answer.
To our knowledge, this structured disagreement-resolution schema has not previously been demonstrated at sub-1B parameter scale.
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen2.5-0.5B-Instruct |
| Method | QLoRA (4-bit quantization) |
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.05 |
| Epochs | 3 |
| Learning Rate | 2e-4 |
| Context Length | 768 tokens |
| Peak VRAM | 2.33 GB |
| Training Time | ~55 minutes |
| Hardware | NVIDIA GTX 1650 (4GB) |
| Final Train Loss | 1.733 |
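The hyperparameters above can be expressed as a `peft` configuration. This is an illustrative sketch of a setup consistent with the table, not the actual training script (which is not included in this card):

```python
# Sketch of a LoRA configuration matching the table above; illustrative only.
# Target modules are an assumption (typical for Qwen2.5), not stated in the card.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                   # LoRA rank (from the table)
    lora_alpha=16,                         # LoRA alpha
    lora_dropout=0.05,                     # LoRA dropout
    target_modules=["q_proj", "v_proj"],   # assumed; not specified in the card
    task_type="CAUSAL_LM",
)
```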
Dataset
160 structured samples across two domains:
- Logic / Math (80 samples): arithmetic traps, logical syllogisms, probability puzzles, rate/work/time problems
- Common Sense (80 samples): causal reasoning, social judgment, science intuition, decision making
Each sample contains five fields: `prompt`, `answer_a`, `answer_b`, `disagreement`, `resolution`.
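A training sample in this schema might look like the following. The content shown is a hypothetical example written for illustration, not an actual sample from the dataset:

```python
# Hypothetical training sample illustrating the five-field schema;
# the actual dataset contents are not published in this card.
sample = {
    "prompt": "Is 0.1 + 0.2 exactly equal to 0.3 in binary floating point?",
    "answer_a": "Yes: 0.1 + 0.2 = 0.3, so they are equal.",
    "answer_b": "No: neither 0.1 nor 0.2 is exactly representable in binary, "
                "so the sum differs from 0.3 by a small rounding error.",
    "disagreement": "A treats the arithmetic as exact decimal; "
                    "B accounts for binary floating-point rounding.",
    "resolution": "B is correct: in IEEE 754 doubles, 0.1 + 0.2 != 0.3.",
}
```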
Output Format
Given a prompt, the model responds in this structure:
**Answer A:** [first reasoning path]
**Answer B:** [competing reasoning path]
**Disagreement:** [specific conflict between A and B]
**Resolution:** [final adjudicated answer with justification]
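Because the four sections use fixed bold markers, a model response can be split back into fields with a small regex. The helper below is a sketch for downstream processing, not part of the released code:

```python
import re

# Extract the four labeled sections from a dual-path response.
SECTION_RE = re.compile(
    r"\*\*(Answer A|Answer B|Disagreement|Resolution):\*\*\s*(.*?)"
    r"(?=\*\*(?:Answer A|Answer B|Disagreement|Resolution):\*\*|\Z)",
    re.DOTALL,
)

def parse_dual_path(text: str) -> dict:
    """Return a {section_name: content} dict for each labeled section found."""
    return {name: body.strip() for name, body in SECTION_RE.findall(text)}

response = (
    "**Answer A:** The ball costs $0.10.\n"
    "**Answer B:** The ball costs $0.05.\n"
    "**Disagreement:** A subtracts $1 from the total; B solves the system of equations.\n"
    "**Resolution:** B is correct: ball = $0.05, bat = $1.05, difference = $1.00."
)
sections = parse_dual_path(response)
print(sections["Resolution"])
```

A check like `set(sections) == {"Answer A", "Answer B", "Disagreement", "Resolution"}` is one way to measure format adherence automatically.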
Evaluation
Evaluated on 20 held-out prompts not present in the training data.
| Metric | Result |
|---|---|
| Dual-path format adherence | 20 / 20 (100%) |
| Disagreement field present | 20 / 20 (100%) |
Ablation finding: a model trained without the Disagreement field reaches a lower training loss (1.421 vs. 1.488) but produces weaker resolutions, suggesting that explicit disagreement identification acts as a useful intermediate reasoning scaffold.
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_path = "heuristixai/HAI-DualPath-0.5B"

# Load the base model in 4-bit NF4, matching the training configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_path)

prompt = "A bat and a ball cost $1.10 total. The bat costs $1 more than the ball. How much does the ball cost?"
formatted = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=400, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
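The manual f-string above follows Qwen's ChatML template; it can be factored into a small helper for reuse. This is a convenience sketch, not part of the released code (note that `tokenizer.apply_chat_template` would produce similar output but may also inject the model's default system message):

```python
# Build a ChatML-formatted prompt, mirroring the manual f-string in the
# usage example above. Convenience sketch only.
def format_chatml(user_message: str) -> str:
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

formatted = format_chatml("How much does the ball cost?")
print(formatted)
```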
Limitations
- Base model is 0.5B parameters, so factual accuracy is limited on complex scientific or mathematical problems
- The model reliably produces the correct reasoning structure but may reach incorrect conclusions on problems requiring deep domain knowledge
- Trained on only 160 samples; a larger dataset would improve factual reliability
Citation
If you use this model in your research, please cite:
```bibtex
@misc{heuristixai2026dualpathqwen,
  title={Dual-Path Disagreement Resolution in Small Language Models},
  author={HeuristixAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/heuristixai/HAI-DualPath-0.5B}
}
```
HeuristixAI Research Series
| Project | Model | Method |
|---|---|---|
| Project 1 | HAI-ReflectMini-0.5B | Self-reflective critique via LoRA |
| Project 2 | HAI-DualPath-0.5B | Dual-path disagreement resolution via QLoRA |