HAI-ReflectMini-0.5B
by HeuristixAI · Research Paper
HAI-ReflectMini-0.5B is an open-source LoRA-adapted language model that exhibits self-reflective reasoning: it is trained to evaluate and correct its own outputs without external feedback.
This is HeuristixAI's first released model, accompanying our published research on low-resource self-reflective fine-tuning.
How It Works
The model follows a structured four-stage reasoning pattern:
Prompt → Initial Answer → Self-Critique → Revised Answer
Rather than relying on prompting tricks or external feedback loops, reflective behaviour is internalized directly into model parameters via LoRA fine-tuning on a curated reflection dataset.
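At inference time, a completion in this pattern can be split back into its stages so that downstream code can use only the final, corrected answer. The stage labels below (`Initial Answer:`, `Self-Critique:`, `Revised Answer:`) are illustrative assumptions, since the exact template string is not reproduced in this card:

```python
import re

# Hypothetical stage headers; the exact template used in training is not
# published in this card, so these labels are illustrative assumptions.
STAGES = ["Initial Answer", "Self-Critique", "Revised Answer"]

def split_reflection(completion: str) -> dict:
    """Split a reflection-formatted completion into its labelled stages."""
    result = {}
    for i, stage in enumerate(STAGES):
        # Capture from this header up to the next header (or end of text).
        nxt = STAGES[i + 1] if i + 1 < len(STAGES) else None
        pattern = rf"{stage}:\s*(.*?)(?={nxt}:)" if nxt else rf"{stage}:\s*(.*)"
        m = re.search(pattern, completion, flags=re.S)
        result[stage] = m.group(1).strip() if m else ""
    return result

text = (
    "Initial Answer: 17 is even.\n"
    "Self-Critique: 17 is not divisible by 2, so the initial answer is wrong.\n"
    "Revised Answer: 17 is odd."
)
stages = split_reflection(text)
print(stages["Revised Answer"])  # -> 17 is odd.
```

Keeping only the `Revised Answer` stage is useful when the intermediate critique is meant for the model, not the end user.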
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Adaptation | Low-Rank Adaptation (LoRA) |
| Dataset Size | 120 reflection-formatted examples |
| Domains | Logic, mathematics, ML concepts, ethics, common-sense |
| Training Hardware | Single GTX 1650 (4 GB VRAM) |
| Peak VRAM Usage | ~2.8 GB |
| Training Time | ~20 minutes |
| Released by | HeuristixAI |
Only LoRA adapters are provided. Base model weights are not redistributed.
Training Configuration
| Hyperparameter | Value |
|---|---|
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| Dropout | 0.05 |
| Epochs | 3 |
| Learning Rate | 2e-4 |
| Sequence Length | 512 |
| Quantization | 4-bit NF4 |
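The hyperparameters above can be expressed as a `peft`/`transformers` configuration. This is a minimal sketch, not the authors' released training script; in particular, `target_modules` and the compute dtype are assumptions (attention projections are a common choice for Qwen2.5-family models):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization, matching the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumed; not stated in the card
)

# LoRA hyperparameters from the table; target_modules is an assumption.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```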
Training Schema
Each example follows a four-stage reflection format:
- Prompt: the input question or problem
- Initial Answer: a first attempt, intentionally imperfect
- Self-Critique: explicit identification of errors or gaps
- Revised Answer: corrected, improved reasoning
This teaches the model to evaluate its own outputs as part of inference.
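The four stages above can be serialized into a single training string. The field names and label strings here are hypothetical, since the exact serialization of the 120 examples is not shown in this card:

```python
# Hypothetical template; the exact serialization used for the training
# examples is an illustrative assumption.
def format_reflection_example(prompt: str, initial: str,
                              critique: str, revised: str) -> str:
    """Serialize one training example into the four-stage reflection format."""
    return (
        f"Prompt: {prompt}\n"
        f"Initial Answer: {initial}\n"
        f"Self-Critique: {critique}\n"
        f"Revised Answer: {revised}"
    )

sample = format_reflection_example(
    prompt="Is 91 prime?",
    initial="Yes, 91 is prime.",
    critique="91 = 7 * 13, so the initial answer is incorrect.",
    revised="No, 91 is composite (7 * 13).",
)
print(sample)
```

Because the critique and revision are part of the target text, the model learns to emit its own correction in a single forward pass, with no external feedback loop.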
Intended Use
HAI-ReflectMini-0.5B is designed for research and experimentation in:
- Self-reflective and iterative reasoning
- Low-resource fine-tuning techniques
- Parameter-efficient learning (LoRA)
- Compact language model adaptation
Suitable for educational purposes, prototyping reasoning pipelines, and studying reflection-based training paradigms.
Limitations
- Evaluation is primarily qualitative
- Reflection improves reasoning structure but does not guarantee factual correctness
- Trained on only 120 examples, so domain coverage is intentionally narrow
Citation
If you use this model in your research, please cite:
@misc{heuristixai2026dualpathqwen,
  title={Low-Resource Self-Reflective Fine-Tuning of Compact Language Models Using LoRA},
  author={HeuristixAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/heuristixai/HAI-ReflectMini-0.5B}
}
About HeuristixAI
HeuristixAI is an independent AI research initiative publishing open work on efficient language models, applied intelligence, and human-centered AI systems.
Website: heuristixai.com