HAI-ReflectMini-0.5B
by HeuristixAI · Research Paper
HAI-ReflectMini-0.5B is an open-source LoRA-adapted language model that exhibits self-reflective reasoning: it is trained to evaluate and correct its own outputs without external feedback.
This is HeuristixAI's first released model, accompanying our published research on low-resource self-reflective fine-tuning.
How It Works
The model follows a structured four-stage reasoning pattern:
Prompt → Initial Answer → Self-Critique → Revised Answer
Rather than relying on prompting tricks or external feedback loops, reflective behaviour is internalized directly into model parameters via LoRA fine-tuning on a curated reflection dataset.
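At inference time, a completion in this pattern can be split back into its stages so that downstream code can use only the final, corrected answer. The stage labels below (`Initial Answer:`, `Self-Critique:`, `Revised Answer:`) are illustrative assumptions, since the exact template string is not reproduced in this card:

```python
import re

# Hypothetical stage headers; the exact template used in training is not
# published in this card, so these labels are illustrative assumptions.
STAGES = ["Initial Answer", "Self-Critique", "Revised Answer"]

def split_reflection(completion: str) -> dict:
    """Split a reflection-formatted completion into its labelled stages."""
    result = {}
    for i, stage in enumerate(STAGES):
        # Capture from this header up to the next header (or end of text).
        nxt = STAGES[i + 1] if i + 1 < len(STAGES) else None
        pattern = rf"{stage}:\s*(.*?)(?={nxt}:)" if nxt else rf"{stage}:\s*(.*)"
        m = re.search(pattern, completion, flags=re.S)
        result[stage] = m.group(1).strip() if m else ""
    return result

text = (
    "Initial Answer: 17 is even.\n"
    "Self-Critique: 17 is not divisible by 2, so the initial answer is wrong.\n"
    "Revised Answer: 17 is odd."
)
stages = split_reflection(text)
print(stages["Revised Answer"])  # -> 17 is odd.
```

Keeping only the `Revised Answer` stage is useful when the intermediate critique is meant for the model, not the end user.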
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Adaptation | Low-Rank Adaptation (LoRA) |
| Dataset Size | 120 reflection-formatted examples |
| Domains | Logic, mathematics, ML concepts, ethics, common-sense |
| Training Hardware | Single GTX 1650 (4 GB VRAM) |
| Peak VRAM Usage | ~2.8 GB |
| Training Time | ~20 minutes |
| Released by | HeuristixAI |
Only LoRA adapters are provided. Base model weights are not redistributed.
Training Configuration
| Hyperparameter | Value |
|---|---|
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| Dropout | 0.05 |
| Epochs | 3 |
| Learning Rate | 2e-4 |
| Sequence Length | 512 |
| Quantization | 4-bit NF4 |
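The hyperparameters above can be expressed as a `peft`/`transformers` configuration. This is a minimal sketch, not the authors' released training script; in particular, `target_modules` and the compute dtype are assumptions (attention projections are a common choice for Qwen2.5-family models):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization, matching the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumed; not stated in the card
)

# LoRA hyperparameters from the table; target_modules is an assumption.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```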
Training Schema
Each example follows a four-stage reflection format:
- Prompt: the input question or problem
- Initial Answer: a first attempt, intentionally imperfect
- Self-Critique: explicit identification of errors or gaps
- Revised Answer: corrected, improved reasoning
This teaches the model to evaluate its own outputs as part of inference.
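The four stages above can be serialized into a single training string. The field names and label strings here are hypothetical, since the exact serialization of the 120 examples is not shown in this card:

```python
# Hypothetical template; the exact serialization used for the training
# examples is an illustrative assumption.
def format_reflection_example(prompt: str, initial: str,
                              critique: str, revised: str) -> str:
    """Serialize one training example into the four-stage reflection format."""
    return (
        f"Prompt: {prompt}\n"
        f"Initial Answer: {initial}\n"
        f"Self-Critique: {critique}\n"
        f"Revised Answer: {revised}"
    )

sample = format_reflection_example(
    prompt="Is 91 prime?",
    initial="Yes, 91 is prime.",
    critique="91 = 7 * 13, so the initial answer is incorrect.",
    revised="No, 91 is composite (7 * 13).",
)
print(sample)
```

Because the critique and revision are part of the target text, the model learns to emit its own correction in a single forward pass, with no external feedback loop.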
Intended Use
HAI-ReflectMini-0.5B is designed for research and experimentation in:
- Self-reflective and iterative reasoning
- Low-resource fine-tuning techniques
- Parameter-efficient learning (LoRA)
- Compact language model adaptation
Suitable for educational purposes, prototyping reasoning pipelines, and studying reflection-based training paradigms.
Limitations
- Evaluation is primarily qualitative
- Reflection improves reasoning structure but does not guarantee factual correctness
- Trained on only 120 examples, so domain coverage is intentionally narrow
Citation
If you use this model in your research, please cite:
@misc{heuristixai2026dualpathqwen,
  title={Low-Resource Self-Reflective Fine-Tuning of Compact Language Models Using LoRA},
  author={HeuristixAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/heuristixai/HAI-ReflectMini-0.5B}
}
About HeuristixAI
HeuristixAI is an independent AI research initiative publishing open work on efficient language models, applied intelligence, and human-centered AI systems.
Website: heuristixai.com