HAI-ReflectMini-0.5B

by HeuristixAI · Research Paper

HAI-ReflectMini-0.5B is an open-source, LoRA-adapted language model that exhibits self-reflective reasoning: it is trained to evaluate and correct its own outputs without external feedback.

This is HeuristixAI's first released model, accompanying our published research on low-resource self-reflective fine-tuning.


How It Works

The model follows a structured four-stage reasoning pattern:

Prompt → Initial Answer → Self-Critique → Revised Answer

Rather than relying on prompting tricks or external feedback loops, reflective behaviour is internalized directly into model parameters via LoRA fine-tuning on a curated reflection dataset.
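Because the reflection stages are emitted as labelled sections in a single generation, downstream code can split them apart with plain string handling. The sketch below is a hypothetical parser: the exact section labels produced by the fine-tuned model are an assumption based on the stage names above, not a published output spec.

```python
# Hypothetical section labels; the fine-tuned model's actual output
# format may differ from this assumption.
SECTIONS = ["Initial Answer", "Self-Critique", "Revised Answer"]

def parse_reflection(output: str) -> dict:
    """Split a generated response into its labelled reflection stages."""
    result = {}
    for i, name in enumerate(SECTIONS):
        start = output.find(f"{name}:")
        if start == -1:
            continue  # stage missing from this generation
        start += len(name) + 1
        # A stage ends where the next labelled stage begins.
        ends = [output.find(f"{n}:") for n in SECTIONS[i + 1:]]
        ends = [e for e in ends if e != -1]
        end = min(ends) if ends else len(output)
        result[name] = output[start:end].strip()
    return result

sample = (
    "Initial Answer: 2 + 2 = 5\n"
    "Self-Critique: The arithmetic is wrong; 2 + 2 equals 4.\n"
    "Revised Answer: 2 + 2 = 4"
)
stages = parse_reflection(sample)
```

In practice you would usually surface only the `Revised Answer` stage to the user and keep the critique for logging or evaluation.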


Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Adaptation | Low-Rank Adaptation (LoRA) |
| Dataset Size | 120 reflection-formatted examples |
| Domains | Logic, mathematics, ML concepts, ethics, common-sense |
| Training Hardware | Single GTX 1650 (4 GB VRAM) |
| Peak VRAM Usage | ~2.8 GB |
| Training Time | ~20 minutes |
| Released by | HeuristixAI |

Only LoRA adapters are provided. Base model weights are not redistributed.
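Since only the adapters are distributed, they have to be applied on top of the base model at load time. A minimal loading sketch, assuming the `transformers` and `peft` packages are installed (the first run downloads both the base weights and the adapter):

```python
# Sketch only: quantization, device placement, and generation settings
# are omitted for brevity.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_id = "heuristixai/HAI-ReflectMini-0.5B"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
# Attach the released LoRA adapter to the base model.
model = PeftModel.from_pretrained(model, adapter_id)
```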


Training Configuration

| Hyperparameter | Value |
|---|---|
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| Dropout | 0.05 |
| Epochs | 3 |
| Learning Rate | 2e-4 |
| Sequence Length | 512 |
| Quantization | 4-bit NF4 |
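The table above maps directly onto standard `peft` and `transformers` configuration objects. A configuration sketch, where the `target_modules` list is an assumption (typical attention projections for Qwen2-style models; the actual targets were not published):

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, as listed in the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA hyperparameters from the table; target_modules is an assumption.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```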

Training Schema

Each example follows a four-stage reflection format:

  1. Prompt: the input question or problem
  2. Initial Answer: a first attempt, intentionally imperfect
  3. Self-Critique: explicit identification of errors or gaps
  4. Revised Answer: corrected, improved reasoning

This teaches the model to evaluate its own outputs as part of inference.
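The four stages above can be serialized into a single training string. The template below is hypothetical; the exact formatting used for the 120-example dataset is not published here:

```python
# Hypothetical serializer for one reflection-formatted training example.
def format_example(prompt: str, initial: str, critique: str, revised: str) -> str:
    return (
        f"Prompt: {prompt}\n"
        f"Initial Answer: {initial}\n"
        f"Self-Critique: {critique}\n"
        f"Revised Answer: {revised}"
    )

text = format_example(
    "Is 91 prime?",
    "Yes, 91 is prime.",
    "91 = 7 x 13, so the initial answer is wrong.",
    "No, 91 is composite: 91 = 7 x 13.",
)
```

Fine-tuning on text in this shape is what internalizes the critique-then-revise pattern, rather than eliciting it through prompting.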


Intended Use

HAI-ReflectMini-0.5B is designed for research and experimentation in:

  • Self-reflective and iterative reasoning
  • Low-resource fine-tuning techniques
  • Parameter-efficient learning (LoRA)
  • Compact language model adaptation

Suitable for educational purposes, prototyping reasoning pipelines, and studying reflection-based training paradigms.


Limitations

  • Evaluation is primarily qualitative
  • Reflection improves reasoning structure but does not guarantee factual correctness
  • Trained on only 120 examples, so domain coverage is intentionally narrow

Citation

If you use this model in your research, please cite:

@misc{heuristixai2026dualpathqwen,
  title={Low-Resource Self-Reflective Fine-Tuning of Compact Language Models Using LoRA},
  author={HeuristixAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/heuristixai/HAI-ReflectMini-0.5B}
}

About HeuristixAI

HeuristixAI is an independent AI research initiative publishing open work on efficient language models, applied intelligence, and human-centered AI systems.

🔗 Website: heuristixai.com
