Update readme.md
# HeuristixAI Self-Reflect Qwen 0.5B

This repository contains LoRA adapters trained to induce **self-reflective reasoning** in a compact language model.

The model learns to follow a structured pattern:

Prompt → Initial Answer → Self-Critique → Revised Answer

---
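As an illustration, output following this pattern might look like the hypothetical exchange below (invented for this README, not taken from the training data):

```text
Prompt: At sea level, what is the boiling point of water?
Initial Answer: Water boils at 100 °F at sea level.
Self-Critique: The unit is wrong; the boiling point is 100 degrees Celsius, not Fahrenheit.
Revised Answer: Water boils at 100 °C (212 °F) at sea level.
```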
## Base Model

Qwen/Qwen2.5-0.5B-Instruct

---

## Method

The adapters were trained with parameter-efficient fine-tuning (LoRA) on reflection-formatted data.

Each training example contains:

- Prompt
- Initial Answer (intentionally imperfect)
- Self-Critique
- Revised Answer

This explicitly teaches the model to evaluate and correct its own outputs.
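A minimal sketch of what one such training record could look like, assuming a JSONL-style layout with hypothetical field names (the actual dataset schema is not documented here):

```json
{
  "prompt": "Which planet is closest to the Sun?",
  "initial_answer": "Venus is the closest planet to the Sun.",
  "self_critique": "Venus is the second planet from the Sun; Mercury orbits closer.",
  "revised_answer": "Mercury is the closest planet to the Sun."
}
```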
---

## Training Setup

- LoRA rank: 8
- Alpha: 16
- Dropout: 0.05
- Epochs: 3
- Learning rate: 2e-4
- Sequence length: 512
- Quantization: 4-bit NF4
- Dataset size: 120 reflection examples
- Peak VRAM: ~2.8 GB
- Training time: ~20 minutes (GTX 1650)

Only the LoRA adapters are provided; they must be loaded on top of the base model.
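As a rough sketch, the setup above corresponds to `peft` and `bitsandbytes` configurations along these lines (the `target_modules` list is an assumption; it is not stated in this README):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA hyperparameters from the list above; target_modules is assumed
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```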
---

## Capabilities

- Structured reasoning
- Explicit self-critique
- Reduced hallucination
- Improved logical consistency
- Better explanation quality

---
## Usage

See `reflection_lora_v1_demo.py` for an inference example.
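Loading the adapters typically looks something like the minimal sketch below (the adapter repo id is a hypothetical placeholder; see the demo script for the exact code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-0.5B-Instruct"
adapter_id = "HeuristixAI/self-reflect-qwen-0.5b"  # hypothetical adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapters

# Build a chat-formatted prompt and generate a self-reflective answer
messages = [{"role": "user", "content": "Which planet is closest to the Sun?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```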
Install dependencies:

```bash
pip install -r requirements.txt