heuristixai committed on
Commit a5df70f · verified · 1 Parent(s): 4bcb9b5

Update readme.md

Files changed (1)
  1. readme.md +51 -17
readme.md CHANGED
@@ -1,32 +1,66 @@
- # HeuristixAI Self-Reflect Qwen 0.5B (v1)
 
- This repository contains LoRA adapters trained to induce
- self-reflective reasoning behavior in a compact language model.
 
  ## Base Model
  Qwen/Qwen2.5-0.5B-Instruct
 
  ## Method
- Parameter-efficient fine-tuning (LoRA) on reflection-formatted data:
- Prompt Initial Answer → Self-Critique → Revised Answer
 
  ## Capabilities
  - Structured reasoning
- - Self-critique behavior
  - Reduced hallucination
  - Improved logical consistency
 
- ## Training Setup
- - LoRA r=8, alpha=16, dropout=0.05
- - 4-bit NF4 quantization
- - Dataset size: 120 reflection examples
- - Peak VRAM: ~2.8 GB
- - Training time: ~20 minutes (GTX 1650)
 
  ## Usage
- See reflection_lora_v1_demo.py for example inference.
 
- ## License
- Adapters released for research use.
- ---
- Developed by HeuristixAI.
+ # HeuristixAI Self-Reflect Qwen 0.5B
 
+ This repository contains LoRA adapters trained to induce **self-reflective reasoning** in a compact language model.
+
+ The model learns to follow a structured pattern:
+
+ Prompt → Initial Answer → Self-Critique → Revised Answer
+
+ ---
 
  ## Base Model
+
  Qwen/Qwen2.5-0.5B-Instruct
 
+ ---
+
  ## Method
+
+ Parameter-efficient fine-tuning (LoRA) on reflection-formatted data.
+
+ Each training example contains:
+
+ - Prompt
+ - Initial Answer (intentionally imperfect)
+ - Self-Critique
+ - Revised Answer
+
+ This explicitly teaches the model to evaluate and correct its own outputs.
+
+ ---
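The four-part training record listed under Method could be assembled roughly as follows. This is a minimal sketch, not the repository's actual preprocessing: the `ReflectionExample` class, the `format_example` helper, the `###` section labels, and the sample arithmetic question are all hypothetical, since the README does not specify the template used for training.

```python
from dataclasses import dataclass


@dataclass
class ReflectionExample:
    """One training record with the four parts listed in the README."""
    prompt: str
    initial_answer: str   # intentionally imperfect, per the README
    self_critique: str
    revised_answer: str


def format_example(ex: ReflectionExample) -> str:
    """Render a record as a single training string.

    The section labels are hypothetical; the README does not state
    the exact template.
    """
    sections = [
        ("Prompt", ex.prompt),
        ("Initial Answer", ex.initial_answer),
        ("Self-Critique", ex.self_critique),
        ("Revised Answer", ex.revised_answer),
    ]
    return "\n\n".join(f"### {name}\n{text.strip()}" for name, text in sections)


# Invented sample record, for illustration only.
example = ReflectionExample(
    prompt="What is 17 * 6?",
    initial_answer="17 * 6 = 112.",
    self_critique="17 * 6 = 10 * 6 + 7 * 6 = 60 + 42 = 102, so 112 is wrong.",
    revised_answer="17 * 6 = 102.",
)
text = format_example(example)
print(text)
```

Keeping the four parts as separate fields until the last step makes it easy to change the template later without touching the underlying data.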
+
+ ## Training Setup
+
+ - LoRA rank: 8
+ - Alpha: 16
+ - Dropout: 0.05
+ - Epochs: 3
+ - Learning rate: 2e-4
+ - Sequence length: 512
+ - Quantization: 4-bit NF4
+ - Dataset size: 120 reflection examples
+ - Peak VRAM: ~2.8 GB
+ - Training time: ~20 minutes (GTX 1650)
+
+ Only LoRA adapters are provided.
+
+ ---
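As a rough illustration of what the Training Setup values imply, the sketch below collects them in one place and derives two quantities: the LoRA scaling factor (alpha / rank) and the total number of example passes over three epochs. The `TrainingSetup` class and `lora_params_per_matrix` helper are hypothetical names, and the 1024 x 1024 matrix shape is an illustrative assumption rather than a dimension taken from the base model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters copied from the README's Training Setup list."""
    lora_rank: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    epochs: int = 3
    learning_rate: float = 2e-4
    max_seq_length: int = 512
    dataset_size: int = 120

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def example_passes(self) -> int:
        # Total examples seen across all epochs (independent of batch size).
        return self.dataset_size * self.epochs


def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The adapter consists of A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


setup = TrainingSetup()
print(setup.lora_scaling)    # 2.0
print(setup.example_passes)  # 360
# 1024 x 1024 is an illustrative shape, not taken from the model config.
print(lora_params_per_matrix(1024, 1024, setup.lora_rank))  # 16384
```

With only 360 example passes in total, the run is small enough to fit the roughly 20-minute training time the README reports, though that figure also depends on batch size and hardware details not listed here.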
 
  ## Capabilities
+
  - Structured reasoning
+ - Explicit self-critique
  - Reduced hallucination
  - Improved logical consistency
+ - Better explanation quality
 
+ ---
 
  ## Usage
 
+ See `reflection_lora_v1_demo.py` for inference example.
+
+ Install dependencies:
+
+ ```bash
+ pip install -r requirements.txt