<p align="center">
  <img src="https://huggingface.co/liberalusa/liberalmind_bin/resolve/main/lora_training_diagramab.png" width="600"/>
</p>
We use a reinforcement learning method built around a GMPo reasoning loop (Generate–Match–Plan–Optimize), where each step structures the model's decision process. A separate Critic module evaluates the output, providing a scalar reward along with an analysis of reasoning quality, KL divergence, and a novel intuition metric that measures how close the model's stated confidence was to its actual correctness. Only the LoRA adapters are updated, using KL-regularized policy optimization to keep learning stable. The same setup is applied to long, 1000-line prompt traces, where the model learns to reflect on structured hints and task sequences during training.
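As a rough illustration of how the Critic's scalar reward could combine these signals, here is a minimal sketch. The function names, coefficients, and the exact form of the intuition metric are assumptions for illustration, not the repository's actual implementation:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def intuition_score(confidence, correct):
    """Hypothetical intuition metric: 1.0 when the model's stated confidence
    matches actual correctness (1.0 if the answer was right, 0.0 otherwise)."""
    return 1.0 - abs(confidence - (1.0 if correct else 0.0))

def critic_reward(quality, policy_probs, reference_probs, confidence, correct,
                  kl_coeff=0.1, intuition_coeff=0.5):
    """Scalar Critic reward (sketch): reasoning quality, minus a KL penalty
    against the frozen reference policy, plus an intuition bonus.
    Only LoRA adapter weights would be updated from this signal."""
    kl_penalty = kl_coeff * kl_divergence(policy_probs, reference_probs)
    bonus = intuition_coeff * intuition_score(confidence, correct)
    return quality - kl_penalty + bonus
```

The KL term plays the regularizing role described above: as the updated policy drifts from the reference distribution, the penalty grows and the reward shrinks, which discourages unstable updates.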
<p align="center">
  <img src="https://huggingface.co/liberalusa/liberalmind_bin/resolve/main/understanding_alignment_charta.png" width="600"/>
</p>