liberal committed
Commit c4c48e3 · verified · 1 parent: 06dbf3b

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -334,3 +334,9 @@ This approach enables self-corrective, explainable, and meta-aware learning, pus
  <p align="center">
  <img src="https://huggingface.co/liberalusa/liberalmind_bin/resolve/main/lora_training_diagramab.png" width="600"/>
  </p>
+
+ We use a reinforcement learning method based on a GMPo reasoning loop (Generate–Match–Plan–Optimize), in which each step structures the model's decision process. A separate Critic module evaluates the output, providing a scalar reward along with an analysis of reasoning quality, KL divergence, and a novel intuition metric that measures how close the model's confidence was to actual correctness. Only the LoRA adapters are updated, using KL-regularized policy optimization to ensure stable learning. The same setup is applied to long, 1000-line prompt traces, where the model learns to reflect on structured hints and task sequences during training.
+
+ <p align="center">
+ <img src="https://huggingface.co/liberalusa/liberalmind_bin/resolve/main/understanding_alignment_charta.png" width="600"/>
+ </p>
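The shaped reward described in the added paragraph can be sketched as follows. This is a minimal illustrative sketch, not the repository's actual implementation: the function names, the `beta`/`gamma` weights, and the exact form of the intuition bonus are assumptions made for the example.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def intuition_metric(confidence, correct):
    """1.0 when the model's confidence exactly matched actual correctness
    (hypothetical formula; the README does not specify the exact definition)."""
    return 1.0 - abs(confidence - (1.0 if correct else 0.0))

def shaped_reward(critic_score, policy_dist, ref_dist, confidence, correct,
                  beta=0.1, gamma=0.5):
    """Scalar reward in the spirit of the described setup: the Critic's score,
    KL-penalized against a frozen reference policy, plus an intuition bonus.
    beta and gamma are illustrative weights, not values from the repository."""
    kl = kl_divergence(policy_dist, ref_dist)
    return critic_score - beta * kl + gamma * intuition_metric(confidence, correct)

# Example: a confident, correct answer whose policy drifted slightly
# from the reference distribution.
r = shaped_reward(critic_score=0.8,
                  policy_dist=[0.7, 0.2, 0.1],
                  ref_dist=[0.6, 0.3, 0.1],
                  confidence=0.9, correct=True)
```

In an actual training loop this scalar would drive a policy-gradient update applied only to the LoRA adapter weights, with the KL term keeping the adapted policy close to the frozen base model.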