liberal committed on
Update README.md
README.md
CHANGED
@@ -327,11 +327,10 @@ Training Scope Only LoRA weights updated; main model remains fixed
 
 This approach enables self-corrective, explainable, and meta-aware learning, pushing beyond standard RLHF and toward autonomous reasoning agents.
 
-kl divergence chart under lora gmpo criticism (improvement on the chart)
 <p align="center">
 <img src="https://huggingface.co/liberalusa/liberalmind_bin/resolve/main/kl_critic_plot.png" width="600"/>
 </p>
 
-the block diagram of this technology
 <p align="center">
-<img src="https://huggingface.co/liberalusa/liberalmind_bin/
+<img src="https://huggingface.co/liberalusa/liberalmind_bin/resolve/main/lora_training_diagramab.png" width="400"/>
+</p>