Update README.md
README.md
CHANGED
|
@@ -8,4 +8,8 @@ pipeline_tag: text-generation
|
|
| 8 |
tags:
|
| 9 |
- agent
|
| 10 |
---
|
| 11 |
-
**LLM agent** that learns to solve the logic puzzle **Masyu** (Necklace) using **Reinforcement Learning** with the **GRPO algorithm**.
| 8 |
tags:
|
| 9 |
- agent
|
| 10 |
---
|
| 11 |
+
This repository contains the weights of an **LLM agent** trained to solve the logic puzzle **Masyu** (Necklace) using **Reinforcement Learning** with the **GRPO algorithm**.
|
| 12 |
+
|
| 13 |
+

|
| 14 |
+
|
| 15 |
+
These are my results from training Qwen/Qwen2-1.5B-Instruct. Because of limited computational resources, performance improved significantly mainly on the first four difficulty levels. More extensive training (more steps, a larger base model, or a higher num_generations) would likely be needed to improve results on more complex puzzles.
|
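The README does not show the training code, so the following is only a minimal sketch of the group-relative advantage at the core of GRPO, to illustrate why the number of completions sampled per prompt (num_generations) matters. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the reward values are all assumptions made for illustration, not the author's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Each completion is scored relative to the group mean, so no separate
    value model is needed. `eps` guards against division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a simplification for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 sampled solutions to one puzzle (1.0 = solved).
advantages = group_relative_advantages([0.0, 0.0, 1.0, 0.0])

# If no sample in the group solves the puzzle, every advantage is zero
# and the group contributes no learning signal.
no_signal = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

Under these assumptions, a harder puzzle that is rarely solved yields mostly all-zero groups, which is one plausible reason a larger group size or more training steps would be needed for the higher difficulty levels.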