Bernoulli committed on
Commit af2155d (verified) · 1 parent: 39c0f1e

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -8,4 +8,8 @@ pipeline_tag: text-generation
 tags:
 - agent
 ---
-**LLM agent** that learns to solve the logic puzzle **Masyu** (Necklace) using **Reinforcement Learning** with the **GRPO algorithm**.
+This repository contains the weights of an **LLM agent** that learns to solve the logic puzzle **Masyu** (Necklace) using **Reinforcement Learning** with the **GRPO algorithm**.
+
+![alt text](./masyu_bars.png)
+
+These are my results from training Qwen/Qwen2-1.5B-Instruct. Because of limited computational resources, a significant performance improvement was achieved mainly on the first four difficulty levels. Improving results on more complex puzzles would likely require more extensive training: more steps, a larger base model, or a higher `num_generations`.
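Why `num_generations` matters can be illustrated with a minimal sketch of the group-relative advantage at the core of GRPO. The sketch assumes that each prompt gets a group of sampled completions, each scored with a scalar reward, and that every reward is normalized by the group's mean and standard deviation. It is not the training code behind this model. The function name `group_relative_advantages`, the `eps` constant, and the binary "puzzle solved" reward are hypothetical choices for illustration. In Hugging Face TRL's `GRPOConfig`, `num_generations` sets this group size, but the README does not say whether this repository used TRL.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages for one prompt's group of sampled completions.

    Hypothetical helper: len(rewards) plays the role of the group size
    (num_generations). Each reward is centered on the group mean and scaled
    by the group's population standard deviation; eps avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical binary rewards: 1.0 if a sampled Masyu solution is valid, else 0.0.
print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))  # prints [0.0, 0.0, 0.0, 0.0]
```

If no completion in a group solves a hard puzzle, every reward is equal and every advantage is zero, so that prompt contributes no learning signal. A larger group raises the chance that at least one completion is rewarded, which is consistent with the suggestion above that a higher `num_generations` could help on harder levels.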