---
license: unknown
language:
- en
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
tags:
- agent
---
This repository contains the weights of an **LLM agent** trained to solve the logic puzzle **Masyu** (Necklace) using **Reinforcement Learning** with the **GRPO algorithm**.  

![Masyu results by difficulty level](./masyu_bars.png)

These are my results from training Qwen/Qwen2.5-1.5B-Instruct. Because of limited computational resources, performance improved significantly only on the first four difficulty levels. Improving on more complex puzzles would likely require more extensive training: more steps, a larger base model, or a higher `num_generations`.
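
To illustrate why `num_generations` matters, here is a minimal sketch of the group-relative advantage at the core of GRPO: several completions are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. The function name, the `eps` constant, the use of the population standard deviation, and the 0/1 rewards are assumptions made for this example, not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one group of completions for the same prompt.

    Hypothetical helper for illustration only.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for num_generations = 4 attempts at one easy puzzle:
# two attempts solved it (1.0), two did not (0.0).
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Hypothetical rewards for a hard puzzle where every attempt failed.
all_failed = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)       # solved attempts get positive advantages, failed ones negative
print(all_failed)  # every advantage is zero, so the group gives no learning signal
```

When every completion in a group receives the same reward, all advantages are zero. On hard puzzles a small model may rarely produce even one successful attempt, so a larger group raises the chance that a group contains a useful contrast.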