---
license: unknown
language:
- en
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
tags:
- agent
---

This repository contains the weights of an **LLM agent** that learns to solve the logic puzzle **Masyu** (Necklace) using **Reinforcement Learning** with the **GRPO algorithm**.

![Masyu results](./masyu_bars.png)

These are my results from training Qwen/Qwen2.5-1.5B-Instruct. Because compute was limited, performance improved significantly mainly on the first four difficulty levels. More extensive training (more steps, a larger base model, or a higher `num_generations`) would likely be required to improve results on more complex puzzles.
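
To illustrate why `num_generations` matters, here is a minimal sketch of the group-relative advantage at the core of GRPO: several completions are sampled for the same puzzle, and each reward is normalized against the mean and standard deviation of its own group. This is not the training code used for this model. The function name, the `eps` value, the use of the population standard deviation, and the reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps). eps avoids division by zero when all
    completions in the group receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for a group of 4 completions (num_generations = 4):
# 1.0 means the puzzle was solved, 0.0 means it was not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Solved completions get a positive advantage, unsolved ones a negative one.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

If every completion in a group gets the same reward (for example, all fail on a hard puzzle), all advantages are zero and that group contributes no learning signal. A larger group makes it more likely that at least one completion solves a difficult puzzle, which is one reason a higher `num_generations` may help on the harder levels.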