Bernoulli
/

MasyuLLMAgent

Text Generation

Model card Files Files and versions

MasyuLLMAgent / README.md

Bernoulli's picture

Update README.md

af2155d verified 5 months ago

|

history blame contribute delete

713 Bytes

	---
	license: unknown
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-1.5B-Instruct
	pipeline_tag: text-generation
	tags:
	- agent
	---
	This repository has weights of an LLM agent that learns to solve the logic puzzle Masyu (Necklace) using Reinforcement Learning with the GRPO algorithm.

	![alt text](./masyu_bars.png)

	These are my results of training Qwen/Qwen2-1.5B-Instruct. Due to constraints on available computational resources, a significant improvement in performance was primarily achieved for the first four difficulty levels. More extensive training—with more steps, a larger base model, or higher num_generations—would likely be required to achieve improvements on more complex puzzles.