---
license: mit
datasets:
- reeeemo/jigsaw_puzzle
pipeline_tag: reinforcement-learning
library_name: stable-baselines3
---
Reinforcement-learning model *(puzzler-v0)* that uses MaskablePPO to guide assembly of a jigsaw puzzle (puzzle pieces with irregular, non-convex boundaries).
The code used to train the model and build the custom environment is in the "How Puzzling!" [GitHub repo](https://github.com/reeeeemo/how-puzzling).
**Initialization Parameters:**
```python
def __init__(self, images, seg_model_path, max_steps=100, device="cpu")
```
- *images*: **list[np.ndarray]**
  - List of images that **must all be the same puzzle size and orientation (3x3, 4x4 puzzles, etc.)**.
  - One image is selected at random on each `.reset()` of the environment.
- *seg_model_path*: **string**
- Path to [image segmentation weight folder](https://huggingface.co/reeeemo/puzzle-segment-model)
  - Any segmentation model that segments puzzle pieces will work; however, the detected pieces must be axis-aligned.
  - The segmentation model is used to initialize another custom model found in the same GitHub repo, [here](https://github.com/reeeeemo/how-puzzling/blob/main/model/model.py).
- *max_steps*: **int**
  - Maximum number of environment steps per episode.
  - MaskablePPO avoids sampling invalid actions indefinitely, but if you decide to use plain PPO instead, ensure `max_steps` is at least the number of puzzle pieces.
- *device*: **string**
  - Device to run on, e.g. `"cpu"` or `"cuda"`.
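To make the MaskablePPO requirement above concrete, here is a minimal sketch (pure NumPy; the `placed` array and `action_mask` helper are hypothetical stand-ins for the environment's internal state, not part of this repo) of how an action mask over puzzle-piece placements might be computed:

```python
import numpy as np

def action_mask(placed: np.ndarray) -> np.ndarray:
    """Return a boolean mask over piece indices: True = piece may still be placed.

    `placed` is a hypothetical boolean array tracking which pieces are already
    on the board; MaskablePPO consumes such a mask so the policy never samples
    an already-placed piece.
    """
    return ~placed

# e.g. a 3x3 puzzle where pieces 0 and 4 are already on the board
placed = np.zeros(9, dtype=bool)
placed[[0, 4]] = True
mask = action_mask(placed)  # False at indices 0 and 4, True elsewhere
```

Without such a mask (i.e. with plain PPO), the agent can waste steps on invalid placements, which is why `max_steps` matters in that setting.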
Training curves are stored in [events.out.tfevents](./events.out.tfevents.0.500k_5_images_norm_rew) and can be viewed with `tensorboard --logdir .` after downloading the repo.
The environment was trained with `VecNormalize` from the `stable_baselines3` library; you can load the saved normalization statistics from [vec_normalize.pkl](./vec_normalize.pkl).
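For reference, `VecNormalize` rescales observations by running statistics before they reach the policy. A minimal NumPy sketch of that transform (assuming stable-baselines3's default `epsilon` and `clip_obs`; the `mean`/`var` values here are made up, standing in for the statistics stored in the pickle):

```python
import numpy as np

def normalize_obs(obs, mean, var, epsilon=1e-8, clip_obs=10.0):
    # VecNormalize standardizes observations with running mean/variance,
    # then clips the result to [-clip_obs, clip_obs]
    return np.clip((obs - mean) / np.sqrt(var + epsilon), -clip_obs, clip_obs)

obs = np.array([2.0, -3.0])
mean = np.array([1.0, 1.0])   # hypothetical running mean
var = np.array([4.0, 1.0])    # hypothetical running variance
norm = normalize_obs(obs, mean, var)  # ~[0.5, -4.0]
```

In practice you would restore the wrapper with `VecNormalize.load("vec_normalize.pkl", venv)` and set `training = False` (and typically `norm_reward = False`) before evaluation, so the stored statistics are applied but no longer updated.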