	---
	library_name: pytorch
	pipeline_tag: reinforcement-learning
	tags:
	- reinforcement-learning
	- dqn
	- onnx
	- fruitbox
	- gamesaien
	- chrome-extension
	- gymnasium
	---

	# AlphaApple - FruitBox DQN

	This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It predicts Q-values over all axis-aligned rectangles on a 10x17 board. A valid action is a rectangle whose cell sum is exactly 10, so you must apply an action mask to filter invalid rectangles before selecting the best move.

	## Quick facts
	- Board: 10x17, values 0-9 (0 means empty)
	- Action space: 8415 axis-aligned rectangles
	- Input: one-hot board with shape `[1, 10, 10, 17]`
	- Output: Q-values for all rectangles
	- Masking: required to remove invalid rectangles
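The rectangle count and the input shape above can be checked with a short sketch. It assumes the second axis of `[1, 10, 10, 17]` is the one-hot channel for cell values 0-9 (the usual channels-first layout); the card does not state the axis order. `one_hot_board` and `count_rectangles` are hypothetical helpers, not functions from the repository.

```python
ROWS, COLS, N_VALUES = 10, 17, 10  # board size and cell values 0-9


def count_rectangles(rows: int, cols: int) -> int:
    """Number of axis-aligned rectangles: choose a row span and a column span."""
    return (rows * (rows + 1) // 2) * (cols * (cols + 1) // 2)


def one_hot_board(board):
    """Encode a ROWS x COLS board as nested lists shaped [1, 10, ROWS, COLS].

    Assumption: channel v holds 1.0 where the cell value is v (0 means empty).
    """
    channels = [
        [[1.0 if cell == v else 0.0 for cell in row] for row in board]
        for v in range(N_VALUES)
    ]
    return [channels]  # leading batch axis of size 1


# Illustrative board filled with values 1-9 (not taken from the game).
board = [[(r * COLS + c) % 9 + 1 for c in range(COLS)] for r in range(ROWS)]
x = one_hot_board(board)
print(count_rectangles(ROWS, COLS))  # 8415
print(len(x), len(x[0]), len(x[0][0]), len(x[0][0][0]))  # 1 10 10 17
```

The nested list can be passed to `torch.tensor(x, dtype=torch.float32)` to obtain a tensor for the PyTorch model.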

	## Files in this repo
	- `model.pth`: PyTorch checkpoint dict with `policy_net`, `target_net`, `optimizer`
	- `model.onnx`: Exported ONNX model for browser/runtime inference

	## How to use (PyTorch)
	```python
	# Model definition is in https://github.com/kbsooo/AlphaApple (src/models.py)
	import torch
	from src.models import FruitBoxDQN

	rows, cols = 10, 17
	n_actions = (rows * (rows + 1) // 2) * (cols * (cols + 1) // 2)  # 55 * 153 = 8415
	model = FruitBoxDQN(rows, cols, n_actions)

	ckpt = torch.load("model.pth", map_location="cpu")
	state = ckpt["policy_net"] if "policy_net" in ckpt else ckpt
	model.load_state_dict(state)
	model.eval()
	```

	## How to use (ONNX / browser)
	```js
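	// Assumes onnxruntime-web is loaded and exposes the global `ort`.
	const session = await ort.InferenceSession.create("model.onnx");
	// data: Float32Array of length 1 * 10 * 10 * 17 holding the one-hot board
	const input = new ort.Tensor("float32", data, [1, 10, 10, 17]);
	const output = await session.run({ input });
	// output.output.data: Q-values for 8415 rectangles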
	```

	## Action masking (required)
	You must mask invalid rectangles before selecting an action. A rectangle is valid if the sum of its cells equals 10. Without the mask, the model can pick illegal moves.
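One way to build such a mask is with 2D prefix sums, so that each rectangle sum costs constant time. The sketch below is illustrative only. It assumes rectangles are enumerated as `(r1, r2, c1, c2)` with inclusive bounds in nested-loop order, which may not match the index order the trained model uses; check `src/` in the project repository for the real mapping. `valid_rectangle_mask` and `masked_argmax` are hypothetical names.

```python
def valid_rectangle_mask(board, target=10):
    """Return (mask, rects): mask[i] is True when rects[i] sums to `target`.

    Assumption: rectangles are (r1, r2, c1, c2) with inclusive bounds,
    enumerated in nested-loop order; the model's own order may differ.
    """
    rows, cols = len(board), len(board[0])
    # pre[r][c] = sum of all cells in board[0:r][0:c]
    pre = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            pre[r + 1][c + 1] = (
                board[r][c] + pre[r][c + 1] + pre[r + 1][c] - pre[r][c]
            )

    mask, rects = [], []
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    total = (
                        pre[r2 + 1][c2 + 1] - pre[r1][c2 + 1]
                        - pre[r2 + 1][c1] + pre[r1][c1]
                    )
                    rects.append((r1, r2, c1, c2))
                    mask.append(total == target)
    return mask, rects


def masked_argmax(q_values, mask):
    """Index of the highest Q-value among valid actions, or None if none is valid."""
    best = None
    for i, (q, ok) in enumerate(zip(q_values, mask)):
        if ok and (best is None or q > q_values[best]):
            best = i
    return best


# Tiny 2x3 example: the highest raw Q-value sits on an invalid rectangle.
board = [[1, 9, 5], [5, 5, 0]]
mask, rects = valid_rectangle_mask(board)
q_values = [0.0] * len(rects)
q_values[rects.index((0, 1, 1, 1))] = 5.0  # column of 9 + 5 = 14, invalid
q_values[rects.index((1, 1, 0, 1))] = 2.0  # row of 5 + 5 = 10, valid
best = masked_argmax(q_values, mask)
print(rects[best])  # (1, 1, 0, 1)
```

On the full 10x17 board the same function returns a mask of length 8415, which can be applied to the model output before taking the argmax.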

	## Training details
	- Environment: FruitBoxEnv (implemented in `envs/fruitbox_env.py`, class `FruitBoxEnvImproved`)
	- Board generator: BackwardBoardGenerator (solvable boards)
	- Curriculum: target coverage ramps from 0.3 to 0.95 in steps of 0.1
	- Optimizer: Adam (lr=1e-4); discount factor gamma=0.99
	- Episodes: 10k (Colab integrated script)
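The curriculum and gamma settings above can be read as follows. This is a sketch under stated assumptions, not the training script. It assumes the coverage target increases by 0.1 per stage and is clamped at 0.95, that gamma is the discount factor in a standard one-step DQN target, and that the next-state maximum is taken over valid actions only. `coverage_schedule` and `td_target` are hypothetical names, and the condition for advancing a stage is not specified here.

```python
GAMMA = 0.99  # assumed to be the discount factor


def coverage_schedule(start=0.3, stop=0.95, step=0.1):
    """Coverage targets from `start`, rising by `step`, clamped at `stop`."""
    targets, k = [], 0
    while True:
        value = min(stop, round(start + k * step, 2))
        targets.append(value)
        if value >= stop:
            return targets
        k += 1


def td_target(reward, next_q, next_mask, done, gamma=GAMMA):
    """One-step DQN target: reward + gamma * max Q over valid next actions."""
    valid = [q for q, ok in zip(next_q, next_mask) if ok]
    if done or not valid:
        return reward  # no bootstrap at episode end or when no move remains
    return reward + gamma * max(valid)


print(coverage_schedule())  # [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
# The invalid action with Q = 9.0 is ignored, so the bootstrap uses 2.0.
print(td_target(1.0, [0.5, 2.0, 9.0], [True, True, False], done=False))
```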

	## Limitations
	- Trained only on generated boards; performance on boards from the live game or on unusual layouts may differ.
	- Requires an accurate action mask and correct board extraction.

	## Links
	- Game: https://en.gamesaien.com/game/fruit_box/
	- Project: https://github.com/kbsooo/AlphaApple