@@ -1,5 +1,6 @@
 ---
 library_name: pytorch
 tags:
 - reinforcement-learning
 - dqn
@@ -12,13 +13,14 @@ tags:
 # AlphaApple - FruitBox DQN
-This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It predicts Q-values over all axis-aligned rectangles on a 10x17 board. A valid action is a rectangle whose cell sum is exactly 10; you must apply an action mask to filter invalid rectangles before selecting the best move.
-## Model summary
-- Architecture: CNN-based DQN
-- Input: one-hot board (10 channels) with shape `[1, 10, 10, 17]`
-- Output: Q-values for all rectangles (8415 actions)
-- Training: curriculum + backward board generator to ensure solvable boards
 ## Files in this repo
 - `model.pth`: PyTorch checkpoint dict with `policy_net`, `target_net`, `optimizer`
@@ -26,6 +28,7 @@ This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It pr
 ## How to use (PyTorch)
 ```python
 import torch
 from src.models import FruitBoxDQN
@@ -51,9 +54,10 @@ const output = await session.run({ input });
 You must mask invalid rectangles before selecting an action. A rectangle is valid if the sum of its cells equals 10. Without the mask, the model can pick illegal moves.
 ## Training details
-- Environment: FruitBoxEnvImproved (10x17)
-- Curriculum: target coverage ramps from 0.3 to 0.95
-- Optimizer: Adam, gamma=0.99
 - Episodes: 10k (Colab integrated script)
 ## Limitations

 ---
 library_name: pytorch
+pipeline_tag: reinforcement-learning
 tags:
 - reinforcement-learning
 - dqn
 # AlphaApple - FruitBox DQN
+This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It predicts Q-values over all axis-aligned rectangles on a 10x17 board. A valid action is a rectangle whose cell sum is exactly 10, so you must apply an action mask to filter invalid rectangles before selecting the best move.
+## Quick facts
+- Board: 10x17, values 0-9 (0 means empty)
+- Action space: 8415 axis-aligned rectangles
+- Input: one-hot board with shape `[1, 10, 10, 17]`
+- Output: Q-values for all rectangles
+- Masking: required to remove invalid rectangles
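The input shape and action count above can be checked with a short sketch. It assumes the channel axis is dimension 1 and that channel `v` marks cells holding value `v` (so channel 0 marks empty cells); the actual channel order is defined by the repository's code, and the helper names `one_hot_board` and `count_rectangles` are hypothetical.

```python
import numpy as np

ROWS, COLS, NUM_VALUES = 10, 17, 10


def one_hot_board(board):
    """Encode a (10, 17) board of integers 0-9 as a [1, 10, 10, 17] float array.

    Assumption: channel v is 1 where the cell holds value v.
    """
    board = np.asarray(board)
    if board.shape != (ROWS, COLS):
        raise ValueError(f"expected shape {(ROWS, COLS)}, got {board.shape}")
    # Compare each value 0..9 against the board to build one plane per value.
    planes = np.arange(NUM_VALUES)[:, None, None] == board[None, :, :]
    return planes.astype(np.float32)[None]  # prepend the batch dimension


def count_rectangles(rows=ROWS, cols=COLS):
    """Count axis-aligned rectangles: pick top <= bottom and left <= right."""
    return (rows * (rows + 1) // 2) * (cols * (cols + 1) // 2)
```

With 10 rows and 17 columns this gives 55 row spans times 153 column spans, which matches the 8415 actions listed above.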
 ## Files in this repo
 - `model.pth`: PyTorch checkpoint dict with `policy_net`, `target_net`, `optimizer`
 ## How to use (PyTorch)
 ```python
+# Model definition is in https://github.com/kbsooo/AlphaApple (src/models.py)
 import torch
 from src.models import FruitBoxDQN
 You must mask invalid rectangles before selecting an action. A rectangle is valid if the sum of its cells equals 10. Without the mask, the model can pick illegal moves.
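A minimal sketch of the masking step, using 2D prefix sums to test every rectangle's sum at once. The rectangle ordering in `enumerate_rectangles` is an assumption made for illustration; the index-to-rectangle mapping the model was trained with is defined in the repository and must be used instead. All function names here are hypothetical.

```python
import numpy as np


def enumerate_rectangles(rows=10, cols=17):
    """List rectangles as (top, left, bottom, right), inclusive.

    Assumption: this ordering is illustrative, not the model's own indexing.
    """
    return [(r1, c1, r2, c2)
            for r1 in range(rows) for c1 in range(cols)
            for r2 in range(r1, rows) for c2 in range(c1, cols)]


def valid_action_mask(board, rects, target=10):
    """Return a boolean array: True where the rectangle's cells sum to target."""
    board = np.asarray(board, dtype=np.int64)
    # prefix[i, j] holds the sum of board[:i, :j]; the zero border avoids edge cases.
    prefix = np.zeros((board.shape[0] + 1, board.shape[1] + 1), dtype=np.int64)
    prefix[1:, 1:] = board.cumsum(axis=0).cumsum(axis=1)
    r = np.asarray(rects)
    r1, c1, r2, c2 = r[:, 0], r[:, 1], r[:, 2], r[:, 3]
    sums = (prefix[r2 + 1, c2 + 1] - prefix[r1, c2 + 1]
            - prefix[r2 + 1, c1] + prefix[r1, c1])
    return sums == target


def select_action(q_values, mask):
    """Greedy choice over valid actions; returns None when no move is legal."""
    if not mask.any():
        return None
    masked = np.where(mask, q_values, -np.inf)
    return int(np.argmax(masked))
```

Replacing invalid Q-values with negative infinity before the argmax guarantees that an illegal rectangle is never selected, however large its predicted value.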
 ## Training details
+- Environment: FruitBoxEnv (implemented in `envs/fruitbox_env.py`, class `FruitBoxEnvImproved`)
+- Board generator: BackwardBoardGenerator (solvable boards)
+- Curriculum: target coverage ramps from 0.3 to 0.95 in steps of 0.1
+- Optimizer: Adam, lr=1e-4; discount factor gamma=0.99
 - Episodes: 10k (Colab integrated script)
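One possible reading of the curriculum ramp, as a sketch: coverage targets start at 0.3, rise by 0.1, and are capped at 0.95. The rule that decides when training advances to the next level is not stated here, so it is left out; `coverage_levels` is a hypothetical helper, not a function from the repository.

```python
def coverage_levels(start=0.3, stop=0.95, step=0.1):
    """Return the ramp of target coverage values, capped at stop.

    Assumption: the final level is clamped to stop rather than overshooting it.
    """
    levels = []
    i = 0
    while True:
        # Round to suppress floating-point drift such as 0.6000000000000001.
        level = round(start + i * step, 10)
        if level >= stop:
            levels.append(stop)
            break
        levels.append(level)
        i += 1
    return levels
```

Under these assumptions the ramp has eight levels, with the last step shortened from 0.9 to 0.95.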
 ## Limitations