Commit 3c2b820 by kbsooo (verified) · 1 parent: 4440729

Upload README.md with huggingface_hub

Files changed (1): README.md +13 -9
README.md CHANGED
@@ -1,5 +1,6 @@
  ---
  library_name: pytorch
+ pipeline_tag: reinforcement-learning
  tags:
  - reinforcement-learning
  - dqn
@@ -12,13 +13,14 @@ tags:
  
  # AlphaApple - FruitBox DQN
  
- This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It predicts Q-values over all axis-aligned rectangles on a 10x17 board. A valid action is a rectangle whose cell sum is exactly 10; you must apply an action mask to filter invalid rectangles before selecting the best move.
+ This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It predicts Q-values over all axis-aligned rectangles on a 10x17 board. A valid action is a rectangle whose cell sum is exactly 10, so you must apply an action mask to filter invalid rectangles before selecting the best move.
  
- ## Model summary
- - Architecture: CNN-based DQN
- - Input: one-hot board (10 channels) with shape `[1, 10, 10, 17]`
- - Output: Q-values for all rectangles (8415 actions)
- - Training: curriculum + backward board generator to ensure solvable boards
+ ## Quick facts
+ - Board: 10x17, values 0-9 (0 means empty)
+ - Action space: 8415 axis-aligned rectangles
+ - Input: one-hot board with shape `[1, 10, 10, 17]`
+ - Output: Q-values for all rectangles
+ - Masking: required to remove invalid rectangles
  
  ## Files in this repo
  - `model.pth`: PyTorch checkpoint dict with `policy_net`, `target_net`, `optimizer`
@@ -26,6 +28,7 @@ This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It pr
  
  ## How to use (PyTorch)
  ```python
+ # Model definition is in https://github.com/kbsooo/AlphaApple (src/models.py)
  import torch
  from src.models import FruitBoxDQN
  
@@ -51,9 +54,10 @@ const output = await session.run({ input });
  You must mask invalid rectangles before selecting an action. A rectangle is valid if the sum of its cells equals 10. Without the mask, the model can pick illegal moves.
  
  ## Training details
- - Environment: FruitBoxEnvImproved (10x17)
- - Curriculum: target coverage ramps from 0.3 to 0.95
- - Optimizer: Adam, gamma=0.99
+ - Environment: FruitBoxEnv (implemented in `envs/fruitbox_env.py`, class `FruitBoxEnvImproved`)
+ - Board generator: BackwardBoardGenerator (solvable boards)
+ - Curriculum: target coverage ramps from 0.3 to 0.95 in steps of 0.1
+ - Optimizer: Adam, gamma=0.99, lr=1e-4
  - Episodes: 10k (Colab integrated script)
  
  ## Limitations
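The README's action rule (Q-values over all 8415 axis-aligned rectangles of a 10x17 board, where only rectangles summing to exactly 10 are legal, masked before taking the argmax) can be sketched in PyTorch as below. This is a hedged illustration, not the AlphaApple code: the helper names `enumerate_rectangles`, `one_hot_board`, `valid_mask` and `select_action`, the `(r1, c1, r2, c2)` enumeration order, and the assumption that the network outputs a flat 8415-vector in that same order are all hypothetical. Check the repository's environment code for the action ordering the checkpoint was actually trained with.

```python
# Hypothetical sketch of masked greedy action selection for a 10x17 FruitBox board.
# ASSUMPTION: the (r1, c1, r2, c2) enumeration order below is illustrative only;
# the AlphaApple checkpoint may index its 8415 actions differently.
from __future__ import annotations

import torch
import torch.nn.functional as F

ROWS, COLS = 10, 17


def enumerate_rectangles(rows: int = ROWS, cols: int = COLS) -> list[tuple[int, int, int, int]]:
    """All axis-aligned rectangles as inclusive corners (r1, c1, r2, c2)."""
    return [
        (r1, c1, r2, c2)
        for r1 in range(rows)
        for r2 in range(r1, rows)
        for c1 in range(cols)
        for c2 in range(c1, cols)
    ]


RECTS = enumerate_rectangles()   # 55 row spans x 153 column spans = 8415 rectangles
RECT_IDX = torch.tensor(RECTS)   # shape [8415, 4]


def one_hot_board(board: torch.Tensor) -> torch.Tensor:
    """Int board [10, 17] with values 0-9 -> float tensor [1, 10, 10, 17]."""
    x = F.one_hot(board.long(), num_classes=10)     # [10, 17, 10]
    return x.permute(2, 0, 1).unsqueeze(0).float()  # [1, 10, 10, 17]


def valid_mask(board: torch.Tensor) -> torch.Tensor:
    """Bool [8415]: True where the rectangle's cell sum is exactly 10 (board must be 10x17)."""
    # Zero-padded 2D prefix sums: p[i, j] == board[:i, :j].sum()
    p = torch.zeros(ROWS + 1, COLS + 1, dtype=torch.long)
    p[1:, 1:] = board.long().cumsum(0).cumsum(1)
    r1, c1, r2, c2 = RECT_IDX.unbind(1)
    # Inclusion-exclusion gives every rectangle sum in one vectorized gather.
    sums = p[r2 + 1, c2 + 1] - p[r1, c2 + 1] - p[r2 + 1, c1] + p[r1, c1]
    return sums == 10


def select_action(q_values: torch.Tensor, board: torch.Tensor) -> int | None:
    """Greedy action among legal rectangles, or None if no legal move remains."""
    mask = valid_mask(board)
    if not mask.any():
        return None
    # Illegal rectangles get -inf so argmax can never pick them.
    masked = q_values.reshape(-1).masked_fill(~mask, float("-inf"))
    return int(masked.argmax())


if __name__ == "__main__":
    board = torch.zeros(ROWS, COLS, dtype=torch.long)
    board[0, 0], board[0, 1] = 3, 7   # a single legal pair
    x = one_hot_board(board)          # what a DQN with input [1, 10, 10, 17] would consume
    q = torch.randn(len(RECTS))       # stand-in for the network's Q-values
    action = select_action(q, board)
    print(x.shape, action, RECTS[action])
```

The prefix-sum table is a design choice for speed: each rectangle sum costs four lookups, so the mask over all 8415 actions is computed without slicing the board 8415 times. Because empty cells are 0, any rectangle that reaches a sum of 10 necessarily contains at least one fruit.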