kbsooo committed
Commit 06de9f8 · verified · 1 Parent(s): 5d44447

Upload README.md with huggingface_hub

  1. README.md +65 -50
README.md CHANGED
@@ -1,50 +1,65 @@
- # 🍎 AlphaApple - RL for Perfect FruitBox Play
-
- **Goal**: Remove **all** 170 cells in the apple game (FruitBox), i.e. a 100% clear.
-
- This project uses reinforcement learning and aims for better-than-human performance.
-
- ## 🚀 Current progress and results
- - **DQN baseline complete**: A CNN-based DQN model with curriculum learning provides a stable training foundation.
- - **Performance**: After about 10,000 training episodes, the agent removes **96% (163.4 apples)** of the board on average.
- - **Solution-guaranteed environment**: `BackwardBoardGenerator` was introduced so that training always runs on boards that have a solution.
-
- ## 🛠 Key features
- - **High-performance environment (`envs/fruitbox_env.py`)**: Uses prefix sums and incremental action masking to maximize computation speed.
- - **DQN agent (`src/agent.py`, `src/models.py`)**: A CNN model that takes 10-channel one-hot encoded input and supports action masking.
- - **Colab optimization**: Integrated training notebooks with GPU and TPU acceleration (`experiments/train_colab_integrated.ipynb`, `experiments/train_colab_jax.ipynb`).
- - **Visualization tools**: Renders the agent's play step by step as ASCII graphics and includes tooling for analyzing its strategy.
-
- ## 📁 Project structure
- - `envs/`: Apple game environment and board generator
- - `src/`: DQN model architecture and agent logic
- - `experiments/`: Training scripts and notebooks for local and Colab use
- - `checkpoints/`: Folder for saved trained models
-
- ## 🚀 Model deployment and real-world use
- ### 1. ONNX conversion and Hugging Face upload
- - **ONNX conversion**: Converts the model so it can run in the browser.
- ```bash
- uv run python src/export_onnx.py --model_path checkpoints/model.pth --output_path extension/model.onnx
- ```
- - **Hugging Face upload**: Shares the trained weights and the ONNX model on the Hub.
- ```bash
- uv run python src/upload_hf.py --repo_id "user/repository" --model_path checkpoints/model.pth --onnx_path extension/model.onnx
- ```
-
- ### 2. Chrome Extension (FruitBox Solver)
- An extension that runs the model on the live [Gamesaien Fruit Box](https://en.gamesaien.com/game/fruit_box/) site and finds moves for you.
-
- #### Installation:
- 1. Enter `chrome://extensions/` in the browser address bar
- 2. Enable 'Developer mode'
- 3. Click 'Load unpacked' and select the project's `extension/` folder
- 4. **Important**: The extension folder must contain the `model.onnx` file and the [onnxruntime-web](https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/onnxruntime.min.js) library.
-
- #### Usage:
- - Open the game site, then click the **"Find Best Move"** button in the extension popup
- - The best apple box is highlighted in red on the screen.
-
- ## 📈 Future plans
- - **Reaching a 100% clear**: To move beyond the current 96% result, deeper networks (such as ResNet) and the PPO algorithm are under consideration.
- - **Expanding JAX/TPU acceleration**: The JAX-based distributed training setup will be developed further to speed up experiments.
+ ---
+ library_name: pytorch
+ tags:
+ - reinforcement-learning
+ - dqn
+ - onnx
+ - fruitbox
+ - gamesaien
+ - chrome-extension
+ - gymnasium
+ ---
+
+ # AlphaApple - FruitBox DQN
+
+ This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It predicts Q-values over all axis-aligned rectangles on a 10x17 board. A valid action is a rectangle whose cell sum is exactly 10; you must apply an action mask to filter invalid rectangles before selecting the best move.
+
+ ## Model summary
+ - Architecture: CNN-based DQN
+ - Input: one-hot board (10 channels) with shape `[1, 10, 10, 17]`
+ - Output: Q-values for all rectangles (8415 actions)
+ - Training: curriculum + backward board generator to ensure solvable boards
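The input shape and action count in the summary above can be illustrated with a small NumPy sketch. The helper names `one_hot_board` and `count_rectangles` are hypothetical, and the channel layout (channel `k` marks cells holding value `k`, with 0 for an already-cleared cell) is an assumption; the layout used by the actual model may differ.

```python
import numpy as np

ROWS, COLS, CHANNELS = 10, 17, 10


def one_hot_board(board: np.ndarray) -> np.ndarray:
    """Encode a (10, 17) integer board as float32 with shape [1, 10, 10, 17].

    Assumption: channel k is 1 where the cell holds value k (0 = cleared cell).
    """
    assert board.shape == (ROWS, COLS)
    encoded = np.zeros((1, CHANNELS, ROWS, COLS), dtype=np.float32)
    rows, cols = np.indices(board.shape)
    # Set exactly one channel per cell, selected by the cell's value.
    encoded[0, board, rows, cols] = 1.0
    return encoded


def count_rectangles(rows: int, cols: int) -> int:
    """Count axis-aligned rectangles: pick a row span, then a column span."""
    return (rows * (rows + 1) // 2) * (cols * (cols + 1) // 2)


# Random board with digits 1..9, just to exercise the encoder.
board = np.random.default_rng(0).integers(1, 10, size=(ROWS, COLS))
x = one_hot_board(board)
n_actions = count_rectangles(ROWS, COLS)
```

With 10 rows there are 55 possible row spans and with 17 columns there are 153 column spans, which gives the 8415 actions listed in the summary.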
+
+ ## Files in this repo
+ - `model.pth`: PyTorch checkpoint dict with `policy_net`, `target_net`, `optimizer`
+ - `model.onnx`: Exported ONNX model for browser/runtime inference
+
+ ## How to use (PyTorch)
+ ```python
+ import torch
+ from src.models import FruitBoxDQN  # model class from the AlphaApple project
+
+ rows, cols = 10, 17
+ # Every axis-aligned rectangle is one action: 55 row spans * 153 column spans = 8415
+ n_actions = (rows * (rows + 1) // 2) * (cols * (cols + 1) // 2)
+ model = FruitBoxDQN(rows, cols, n_actions)
+
+ # The checkpoint is a dict; fall back to a bare state_dict if "policy_net" is absent
+ ckpt = torch.load("model.pth", map_location="cpu")
+ state = ckpt["policy_net"] if "policy_net" in ckpt else ckpt
+ model.load_state_dict(state)
+ model.eval()  # switch to inference mode
+ ```
+
+ ## How to use (ONNX / browser)
+ ```js
+ const session = await ort.InferenceSession.create("model.onnx");
+ // data: Float32Array holding the one-hot board, length 1 * 10 * 10 * 17
+ const data = new Float32Array(1 * 10 * 10 * 17);
+ const input = new ort.Tensor("float32", data, [1, 10, 10, 17]);
+ const output = await session.run({ input });
+ // output.output.data: Q-values for 8415 rectangles
+ ```
+
+ ## Action masking (required)
+ You must mask invalid rectangles before selecting an action. A rectangle is valid if the sum of its cells equals 10. Without the mask, the model can pick illegal moves.
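One way to build such a mask is with a 2D prefix sum, sketched below in NumPy. The function names `rectangle_mask` and `select_action` are hypothetical, and the enumeration order of rectangles (nested loops over `r1`, `r2`, `c1`, `c2`) is an assumption: it must match the action indexing the model was trained with, which this sketch does not know.

```python
import numpy as np


def rectangle_mask(board: np.ndarray, target: int = 10):
    """Return (mask, rects) where mask[i] is True if rects[i] sums to target.

    Assumption: rectangles are enumerated as (r1, r2, c1, c2) in nested-loop
    order; the real model's action order may differ.
    """
    rows, cols = board.shape
    # prefix[r, c] holds the sum of board[:r, :c].
    prefix = np.zeros((rows + 1, cols + 1), dtype=np.int64)
    prefix[1:, 1:] = board.cumsum(axis=0).cumsum(axis=1)
    rects, mask = [], []
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    # Inclusion-exclusion gives the rectangle sum in O(1).
                    total = (prefix[r2 + 1, c2 + 1] - prefix[r1, c2 + 1]
                             - prefix[r2 + 1, c1] + prefix[r1, c1])
                    rects.append((r1, c1, r2, c2))
                    mask.append(total == target)
    return np.array(mask, dtype=bool), rects


def select_action(q_values: np.ndarray, mask: np.ndarray):
    """Index of the highest-Q valid rectangle, or None if no move is legal."""
    if not mask.any():
        return None
    return int(np.where(mask, q_values, -np.inf).argmax())


# Tiny 2x2 example: only the two full rows sum to 10.
small = np.array([[5, 5], [1, 9]])
small_mask, small_rects = rectangle_mask(small)
best = select_action(np.arange(len(small_rects), dtype=np.float32), small_mask)
```

Setting invalid entries to negative infinity before the argmax guarantees that an illegal rectangle is never chosen, even when its raw Q-value is the largest.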
+
+ ## Training details
+ - Environment: FruitBoxEnvImproved (10x17)
+ - Curriculum: target coverage ramps from 0.3 to 0.95
+ - Optimizer: Adam, gamma=0.99
+ - Episodes: 10k (Colab integrated script)
+
+ ## Limitations
+ - Trained on generated boards; performance may vary on edge cases.
+ - Requires an accurate action mask and correct board extraction.
+
+ ## Links
+ - Game: https://en.gamesaien.com/game/fruit_box/
+ - Project: https://github.com/kbsooo/AlphaApple