Commit 1b1b18d · verified · 1 Parent(s): 0c79602 · committed by kbsooo

Upload README.md with huggingface_hub

  1. README.md +50 -110
README.md CHANGED
@@ -1,110 +1,50 @@
- ---
- library_name: stable-baselines3
- tags:
- - FruitBox
- - reinforcement-learning
- - ppo
- - game-ai
- - puzzle-solving
- model-index:
- - name: AlphaApple
-   results:
-   - task:
-       type: reinforcement-learning
-       name: Reinforcement Learning
-     dataset:
-       name: FruitBox Game
-       type: fruitbox
-     metrics:
-     - type: mean_reward
-       value: 77.0
-       name: Mean Episode Score
-     - type: improvement_vs_random
-       value: 7.1%
-       name: Improvement vs Random
-     - type: improvement_vs_greedy
-       value: 5.0%
-       name: Improvement vs Greedy
- ---
-
- # AlphaApple: FruitBox Game AI Agent
-
- ## Model Description
-
- This model is an AI agent that solves the Korean "apple game" (FruitBox) puzzle.
- It was trained with the PPO (Proximal Policy Optimization) algorithm to play a game in which the player finds and removes rectangles whose digits sum to 10 on a 10×17 grid.
-
- ## Game Rules
-
- - 10×17 grid; each cell holds a digit from 1 to 9
- - Select a rectangular region; if its digits sum to exactly 10, the region is removed
- - The score increases by the number of cells removed
- - The game ends when no removable region remains
-
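The game rules in the removed README can be illustrated with a minimal Python sketch. The function name `apply_move`, the inclusive `(top, left, bottom, right)` coordinates, and the use of `0` for an already-cleared cell are assumptions made for this example; they are not taken from the repository.

```python
from typing import List, Tuple

Board = List[List[int]]  # assumed convention: 0 marks an already-cleared cell


def apply_move(board: Board, top: int, left: int, bottom: int, right: int) -> Tuple[bool, int]:
    """Try to clear an inclusive rectangle; return (was_valid, points_gained)."""
    cells = [(r, c) for r in range(top, bottom + 1) for c in range(left, right + 1)]
    if sum(board[r][c] for r, c in cells) != 10:
        return False, 0  # the digits must sum to exactly 10
    # Score one point per apple actually removed (cleared cells do not count).
    points = sum(1 for r, c in cells if board[r][c] != 0)
    for r, c in cells:
        board[r][c] = 0
    return True, points


board = [
    [3, 7, 1],
    [2, 5, 9],
]
print(apply_move(board, 0, 0, 0, 1))  # 3 + 7 = 10 -> (True, 2)
print(apply_move(board, 1, 0, 1, 1))  # 2 + 5 = 7  -> (False, 0)
```

Counting only non-zero cells toward the score is one reading of "cells removed"; an implementation that never allows cleared cells inside a selection would give the same totals.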
- ## Performance
-
- | Agent | Average Score | Improvement vs. Random |
- |---------|--------------|-------------|
- | Random | 71.9 | - |
- | Greedy | 73.3 | +1.9% |
- | **PPO** | **77.0** | **+7.1%** |
-
- ## Usage
-
- ### Python (PyTorch)
-
- ```python
- from stable_baselines3 import PPO
- from stable_baselines3.common.vec_env import DummyVecEnv
-
- # Load model
- model = PPO.load("pytorch_model.zip")
-
- # Use for inference
- obs = env.reset()
- action, _ = model.predict(obs)
- ```
-
- ### Web/JavaScript (ONNX)
-
- ```javascript
- import { InferenceSession } from 'onnxruntime-web';
-
- // Load ONNX model
- const session = await InferenceSession.create('./fruitbox_ppo.onnx');
-
- // Predict action
- const { action_logits } = await session.run({
-   board_input: new ort.Tensor('float32', board_data, [1, 17, 10, 1])
- });
- const action = action_logits.data.indexOf(Math.max(...action_logits.data));
- ```
-
- ## Files
-
- - `pytorch_model.zip`: Original SB3 PPO model
- - `fruitbox_ppo.onnx`: ONNX version for web deployment (2.95MB)
- - `model_info.json`: Model metadata and performance metrics
-
- ## Training Details
-
- - Algorithm: PPO with action masking
- - Network: Custom CNN (SmallGridCNN)
- - Training steps: 1,000,000
- - Environment: Custom Gymnasium environment
- - Action space: 8,415 possible rectangles (masked)
-
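The action-space size of 8,415 in the removed training details matches the number of axis-aligned rectangles on a 10×17 grid. The short sketch below checks that arithmetic two ways; the helper name `count_rectangles` is hypothetical and does not come from the repository.

```python
from math import comb

ROWS, COLS = 10, 17  # grid dimensions stated in the README


def count_rectangles(rows: int, cols: int) -> int:
    """A rectangle is fixed by 2 of the (rows + 1) horizontal grid lines
    and 2 of the (cols + 1) vertical grid lines."""
    return comb(rows + 1, 2) * comb(cols + 1, 2)


# Brute-force cross-check: enumerate every (top, bottom, left, right) combination.
brute_force = sum(
    1
    for top in range(ROWS)
    for bottom in range(top, ROWS)
    for left in range(COLS)
    for right in range(left, COLS)
)

print(count_rectangles(ROWS, COLS), brute_force)  # 8415 8415
```

That is 55 row spans times 153 column spans. The "(masked)" qualifier means only the rectangles that currently sum to 10 are selectable at any given step.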
- ## Repository
-
- Source code: https://github.com/your-username/alphaapple
-
- ## Citation
-
- ```bibtex
- @misc{alphaapple2024,
-   title={AlphaApple: AI Agent for FruitBox Puzzle Game},
-   author={Your Name},
-   year={2024},
-   howpublished={\url{https://huggingface.co/AlphaApple}}
- }
- ```
 
+ # 🍎 AlphaApple - RL for Perfect FruitBox Play
+
+ **Goal**: remove **all** 170 cells in the apple game (FruitBox), i.e. a 100% clear
+
+ This project aims to reach superhuman performance through reinforcement learning.
+
+ ## 🚀 Current Progress and Results
+ - **DQN baseline complete**: A CNN-based DQN model combined with curriculum learning provides a stable training foundation.
+ - **Performance**: After about 10,000 training episodes, the agent removes **96% (163.4)** of the 170 apples on average.
+ - **Solution-guaranteed environment**: `BackwardBoardGenerator` was introduced so that training always takes place on boards for which a solution exists.
+
+ ## 🛠 Key Features
+ - **High-performance environment (`envs/fruitbox_env.py`)**: Uses prefix sums and incremental action masking to maximize computation speed.
+ - **DQN agent (`src/agent.py`, `src/models.py`)**: A CNN model that takes 10-channel one-hot encoded input and supports action masking.
+ - **Colab optimization**: Provides integrated training notebooks with GPU and TPU acceleration (`experiments/train_colab_integrated.ipynb`, `experiments/train_colab_jax.ipynb`).
+ - **Visualization tools**: Renders the agent's play step by step as ASCII graphics and includes features for analyzing its strategy.
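As a rough illustration of the prefix-sum idea named in the feature list, the sketch below computes any rectangle sum in constant time and uses it to list the currently valid moves. It is an assumption-laden example, not the code in `envs/fruitbox_env.py`: the names `build_prefix`, `rect_sum`, and `valid_rects` are hypothetical, and the mask is rebuilt from scratch here, whereas an incremental version would presumably recheck only rectangles that overlap the cells just cleared.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def build_prefix(board: List[List[int]]) -> List[List[int]]:
    """prefix[r][c] holds the sum of board rows 0..r-1 and columns 0..c-1."""
    rows, cols = len(board), len(board[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            prefix[r + 1][c + 1] = (
                board[r][c] + prefix[r][c + 1] + prefix[r + 1][c] - prefix[r][c]
            )
    return prefix


def rect_sum(prefix: List[List[int]], rect: Rect) -> int:
    """Sum of an inclusive rectangle using four table lookups."""
    top, left, bottom, right = rect
    return (
        prefix[bottom + 1][right + 1]
        - prefix[top][right + 1]
        - prefix[bottom + 1][left]
        + prefix[top][left]
    )


def valid_rects(board: List[List[int]], target: int = 10) -> List[Rect]:
    """Enumerate every rectangle whose cells sum to the target."""
    prefix = build_prefix(board)
    rows, cols = len(board), len(board[0])
    return [
        (t, l, b, r)
        for t in range(rows)
        for b in range(t, rows)
        for l in range(cols)
        for r in range(l, cols)
        if rect_sum(prefix, (t, l, b, r)) == target
    ]


board = [[3, 7, 1],
         [2, 5, 9]]
print(valid_rects(board))  # [(0, 0, 0, 1), (0, 2, 1, 2)]
```

Without the prefix table, each of the 8,415 candidate rectangles on a 10×17 board would need its cells summed one by one. With it, each check costs four lookups.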
+
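The 10-channel one-hot input mentioned for the DQN agent could look like the following sketch. The channel layout is an assumption (channel 0 for a cleared cell, channels 1 to 9 for the apple values), as is the `[channel][row][col]` ordering; the actual encoding in `src/models.py` may differ.

```python
from typing import List

NUM_CHANNELS = 10  # assumed layout: channel 0 = cleared cell, channels 1-9 = apple values


def one_hot_encode(board: List[List[int]]) -> List[List[List[int]]]:
    """Return a [channel][row][col] tensor with exactly one 1 per cell."""
    rows, cols = len(board), len(board[0])
    planes = [[[0] * cols for _ in range(rows)] for _ in range(NUM_CHANNELS)]
    for r in range(rows):
        for c in range(cols):
            planes[board[r][c]][r][c] = 1  # the cell value selects the channel
    return planes


board = [[0, 7],
         [2, 9]]
planes = one_hot_encode(board)
print(planes[7][0][1], planes[0][0][0], planes[9][0][0])  # 1 1 0
```

One-hot planes let the convolutional layers treat each digit as a separate category rather than as a magnitude, which is a common choice for grid puzzles.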
+ ## 📁 Project Structure
+ - `envs/`: Apple game environment and board generator
+ - `src/`: DQN model architecture and agent logic
+ - `experiments/`: Training scripts and notebooks for local and Colab use
+ - `checkpoints/`: Folder for saved trained models
+
+ ## 🚀 Model Deployment and Real-World Use
+ ### 1. ONNX Conversion and Hugging Face Upload
+ - **ONNX conversion**: Converts the model so that it can run in the browser.
+ ```bash
+ uv run python src/export_onnx.py --model_path checkpoints/model.pth --output_path extension/model.onnx
+ ```
+ - **Hugging Face upload**: Shares the trained weights and the ONNX model on the Hub.
+ ```bash
+ uv run python src/upload_hf.py --repo_id "user/repository" --model_path checkpoints/model.pth --onnx_path extension/model.onnx
+ ```
+
+ ### 2. Chrome Extension (FruitBox Solver)
+ An extension that runs the model on the live [Gamesaien Fruit Box](https://en.gamesaien.com/game/fruit_box/) site and finds solutions.
+
+ #### Installation:
+ 1. Enter `chrome://extensions/` in the browser address bar
+ 2. Enable 'Developer mode'
+ 3. Click 'Load unpacked' and select the project's `extension/` folder
+ 4. **Important**: The extension folder must contain the `model.onnx` file and the [onnxruntime-web](https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/onnxruntime.min.js) library.
+
+ #### Usage:
+ - Open the game site, then click the **"Find Best Move"** button in the extension popup
+ - The best apple box is highlighted in red on the screen.
+
+ ## 📈 Future Plans
+ - **Reaching a 100% clear**: To move beyond the current 96% result toward a 100% clear, deeper networks (ResNet and similar) and the PPO algorithm are under consideration.
+ - **Expanding JAX/TPU acceleration**: The JAX-based distributed training environment will be developed further to enable faster experiments.