Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,110 +1,50 @@
-## Usage
-
-### Python (PyTorch)
-
-```python
-from stable_baselines3 import PPO
-from stable_baselines3.common.vec_env import DummyVecEnv
-
-# Load model
-model = PPO.load("pytorch_model.zip")
-
-# Use for inference
-obs = env.reset()
-action, _ = model.predict(obs)
-```
-
-### Web/JavaScript (ONNX)
-
-```javascript
-import { InferenceSession } from 'onnxruntime-web';
-
-// Load ONNX model
-const session = await InferenceSession.create('./fruitbox_ppo.onnx');
-
-// Predict action
-const { action_logits } = await session.run({
-  board_input: new ort.Tensor('float32', board_data, [1, 17, 10, 1])
-});
-const action = action_logits.data.indexOf(Math.max(...action_logits.data));
-```
-
-## Files
-
-- `pytorch_model.zip`: Original SB3 PPO model
-- `fruitbox_ppo.onnx`: ONNX version for web deployment (2.95MB)
-- `model_info.json`: Model metadata and performance metrics
-
-## Training Details
-
-- Algorithm: PPO with action masking
-- Network: Custom CNN (SmallGridCNN)
-- Training steps: 1,000,000
-- Environment: Custom Gymnasium environment
-- Action space: 8,415 possible rectangles (masked)
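The figure of 8,415 possible rectangles in the removed Training Details is consistent with counting every axis-aligned rectangle on a 17 × 10 grid, the board shape suggested by the `[1, 17, 10, 1]` input tensor in the removed ONNX example. The sketch below checks that arithmetic. The function names and the enumeration order are illustrative assumptions, not the repository's actual action indexing.

```python
from math import comb


def count_rectangles(rows: int, cols: int) -> int:
    """Count axis-aligned sub-rectangles of a rows x cols grid."""
    # A rectangle is fixed by picking 2 of the (rows + 1) horizontal grid lines
    # and 2 of the (cols + 1) vertical grid lines.
    return comb(rows + 1, 2) * comb(cols + 1, 2)


def enumerate_rectangles(rows: int, cols: int):
    """Yield (top, left, bottom, right) with inclusive corners.

    The ordering is a hypothetical choice for illustration only.
    """
    for top in range(rows):
        for bottom in range(top, rows):
            for left in range(cols):
                for right in range(left, cols):
                    yield top, left, bottom, right


print(count_rectangles(17, 10))  # 153 * 55 = 8415
```

Because the count is symmetric in rows and columns, the result is the same whether the board is read as 17 × 10 or 10 × 17.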
-
-## Repository
-
-Source code: https://github.com/your-username/alphaapple
-
-## Citation
-
-```bibtex
-@misc{alphaapple2024,
-  title={AlphaApple: AI Agent for FruitBox Puzzle Game},
-  author={Your Name},
-  year={2024},
-  howpublished={\url{https://huggingface.co/AlphaApple}}
-}
-```
+# AlphaApple - RL for Perfect FruitBox Play
+
+**Goal**: **Remove all** 170 cells of the Apple Game (FruitBox) (100% clear)
+
+This project aims to reach superhuman performance through reinforcement learning.
+
+## Current Progress and Results
+- **DQN baseline built**: A CNN-based DQN model trained with curriculum learning provides a stable training foundation.
+- **Performance record**: After about 10,000 training episodes, the agent removes **96% (163.4)** of the apples on average.
+- **Solution-guaranteed environment**: `BackwardBoardGenerator` was introduced so that training always uses boards that have a solution.
+
+## Key Features
+- **High-performance environment (`envs/fruitbox_env.py`)**: Uses prefix sums and incremental action masking to maximize computation speed.
+- **DQN agent (`src/agent.py`, `src/models.py`)**: A CNN model that takes a 10-channel one-hot encoded input and supports action masking.
+- **Colab optimization**: Provides integrated training notebooks with GPU and TPU acceleration (`experiments/train_colab_integrated.ipynb`, `experiments/train_colab_jax.ipynb`).
+- **Visualization tools**: Renders the agent's play step by step as ASCII graphics and includes features for analyzing its strategy.
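To illustrate why prefix sums help in this environment, here is a minimal sketch of a 2D prefix-sum table that returns the sum of any rectangle in constant time and uses it to list legal moves. It is not the code in `envs/fruitbox_env.py`. The function names are hypothetical, and the rule that a rectangle is legal when its apples sum to 10 is an assumption based on how the Fruit Box game is usually described.

```python
import numpy as np

TARGET_SUM = 10  # assumption: a rectangle is a legal move when its apples sum to 10


def build_prefix_sum(board: np.ndarray) -> np.ndarray:
    """Return a (rows + 1, cols + 1) table where ps[i, j] = sum of board[:i, :j]."""
    ps = np.zeros((board.shape[0] + 1, board.shape[1] + 1), dtype=np.int64)
    ps[1:, 1:] = board.cumsum(axis=0).cumsum(axis=1)
    return ps


def rect_sum(ps: np.ndarray, top: int, left: int, bottom: int, right: int) -> int:
    """Sum of board[top:bottom + 1, left:right + 1] in O(1) via inclusion-exclusion."""
    return int(ps[bottom + 1, right + 1] - ps[top, right + 1]
               - ps[bottom + 1, left] + ps[top, left])


def legal_rectangles(board: np.ndarray):
    """List every rectangle (inclusive corners) whose cells sum to TARGET_SUM."""
    ps = build_prefix_sum(board)
    rows, cols = board.shape
    return [
        (top, left, bottom, right)
        for top in range(rows)
        for bottom in range(top, rows)
        for left in range(cols)
        for right in range(left, cols)
        if rect_sum(ps, top, left, bottom, right) == TARGET_SUM
    ]


# Small hypothetical board; cleared cells would be stored as 0.
board = np.array([[1, 9, 3],
                  [4, 6, 7]])
print(legal_rectangles(board))
```

This sketch rebuilds the full mask from scratch. An incremental variant would presumably re-check only the rectangles that overlap the cells cleared by the last move, which is not shown here.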
+
+## Project Structure
+- `envs/`: Apple Game environment and board generator
+- `src/`: DQN model architecture and agent logic
+- `experiments/`: Training scripts and notebooks for local and Colab use
+- `checkpoints/`: Folder for saved trained models
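The agent logic under `src/` is described as using a 10-channel one-hot input with action masking. The sketch below shows one way those two pieces could look in NumPy. It is a hedged illustration, not the repository's code: the channel layout (channel `k` marks cells holding value `k`, with 0 meaning a cleared cell) and both function names are assumptions.

```python
import numpy as np

NUM_CHANNELS = 10  # assumption: channel k marks cells with value k; 0 = cleared cell


def one_hot_board(board: np.ndarray) -> np.ndarray:
    """Encode a (rows, cols) board of ints in 0..9 as a (10, rows, cols) float32 array."""
    channels = np.arange(NUM_CHANNELS)[:, None, None]
    return (channels == board[None]).astype(np.float32)


def masked_greedy_action(q_values: np.ndarray, legal_mask: np.ndarray) -> int:
    """Return the index of the highest Q-value among legal actions.

    Illegal actions are set to -inf so argmax can never select them.
    If no action is legal, argmax falls back to index 0, so callers
    should check legal_mask.any() first.
    """
    masked = np.where(legal_mask, q_values, -np.inf)
    return int(np.argmax(masked))


# Hypothetical 2 x 2 board and 3-action Q-vector for illustration.
encoded = one_hot_board(np.array([[0, 3], [9, 1]]))
print(encoded.shape)
print(masked_greedy_action(np.array([5.0, 1.0, 3.0]), np.array([False, True, True])))
```

Masking before the argmax keeps the greedy policy inside the legal action set even when the network assigns a high value to an illegal rectangle.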
+
+## Model Deployment and Real-World Use
+### 1. ONNX Conversion and Hugging Face Upload
+- **ONNX conversion**: Converts the model so that it can run in the browser.
+```bash
+uv run python src/export_onnx.py --model_path checkpoints/model.pth --output_path extension/model.onnx
+```
+- **Hugging Face upload**: Shares the trained weights and the ONNX model on the Hub.
+```bash
+uv run python src/upload_hf.py --repo_id "user/repository" --model_path checkpoints/model.pth --onnx_path extension/model.onnx
+```
+
+### 2. Chrome Extension (FruitBox Solver)
+An extension that runs the model on the live [Gamesaien Fruit Box](https://en.gamesaien.com/game/fruit_box/) site and finds solutions.
+
+#### Installation:
+1. Enter `chrome://extensions/` in the browser address bar
+2. Enable 'Developer mode'
+3. Click 'Load unpacked' and select the project's `extension/` folder
+4. **Important**: The extension folder must contain the `model.onnx` file and the [onnxruntime-web](https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/onnxruntime.min.js) library.
+
+#### Usage:
+- Open the game site, then click the **"Find Best Move"** button in the extension popup
+- The best apple box is highlighted in red on the screen.
+
+## Future Plans
+- **Reaching 100% clear**: To go beyond the current 96% result and reach a 100% clear, a deeper network (such as ResNet) and the PPO algorithm are under consideration.
+- **JAX/TPU acceleration tuning**: The JAX-based distributed training environment will be improved to speed up experiments.