kbsooo
/

AlphaApple

@@ -1,50 +1,65 @@
-# 🍎 AlphaApple - RL for Perfect FruitBox Play
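-**Goal**: Clear **all** 170 cells of the apple game (FruitBox), i.e. a 100% clear.
-This project aims to use reinforcement learning to surpass human-level play.
-## 🚀 Current progress and results
-- **DQN baseline complete**: A CNN-based DQN model with curriculum learning provides a stable training foundation.
-- **Performance**: After about 10,000 training episodes, the agent removes **96% (163.4 apples)** of the board on average.
-- **Solution-guaranteed environment**: `BackwardBoardGenerator` was introduced so that training always runs on boards that have a solution.
-## 🛠 Key features
-- **High-performance environment (`envs/fruitbox_env.py`)**: Uses prefix sums and incremental action masking to maximize computation speed.
-- **DQN agent (`src/agent.py`, `src/models.py`)**: A CNN model that takes a 10-channel one-hot encoded input and supports action masking.
-- **Colab optimization**: Provides integrated training notebooks (`experiments/train_colab_integrated.ipynb`, `experiments/train_colab_jax.ipynb`) with GPU and TPU acceleration.
-- **Visualization tools**: Renders the agent's play step by step as ASCII graphics and supports strategy analysis.
-## 📁 Project structure
-- `envs/`: Apple game environment and board generator
-- `src/`: DQN model architecture and agent logic
-- `experiments/`: Training scripts and notebooks for local and Colab use
-- `checkpoints/`: Folder for trained model checkpoints
-## 🚀 Model deployment and real-world use
-### 1. ONNX conversion and Hugging Face upload
-- **ONNX conversion**: Converts the model so that it can run in the browser.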
-  ```bash
-  uv run python src/export_onnx.py --model_path checkpoints/model.pth --output_path extension/model.onnx
-  ```
-- **Hugging Face upload**: Shares the trained weights and the ONNX model on the Hub.
-  ```bash
-  uv run python src/upload_hf.py --repo_id "user/repository" --model_path checkpoints/model.pth --onnx_path extension/model.onnx
-  ```
-### 2. Chrome Extension (FruitBox Solver)
-A browser extension that runs the model on the live [Gamesaien Fruit Box](https://en.gamesaien.com/game/fruit_box/) site and finds solutions.
-#### Installation:
-1. Enter `chrome://extensions/` in the browser address bar
-2. Enable 'Developer mode'
-3. Click 'Load unpacked' and select the project's `extension/` folder
-4. **Important**: The extension folder must contain the `model.onnx` file and the [onnxruntime-web](https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/onnxruntime.min.js) library.
-#### Usage:
-- Open the game site, then click the **"Find Best Move"** button in the extension popup
-- The best apple box is highlighted in red on the screen.
-## 📈 Future plans
-- **Aiming for a 100% clear**: To go beyond the current 96% result, deeper networks (such as ResNet) and the PPO algorithm are under consideration.
-- **Expanded JAX/TPU acceleration**: The JAX-based distributed training setup will be developed further to speed up experiments.

+---
+library_name: pytorch
+tags:
+- reinforcement-learning
+- dqn
+- onnx
+- fruitbox
+- gamesaien
+- chrome-extension
+- gymnasium
+---
+# AlphaApple - FruitBox DQN
+This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It predicts Q-values over all axis-aligned rectangles on a 10x17 board. A valid action is a rectangle whose cell sum is exactly 10; you must apply an action mask to filter invalid rectangles before selecting the best move.
+## Model summary
+- Architecture: CNN-based DQN
+- Input: one-hot board (10 channels) with shape `[1, 10, 10, 17]`
+- Output: Q-values for all rectangles (8415 actions)
+- Training: curriculum + backward board generator to ensure solvable boards
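The input and output sizes in the summary above can be checked with a small pure-Python sketch. The channel layout used here (channel 0 for cleared cells, channels 1 to 9 for apple values) is an assumption for illustration; the source only states that the input is a 10-channel one-hot board. The helper names `one_hot_board` and `count_rectangles` are hypothetical and not part of the project.

```python
from typing import List

ROWS, COLS, CHANNELS = 10, 17, 10


def one_hot_board(board: List[List[int]]) -> List[List[List[float]]]:
    """Encode a ROWS x COLS grid of ints in 0..9 as [CHANNELS][ROWS][COLS] floats.

    Assumption (not confirmed by the source): channel 0 marks cleared cells,
    channels 1..9 mark the apple value stored in that cell.
    """
    planes = [[[0.0] * COLS for _ in range(ROWS)] for _ in range(CHANNELS)]
    for r in range(ROWS):
        for c in range(COLS):
            planes[board[r][c]][r][c] = 1.0
    return planes


def count_rectangles(rows: int, cols: int) -> int:
    """Count axis-aligned rectangles: pick a row span, then a column span."""
    row_spans = rows * (rows + 1) // 2   # 55 for 10 rows
    col_spans = cols * (cols + 1) // 2   # 153 for 17 columns
    return row_spans * col_spans
```

The encoder returns a `[10, 10, 17]` nested list; the model expects an extra leading batch dimension, giving `[1, 10, 10, 17]`.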
+## Files in this repo
+- `model.pth`: PyTorch checkpoint dict with `policy_net`, `target_net`, `optimizer`
+- `model.onnx`: Exported ONNX model for browser/runtime inference
+## How to use (PyTorch)
+```python
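+import torch
+# FruitBoxDQN is defined in the project repository (src/models.py), so run this from a checkout of it.
+from src.models import FruitBoxDQN
+rows, cols = 10, 17
+n_actions = 55 * 153  # (rows*(rows+1)/2) * (cols*(cols+1)/2) = 8415
+model = FruitBoxDQN(rows, cols, n_actions)
+# Load on CPU; the checkpoint may be a dict that wraps the policy network weights.
+ckpt = torch.load("model.pth", map_location="cpu")
+state = ckpt["policy_net"] if "policy_net" in ckpt else ckpt
+model.load_state_dict(state)
+model.eval()  # switch to inference mode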
+```
+## How to use (ONNX / browser)
+```js
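+// Assumes onnxruntime-web is loaded as `ort` and the graph tensors are named "input" and "output".
+const session = await ort.InferenceSession.create("model.onnx");
+// boardData (placeholder name): Float32Array of length 1 * 10 * 10 * 17 holding the one-hot board
+const input = new ort.Tensor("float32", boardData, [1, 10, 10, 17]);
+const output = await session.run({ input });
+// output.output.data: Q-values for 8415 rectangles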
+```
+## Action masking (required)
+You must mask invalid rectangles before selecting an action. A rectangle is valid if the sum of its cells equals 10. Without the mask, the model can pick illegal moves.
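As a minimal sketch of the masking step, the code below builds a validity mask with a 2D prefix sum and then picks the highest-scoring valid rectangle. The rectangle ordering in `enumerate_rects` is an assumption for illustration; the index order the trained model actually uses is defined by the project's environment and must be matched exactly. All function names here are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (top, bottom, left, right), all inclusive


def prefix_sums(board: Sequence[Sequence[int]]) -> List[List[int]]:
    """ps[r][c] holds the sum of board[0:r][0:c]."""
    rows, cols = len(board), len(board[0])
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ps[r + 1][c + 1] = board[r][c] + ps[r][c + 1] + ps[r + 1][c] - ps[r][c]
    return ps


def enumerate_rects(rows: int, cols: int) -> List[Rect]:
    """Hypothetical action order; the real model's order may differ."""
    return [(r1, r2, c1, c2)
            for r1 in range(rows) for r2 in range(r1, rows)
            for c1 in range(cols) for c2 in range(c1, cols)]


def action_mask(board: Sequence[Sequence[int]], target: int = 10) -> List[bool]:
    """True where the rectangle's cell sum equals the target (10 in FruitBox)."""
    ps = prefix_sums(board)
    mask = []
    for r1, r2, c1, c2 in enumerate_rects(len(board), len(board[0])):
        total = ps[r2 + 1][c2 + 1] - ps[r1][c2 + 1] - ps[r2 + 1][c1] + ps[r1][c1]
        mask.append(total == target)
    return mask


def best_valid_action(q_values: Sequence[float], mask: Sequence[bool]) -> Optional[int]:
    """Index of the highest Q-value among valid actions, or None if none is valid."""
    best = None
    for i, (q, ok) in enumerate(zip(q_values, mask)):
        if ok and (best is None or q > q_values[best]):
            best = i
    return best
```

With the prefix-sum table, each rectangle sum costs four lookups, so masking all 8415 rectangles on a 10x17 board stays cheap. Returning `None` when no rectangle is valid lets the caller treat that state as the end of the game.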
+## Training details
+- Environment: FruitBoxEnvImproved (10x17)
+- Curriculum: target coverage ramps from 0.3 to 0.95
+- Optimizer: Adam; discount factor gamma=0.99
+- Episodes: 10k (Colab integrated script)
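The source does not state how the curriculum target moves from 0.3 to 0.95. As one possible reading, the sketch below assumes a linear ramp over the training run; the function name `coverage_target` and the linear shape are assumptions, not the project's confirmed schedule.

```python
def coverage_target(episode: int, total_episodes: int,
                    start: float = 0.3, end: float = 0.95) -> float:
    """Linearly interpolate the target coverage from start to end (assumed schedule)."""
    if total_episodes <= 1:
        return end
    # Clamp progress to [0, 1] so out-of-range episode numbers stay bounded.
    frac = min(max(episode / (total_episodes - 1), 0.0), 1.0)
    return start + (end - start) * frac
```

For example, `coverage_target(0, 10_000)` gives the starting value 0.3, and the final episode lands at the end value of about 0.95.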
+## Limitations
+- Trained only on generated, solvable boards; performance on other board distributions, such as boards from the live game, may differ.
+- Requires an accurate action mask and correct board extraction.
+## Links
+- Game: https://en.gamesaien.com/game/fruit_box/
+- Project: https://github.com/kbsooo/AlphaApple