---
tags:
- checkers
- alphazero
- rl
---
# AlphaCheckers-Zero

> An implementation of the **AlphaZero** algorithm for Checkers (Brazilian/International rules), featuring a specialized **Battle Arena** against Large Language Models (LLMs) and classical Minimax algorithms.

  

## About the Project
This repository contains a Deep Reinforcement Learning engine built from scratch using **PyTorch** and **Monte Carlo Tree Search (MCTS)**. The agent learns the game of Checkers solely through self-play, without any prior human knowledge or heuristics.

The project goes beyond a standard implementation by including a unique **Arena Mode**, where the AlphaZero agent is benchmarked against:

1. **Classical Algorithms:** Minimax with Alpha-Beta Pruning (see the sketch below).
2. **Generative AI:** State-of-the-art LLMs (via the Groq API), to test learned logic against probabilistic generation.
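
For reference, a minimal sketch of the kind of Alpha-Beta search used on the classical side of the arena. Here `legal_moves`, `apply_move`, and `evaluate` are hypothetical stand-ins for the engine's actual game interface, not functions from this repository:

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing):
    """Minimax with Alpha-Beta Pruning: skips branches that cannot
    change the final decision. `legal_moves`, `apply_move`, and
    `evaluate` are placeholders for the real Checkers interface."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)  # heuristic score from the maximizer's view
    if maximizing:
        best = -math.inf
        for move in moves:
            best = max(best, alphabeta(apply_move(state, move),
                                       depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # beta cutoff: the opponent will avoid this branch
        return best
    else:
        best = math.inf
        for move in moves:
            best = min(best, alphabeta(apply_move(state, move),
                                       depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:
                break  # alpha cutoff
        return best
```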

## Benchmarks & Performance

The trained model (`checkers_master_final.pth`) was subjected to rigorous testing.

| Opponent | Type | Result | Notes |
| :--- | :--- | :--- | :--- |
| **Human Player** | Biological | ✅ **Win** | Surpassed the creator. |
| **Llama 3.3 70B** | LLM (Groq) | ✅ **Win** | Exploited the LLM's lack of spatial board consistency. |
| **Llama-4-Maverick-17B-128E** | LLM | ✅ **Win** | Consistent tactical superiority. |
| **Kimi K2** | LLM | ✅ **Win** | The LLM failed to maintain a long-term strategy. |
| **Minimax (Depth 8)** | Classical Algorithm | 🤝 **Draw** | **Crucial result:** the network converged to a robust, defensive policy, matching a brute-force engine that evaluates millions of positions. |

## Technical Architecture

* **Neural Network:** A ResNet-like architecture with a dual head (see the sketch below):
    * **Policy Head:** Outputs move probabilities ($p$).
    * **Value Head:** Estimates the win probability ($v$) of the current state.
* **Inference:** Uses MCTS guided by the neural network to simulate future outcomes.
* **Training:** Continuous self-play loops with a replay buffer and data augmentation.
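
A rough sketch of the dual-head design described above. The layer sizes, input encoding (4 planes: men/kings × 2 colors), and 512-way action space are illustrative assumptions, not the exact values used in `AlphaCheckerTrainer.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class DualHeadNet(nn.Module):
    """AlphaZero-style network: shared residual trunk feeding a policy
    head (p) and a value head (v). Sizes are illustrative assumptions."""
    def __init__(self, channels=64, blocks=6, board=8, n_moves=512):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(4, channels, 3, padding=1, bias=False),  # assumed 4-plane board encoding
            nn.BatchNorm2d(channels), nn.ReLU())
        self.trunk = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
        self.policy = nn.Sequential(nn.Flatten(), nn.Linear(channels * board * board, n_moves))
        self.value = nn.Sequential(nn.Flatten(), nn.Linear(channels * board * board, 1), nn.Tanh())

    def forward(self, x):
        h = self.trunk(self.stem(x))
        # log-policy over moves, and a scalar value in [-1, 1]
        return F.log_softmax(self.policy(h), dim=1), self.value(h)
```

For example, `p, v = DualHeadNet()(torch.zeros(1, 4, 8, 8))` produces a 512-way log-policy and a scalar value; during MCTS, $p$ biases move selection while $v$ replaces random rollouts.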

## Project Structure

Please rename the source files to match the structure below for better organization:

* `AlphaCheckerTrainer.py`: Main training loop, MCTS logic, and network architecture.
* `eval.py`: Interface to play against the AI locally.
* `evalLLM.py`: Script to battle LLMs via the Groq API (see the sketch below).
* `evalminimax.py`: Script to battle the Minimax algorithm.
* `checkers_master_final.pth`: The trained AlphaCheckers-Zero model weights.
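
A hedged sketch of how `evalLLM.py` might query an LLM for a move through the Groq API. The model name, prompt format, and move notation are assumptions for illustration; the actual script may differ:

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def llm_move(board_text, legal, model="llama-3.3-70b-versatile"):
    """Ask the LLM to pick one of the legal moves for the current board.
    Prompt wording and move notation are illustrative assumptions."""
    prompt = (
        "You are playing checkers. Current board:\n"
        f"{board_text}\n"
        f"Legal moves: {', '.join(legal)}\n"
        "Reply with exactly one move from the list."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    answer = resp.choices[0].message.content.strip()
    # Fall back to the first legal move if the reply is not in the list
    return answer if answer in legal else legal[0]
```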

## Testing

Try the model online: https://huggingface.co/spaces/Madras1/AlphaCherckerZero

## Getting Started
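
A plausible quick start, assuming a standard PyTorch setup (the exact dependency list is not pinned in this repository): install PyTorch (`pip install torch`), plus `groq` if you want the LLM arena, then run `python AlphaCheckerTrainer.py` to train from self-play, or `python eval.py` to play locally against the bundled `checkers_master_final.pth` weights.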

Developed by Gabriel Yogi.

This project is for research purposes in the field of Reinforcement Learning.