---
license: apache-2.0
tags:
- reinforcement-learning
- checkers
- alphazero
- pytorch
- deep-reinforcement-learning
frameworks:
- pytorch
---

# AlphaCheckers-Zero

> An implementation of the **AlphaZero** algorithm for Checkers (Brazilian/International rules), featuring a specialized **Battle Arena** against Large Language Models (LLMs) and classical Minimax algorithms.

## About the Project

This repository contains a Deep Reinforcement Learning engine built from scratch using **PyTorch** and **Monte Carlo Tree Search (MCTS)**. The agent learns the game of Checkers solely through self-play, without any prior human knowledge or heuristics.

The project goes beyond a standard implementation by including a unique **Arena Mode**, in which the AlphaZero agent is benchmarked against:

1. **Classical algorithms:** Minimax with alpha-beta pruning.
2. **Generative AI:** state-of-the-art LLMs (via the Groq API), testing tree-search logic against probabilistic text generation.

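The classical opponent relies on Minimax with alpha-beta pruning. As a rough sketch of that idea over a toy game tree (the node names, scores, and helper functions below are hypothetical illustrations, not the engine from `evalminimax.py`):

```python
# Minimal negamax with alpha-beta pruning over a toy game tree.
# Purely illustrative: this is not the project's actual checkers engine.

def alphabeta(node, depth, alpha, beta, evaluate, children):
    """Best achievable score for the side to move at `node`."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    best = float("-inf")
    for child in kids:
        # Negamax convention: negate, since the opponent moves next.
        score = -alphabeta(child, depth - 1, -beta, -alpha, evaluate, children)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cut-off: the opponent will avoid this branch anyway
    return best

# Two-ply toy tree; leaf scores are from the root player's perspective
# (the root player is also the side to move at the leaves).
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
scores = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}

best = alphabeta("root", 2, float("-inf"), float("inf"),
                 lambda n: scores[n], lambda n: tree.get(n, []))
print(best)  # -> 3 (max over the opponent's minimizing replies)
```

The pruning step is what lets the Depth-8 arena opponent search deeply: once a branch is provably worse than an already-found alternative, its remaining children are skipped.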
## Benchmarks & Performance

The trained model (`checkers_master_final.pth`) was subjected to rigorous testing.

| Opponent | Type | Result | Notes |
| :--- | :--- | :--- | :--- |
| **Human Player** | Biological | ✅ **Win** | Surpassed its creator. |
| **Llama 3.3 70b** | LLM (Groq) | ✅ **Win** | Exploited the LLM's lack of spatial board consistency. |
| **Llama-4-maverick-17b-128e** | LLM | ✅ **Win** | Consistent tactical superiority. |
| **Kimi k2** | LLM | ✅ **Win** | The LLM failed to maintain a long-term strategy. |
| **Minimax (Depth 8)** | Classical algorithm | 🤝 **Draw** | **Crucial result:** the neural network converged to a robust, defensive policy that matches a brute-force engine evaluating millions of moves. |
## Technical Architecture

* **Neural network:** a ResNet-like architecture with a dual head:
  * **Policy head:** outputs move probabilities ($p$).
  * **Value head:** estimates the win probability ($v$) of the current state.
* **Inference:** MCTS guided by the neural network to simulate future outcomes.
* **Training:** continuous self-play loops with a replay buffer and data augmentation.
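The dual-head design can be sketched in PyTorch roughly as follows. The channel counts, number of residual blocks, 4-plane 8×8 input encoding, and move-space size are illustrative assumptions, not the exact architecture from `AlphaCheckerTrainer.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Standard pre-activation-free residual block: conv-bn-relu-conv-bn + skip."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class DualHeadNet(nn.Module):
    """AlphaZero-style trunk with a policy head (move logits p)
    and a value head (scalar v in [-1, 1])."""
    def __init__(self, in_planes=4, channels=64, n_blocks=4, n_moves=512):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        self.policy = nn.Sequential(          # p: one logit per encoded move
            nn.Conv2d(channels, 2, 1), nn.Flatten(), nn.Linear(2 * 8 * 8, n_moves))
        self.value = nn.Sequential(           # v: tanh keeps the estimate in [-1, 1]
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(8 * 8, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.body(self.stem(x))
        return self.policy(h), self.value(h)
```

During MCTS, `p` biases which branches get expanded and `v` replaces random rollouts as the leaf evaluation.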
## Project Structure

Rename the source files to match the structure below for better organization:

* `AlphaCheckerTrainer.py`: main training loop, MCTS logic, and network architecture.
* `eval.py`: interface to play against the AI locally.
* `evalLLM.py`: script to battle LLMs via the Groq API.
* `evalminimax.py`: script to battle the Minimax algorithm.
* `checkers_master_final.pth`: the trained model weights.
## Testing

Try the model live on Hugging Face Spaces: https://huggingface.co/spaces/Madras1/AlphaCherckerZero
## Getting Started
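A plausible quickstart, assuming the scripts listed under Project Structure and a standard PyTorch environment (the dependency list below is an assumption; the exact requirements may differ):

```shell
# Assumed dependencies (not an official requirements list)
pip install torch numpy groq

# Train from scratch via self-play
python AlphaCheckerTrainer.py

# Play against the trained model locally
python eval.py

# Benchmark against an LLM (requires a GROQ_API_KEY) or against Minimax
python evalLLM.py
python evalminimax.py
```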
Developed by Gabriel Yogi.

This project is for research purposes in the field of Reinforcement Learning.