AlphaChecker-Zero / README.md
Madras1's picture
Update README.md
87c8301 verified
---
license: apache-2.0
tags:
- reinforcement-learning
- checkers
- alphazero
- pytorch
- deep-reinforcement-learning
frameworks:
- pytorch
---
# AlphaCheckers-Zero
> A implementation of the **AlphaZero** algorithm for Checkers (Brazilian/International rules), featuring a specialized **Battle Arena** against Large Language Models (LLMs) and classical Minimax algorithms.
![Python](https://img.shields.io/badge/Python-3.8%2B-blue)
![PyTorch](https://img.shields.io/badge/PyTorch-Deep%20Learning-red)
![Status](https://img.shields.io/badge/Status-Maintained-green)
## About the Project
This repository contains a Deep Reinforcement Learning engine built from scratch using **PyTorch** and **Monte Carlo Tree Search (MCTS)**. The agent learns the game of Checkers solely through self-play, without any prior human knowledge or heuristics.
The project goes beyond standard implementation by including a unique **Arena Mode**, where the AlphaZero agent is benchmarked against:
1. **Classical Algorithms:** Minimax with Alpha-Beta Pruning.
2. **Generative AI:** State-of-the-art LLMs (via Groq API) to test logic vs. probabilistic generation.
## Benchmarks & Performance
The trained model (`checkers_master_final.pth`) was subjected to rigorous testing.
| Opponent | Type | Result | Notes |
| :--- | :--- | :--- | :--- |
| **Human Player** | Biological | βœ… **Win** | Surpassed the creator. |
| **Llama 3.3 70b** | LLM (Groq) | βœ… **Win** | Exploited the LLM's lack of spatial board consistency. |
| **Llama-4-maverick-17b-128e** | LLM | βœ… **Win** | Consistent tactical superiority. |
| **Kimi k2** | LLM | βœ… **Win** | The LLM failed to maintain long-term strategy. |
| **Minimax (Depth 8)** | Classical Algo | 🀝 **Draw** | **Crucial Result:** Proves the neural network has converged to a robust, defensive optimal policy, matching a brute-force engine calculating millions of moves. |
## Technical Architecture
* **Neural Network:** A ResNet-like architecture with a dual head:
* **Policy Head:** Outputs move probabilities ($p$).
* **Value Head:** Estimates the win probability ($v$) of the current state.
* **Inference:** Uses MCTS guided by the neural network to simulate future outcomes.
* **Training:** Continuous self-play loops with Replay Buffer and data augmentation.
## Project Structure
Please rename the source files to match the structure below for better organization:
* `AlphaCheckerTrainer.py` Main training loop, MCTS logic, and Network Architecture.
* `eval.py`: Interface to play against the AI locally.
* `evalLLM.py`: Script to battle against LLMs using Groq API.
* `evalminimax.py`: Script to battle against the Minimax algorithm.
* `checkers_master_final.pth`: The AlphaChecker Weight Trained
## Testing
https://huggingface.co/spaces/Madras1/AlphaCherckerZero
## Getting Started
Developed by Gabriel Yogi.
This project is for research purposes in the field of Reinforcement Learning.