---
tags:
- checkers
- alphazero
- rl
---
# AlphaCheckers-Zero

> An implementation of the **AlphaZero** algorithm for Checkers (Brazilian/International rules), featuring a specialized **Battle Arena** against Large Language Models (LLMs) and classical Minimax algorithms.

  

## About the Project
This repository contains a Deep Reinforcement Learning engine built from scratch using **PyTorch** and **Monte Carlo Tree Search (MCTS)**. The agent learns the game of Checkers solely through self-play, without any prior human knowledge or heuristics.

The project goes beyond a standard implementation by including a unique **Arena Mode**, where the AlphaZero agent is benchmarked against:

1. **Classical Algorithms:** Minimax with Alpha-Beta Pruning (see the sketch below).
2. **Generative AI:** State-of-the-art LLMs (via the Groq API), to test learned logic against probabilistic generation.
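
For reference, a minimal sketch of the kind of Alpha-Beta search used on the classical side of the arena. Here `legal_moves`, `apply_move`, and `evaluate` are hypothetical stand-ins for the engine's actual game interface, not functions from this repository:

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing):
    """Minimax with Alpha-Beta Pruning: skips branches that cannot
    change the final decision. `legal_moves`, `apply_move`, and
    `evaluate` are placeholders for the real Checkers interface."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)  # heuristic score from the maximizer's view
    if maximizing:
        best = -math.inf
        for move in moves:
            best = max(best, alphabeta(apply_move(state, move),
                                       depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # beta cutoff: the opponent will avoid this branch
        return best
    else:
        best = math.inf
        for move in moves:
            best = min(best, alphabeta(apply_move(state, move),
                                       depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:
                break  # alpha cutoff
        return best
```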

## Benchmarks & Performance

The trained model (`checkers_master_final.pth`) was subjected to rigorous testing.

| Opponent | Type | Result | Notes |
| :--- | :--- | :--- | :--- |
| **Human Player** | Biological | ✅ **Win** | Surpassed the creator. |
| **Llama 3.3 70B** | LLM (Groq) | ✅ **Win** | Exploited the LLM's lack of spatial board consistency. |
| **Llama-4-Maverick-17B-128E** | LLM | ✅ **Win** | Consistent tactical superiority. |
| **Kimi K2** | LLM | ✅ **Win** | The LLM failed to maintain a long-term strategy. |
| **Minimax (Depth 8)** | Classical Algorithm | 🤝 **Draw** | **Crucial result:** the network converged to a robust, defensive policy, matching a brute-force engine that evaluates millions of positions. |

## Technical Architecture

* **Neural Network:** A ResNet-like architecture with a dual head (see the sketch below):
    * **Policy Head:** Outputs move probabilities ($p$).
    * **Value Head:** Estimates the win probability ($v$) of the current state.
* **Inference:** Uses MCTS guided by the neural network to simulate future outcomes.
* **Training:** Continuous self-play loops with a replay buffer and data augmentation.
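
A rough sketch of the dual-head design described above. The layer sizes, input encoding (4 planes: men/kings × 2 colors), and 512-way action space are illustrative assumptions, not the exact values used in `AlphaCheckerTrainer.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class DualHeadNet(nn.Module):
    """AlphaZero-style network: shared residual trunk feeding a policy
    head (p) and a value head (v). Sizes are illustrative assumptions."""
    def __init__(self, channels=64, blocks=6, board=8, n_moves=512):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(4, channels, 3, padding=1, bias=False),  # assumed 4-plane board encoding
            nn.BatchNorm2d(channels), nn.ReLU())
        self.trunk = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
        self.policy = nn.Sequential(nn.Flatten(), nn.Linear(channels * board * board, n_moves))
        self.value = nn.Sequential(nn.Flatten(), nn.Linear(channels * board * board, 1), nn.Tanh())

    def forward(self, x):
        h = self.trunk(self.stem(x))
        # log-policy over moves, and a scalar value in [-1, 1]
        return F.log_softmax(self.policy(h), dim=1), self.value(h)
```

For example, `p, v = DualHeadNet()(torch.zeros(1, 4, 8, 8))` produces a 512-way log-policy and a scalar value; during MCTS, $p$ biases move selection while $v$ replaces random rollouts.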

## Project Structure

Please rename the source files to match the structure below for better organization:

* `AlphaCheckerTrainer.py`: Main training loop, MCTS logic, and network architecture.
* `eval.py`: Interface to play against the AI locally.
* `evalLLM.py`: Script to battle LLMs via the Groq API (see the sketch below).
* `evalminimax.py`: Script to battle the Minimax algorithm.
* `checkers_master_final.pth`: The trained AlphaCheckers-Zero model weights.
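
A hedged sketch of how `evalLLM.py` might query an LLM for a move through the Groq API. The model name, prompt format, and move notation are assumptions for illustration; the actual script may differ:

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def llm_move(board_text, legal, model="llama-3.3-70b-versatile"):
    """Ask the LLM to pick one of the legal moves for the current board.
    Prompt wording and move notation are illustrative assumptions."""
    prompt = (
        "You are playing checkers. Current board:\n"
        f"{board_text}\n"
        f"Legal moves: {', '.join(legal)}\n"
        "Reply with exactly one move from the list."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    answer = resp.choices[0].message.content.strip()
    # Fall back to the first legal move if the reply is not in the list
    return answer if answer in legal else legal[0]
```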

## Testing

Try the model online: https://huggingface.co/spaces/Madras1/AlphaCherckerZero

## Getting Started
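
A plausible quick start, assuming a standard PyTorch setup (the exact dependency list is not pinned in this repository): install PyTorch (`pip install torch`), plus `groq` if you want the LLM arena, then run `python AlphaCheckerTrainer.py` to train from self-play, or `python eval.py` to play locally against the bundled `checkers_master_final.pth` weights.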

Developed by Gabriel Yogi.

This project is for research purposes in the field of Reinforcement Learning.