Madras1 committed on
Commit 4b1cee9 · verified · 1 Parent(s): da98e14

Update README.md

Files changed (1)
  1. README.md +56 -1
README.md CHANGED
@@ -5,4 +5,59 @@ tags:
  - checkers
  - alphazero
  - rl
- ---
+ ---
+
+ # AlphaCheckers-Zero
+
+ > An implementation of the **AlphaZero** algorithm for Checkers (Brazilian/International rules), featuring a specialized **Battle Arena** against Large Language Models (LLMs) and classical Minimax algorithms.
+
+ ![Python](https://img.shields.io/badge/Python-3.8%2B-blue)
+ ![PyTorch](https://img.shields.io/badge/PyTorch-Deep%20Learning-red)
+ ![Status](https://img.shields.io/badge/Status-Maintained-green)
+
+ ## About the Project
+
+ This repository contains a Deep Reinforcement Learning engine built from scratch using **PyTorch** and **Monte Carlo Tree Search (MCTS)**. The agent learns the game of Checkers solely through self-play, without any prior human knowledge or heuristics.
+
+ The project goes beyond a standard implementation by including a unique **Arena Mode**, in which the AlphaZero agent is benchmarked against:
+ 1. **Classical Algorithms:** Minimax with Alpha-Beta Pruning.
+ 2. **Generative AI:** State-of-the-art LLMs (via the Groq API), pitting learned game logic against probabilistic text generation.
+
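The classical baseline above can be sketched as depth-limited negamax with alpha-beta pruning. The `Game` interface and the toy Nim game used for demonstration are illustrative assumptions, not the repository's actual `evalminimax.py` code:

```python
# Hypothetical sketch of the classical opponent: depth-limited negamax with
# alpha-beta pruning. The Game interface and the toy Nim game below are
# illustrative; the repository's evalminimax.py may be structured differently.

def negamax(state, depth, alpha, beta, game):
    """Best achievable score for the side to move, searching `depth` plies."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    best = float("-inf")
    for move in game.legal_moves(state):
        score = -negamax(game.apply(state, move), depth - 1, -beta, -alpha, game)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # opponent has a refutation elsewhere: prune
            break
    return best

class Nim:
    """Toy game for demonstration: remove 1-2 stones; taking the last wins."""
    def is_terminal(self, n): return n == 0
    def evaluate(self, n): return -1 if n == 0 else 0  # side to move has lost
    def legal_moves(self, n): return [m for m in (1, 2) if m <= n]
    def apply(self, n, m): return n - m

# Positions that are multiples of 3 are losses for the side to move.
print(negamax(3, 6, float("-inf"), float("inf"), Nim()))  # -1 (losing)
print(negamax(2, 6, float("-inf"), float("inf"), Nim()))  # 1 (winning)
```

The same `negamax` skeleton, given a checkers move generator and a material-count evaluation, is what "Minimax (Depth 8)" in the benchmark table refers to.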
+ ## Benchmarks & Performance
+
+ The trained model (`checkers_master_final.pth`) was subjected to rigorous testing.
+
+ | Opponent | Type | Result | Notes |
+ | :--- | :--- | :--- | :--- |
+ | **Human Player** | Biological | ✅ **Win** | Surpassed the creator. |
+ | **Llama 3.3 70B** | LLM (Groq) | ✅ **Win** | Exploited the LLM's lack of spatial board consistency. |
+ | **Llama-4-maverick-17b-128e** | LLM | ✅ **Win** | Consistent tactical superiority. |
+ | **Kimi K2** | LLM | ✅ **Win** | The LLM failed to maintain a long-term strategy. |
+ | **Minimax (Depth 8)** | Classical Algo | 🤝 **Draw** | **Key result:** the network converged to a robust, defensive policy, holding its own against a brute-force engine that evaluates millions of positions. |
+
+ ## Technical Architecture
+
+ * **Neural Network:** A ResNet-like architecture with a dual head:
+   * **Policy Head:** Outputs move probabilities ($p$).
+   * **Value Head:** Estimates the win probability ($v$) of the current state.
+ * **Inference:** MCTS guided by the neural network to simulate future outcomes.
+ * **Training:** Continuous self-play loops with a replay buffer and data augmentation.
+
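The MCTS selection step guided by the two heads can be sketched with the standard AlphaZero-style PUCT rule. The node layout and the `c_puct` value here are illustrative assumptions, not the repository's actual code:

```python
import math

# PUCT selection sketch: balance the exploitation term Q (mean value from the
# value head's backed-up estimates) against an exploration bonus U driven by
# the policy head's prior. c_puct and the tuple layout are assumptions.

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """Q + U, where U favors high-prior, rarely visited children."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, parent_visits, c_puct=1.5):
    """children: iterable of (move, q, prior, visits); pick the best score."""
    return max(
        children,
        key=lambda c: puct_score(c[1], c[2], parent_visits, c[3], c_puct),
    )[0]
```

During self-play, AlphaZero-style agents typically also mix Dirichlet noise into the root priors so that training games keep exploring beyond the policy head's current preferences.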
+ ## Project Structure
+
+ Rename the source files to match the structure below for better organization:
+
+ * `AlphaCheckerTrainer.py`: Main training loop, MCTS logic, and network architecture.
+ * `eval.py`: Interface to play against the AI locally.
+ * `evalLLM.py`: Script to battle LLMs via the Groq API.
+ * `evalminimax.py`: Script to battle the Minimax algorithm.
+ * `checkers_master_final.pth`: The trained model weights.
+
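A sketch of how an LLM battle might feed the board to a chat model: serialize the position as text and constrain the reply to the legal-move list, which limits hallucinated moves. The board encoding and helper names below are hypothetical, not the actual `evalLLM.py` code:

```python
# Hypothetical helpers for the LLM arena. Board encoding ('.'=empty, 'w'/'b'=
# men, 'W'/'B'=kings) and function names are assumptions for illustration.

def board_to_text(board):
    """Render an 8x8 board (list of lists) as ASCII with coordinates
    the LLM can reference in its answer."""
    rows = [f"{8 - r} " + " ".join(row) for r, row in enumerate(board)]
    rows.append("  a b c d e f g h")
    return "\n".join(rows)

def build_prompt(board, legal_moves):
    """Constrain the model to the legal-move list to limit invalid replies."""
    return (
        "You are playing checkers as black.\n"
        f"Board:\n{board_to_text(board)}\n"
        f"Legal moves: {', '.join(legal_moves)}\n"
        "Reply with exactly one move from the list."
    )
```

The resulting prompt would then be sent to a chat model (e.g. through Groq's OpenAI-compatible chat-completions endpoint) and the reply parsed back into a move.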
+ ## Testing
+
+ Try the trained agent in the browser: https://huggingface.co/spaces/Madras1/AlphaCherckerZero
+
+ ## Getting Started
+
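A minimal quick start, assuming PyTorch and NumPy are the main dependencies and that the scripts run without arguments (both are assumptions; the repository does not pin versions):

```shell
# Assumed dependencies.
pip install torch numpy

# Play against the trained agent locally (loads checkers_master_final.pth).
python eval.py

# Benchmark against the classical Minimax opponent.
python evalminimax.py

# Battle an LLM; assumes a Groq key is exported as GROQ_API_KEY.
export GROQ_API_KEY=...
python evalLLM.py
```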
+ Developed by Gabriel Yogi.
+ This project is for research purposes in the field of Reinforcement Learning.