---
license: apache-2.0
tags:
- reinforcement-learning
- checkers
- alphazero
- pytorch
- deep-reinforcement-learning
frameworks:
- pytorch
---

# AlphaCheckers-Zero

> An implementation of the **AlphaZero** algorithm for Checkers (Brazilian/International rules), featuring a specialized **Battle Arena** against Large Language Models (LLMs) and classical Minimax algorithms.

![Python](https://img.shields.io/badge/Python-3.8%2B-blue)
![PyTorch](https://img.shields.io/badge/PyTorch-Deep%20Learning-red)
![Status](https://img.shields.io/badge/Status-Maintained-green)

## About the Project

This repository contains a Deep Reinforcement Learning engine built from scratch using **PyTorch** and **Monte Carlo Tree Search (MCTS)**. The agent learns the game of Checkers solely through self-play, without any prior human knowledge or heuristics.

The project goes beyond standard implementation by including a unique **Arena Mode**, where the AlphaZero agent is benchmarked against:
1.  **Classical Algorithms:** Minimax with Alpha-Beta Pruning.
2.  **Generative AI:** State-of-the-art LLMs (via Groq API) to test logic vs. probabilistic generation.
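For context on the classical baseline, Minimax with Alpha-Beta pruning can be sketched generically as below. The interface (`moves_fn`, `apply_fn`, `eval_fn`) is a hypothetical abstraction for illustration; the repository's `evalminimax.py` may be structured differently.

```python
def alphabeta(state, depth, alpha, beta, maximizing, moves_fn, apply_fn, eval_fn):
    """Return the best achievable score for `state`, searched to `depth` plies."""
    moves = moves_fn(state)
    if depth == 0 or not moves:
        return eval_fn(state)  # leaf or horizon: fall back to static evaluation
    if maximizing:
        best = float("-inf")
        for move in moves:
            best = max(best, alphabeta(apply_fn(state, move), depth - 1,
                                       alpha, beta, False,
                                       moves_fn, apply_fn, eval_fn))
            alpha = max(alpha, best)
            if alpha >= beta:  # beta cutoff: the minimizer will avoid this branch
                break
        return best
    else:
        best = float("inf")
        for move in moves:
            best = min(best, alphabeta(apply_fn(state, move), depth - 1,
                                       alpha, beta, True,
                                       moves_fn, apply_fn, eval_fn))
            beta = min(beta, best)
            if beta <= alpha:  # alpha cutoff: the maximizer will avoid this branch
                break
        return best
```

At depth 8 (as used in the benchmark above), pruning is what keeps the search tractable: branches the opponent would never allow are cut without full expansion.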

## Benchmarks & Performance

The trained model (`checkers_master_final.pth`) was subjected to rigorous testing.

| Opponent | Type | Result | Notes |
| :--- | :--- | :--- | :--- |
| **Human Player** | Biological | ✅ **Win** | Surpassed the creator. |
| **Llama 3.3 70b** | LLM (Groq) | ✅ **Win** | Exploited the LLM's lack of spatial board consistency. |
| **Llama-4-maverick-17b-128e** | LLM | ✅ **Win** | Consistent tactical superiority. |
| **Kimi k2** | LLM | ✅ **Win** | The LLM failed to maintain long-term strategy. |
| **Minimax (Depth 8)** | Classical Algo | 🤝 **Draw** | **Crucial Result:** Proves the neural network has converged to a robust, defensive optimal policy, matching a brute-force engine calculating millions of moves. |

## Technical Architecture

*   **Neural Network:** A ResNet-like architecture with a dual head:
    *   **Policy Head:** Outputs move probabilities ($p$).
    *   **Value Head:** Estimates the win probability ($v$) of the current state.
*   **Inference:** Uses MCTS guided by the neural network to simulate future outcomes.
*   **Training:** Continuous self-play loops with Replay Buffer and data augmentation.
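The dual-head design above can be sketched in PyTorch as follows. Channel counts, the number of residual blocks, the input-plane encoding (4 planes on an 8x8 board), and the move-space size (512) are illustrative assumptions, not the repository's actual hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Standard pre-activation-free residual block: conv-BN-ReLU, conv-BN, skip."""
    def __init__(self, ch):
        super().__init__()
        self.c1, self.b1 = nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch)
        self.c2, self.b2 = nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch)

    def forward(self, x):
        y = F.relu(self.b1(self.c1(x)))
        y = self.b2(self.c2(y))
        return F.relu(x + y)  # skip connection

class DualHeadNet(nn.Module):
    """Shared residual tower feeding a policy head (p) and a value head (v)."""
    def __init__(self, planes=4, ch=64, blocks=4, n_moves=512):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(planes, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        self.tower = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        # Policy head: per-square features flattened into move logits.
        self.policy = nn.Sequential(nn.Conv2d(ch, 2, 1), nn.Flatten(),
                                    nn.Linear(2 * 8 * 8, n_moves))
        # Value head: scalar in [-1, 1] via tanh (loss vs. win).
        self.value = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Flatten(),
                                   nn.Linear(8 * 8, 64), nn.ReLU(),
                                   nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.tower(self.stem(x))
        return self.policy(h), self.value(h)  # (move logits p, state value v)
```

During MCTS, `p` biases which branches are expanded and `v` replaces random rollouts when backing up node statistics.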

## Project Structure

Please rename the source files to match the structure below for better organization:

*   `AlphaCheckerTrainer.py`: Main training loop, MCTS logic, and network architecture.
*   `eval.py`: Interface to play against the AI locally.
*   `evalLLM.py`: Script to battle against LLMs using Groq API.
*   `evalminimax.py`: Script to battle against the Minimax algorithm.
*   `checkers_master_final.pth`: The trained model weights.

## Testing
https://huggingface.co/spaces/Madras1/AlphaCherckerZero

## Getting Started
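A minimal sketch of loading the published weights for local play. The toy `nn.Linear` stands in for the real network class, which should be taken from `AlphaCheckerTrainer.py`; the `torch.save`/`torch.load` pattern itself is what carries over to `checkers_master_final.pth`.

```python
import torch
import torch.nn as nn

# Toy stand-in network; substitute the real class defined in AlphaCheckerTrainer.py.
net = nn.Linear(4, 2)
torch.save(net.state_dict(), "demo_weights.pth")

# Same loading pattern applies to checkers_master_final.pth:
state = torch.load("demo_weights.pth", map_location="cpu")
net.load_state_dict(state)
net.eval()  # inference mode: freezes dropout / batch-norm statistics
```

With the real network restored this way, `eval.py` can drive a local game against the agent.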


Developed by Gabriel Yogi.
This project is for research purposes in the field of Reinforcement Learning.