---
license: apache-2.0
tags:
- reinforcement-learning
- checkers
- alphazero
- pytorch
- deep-reinforcement-learning
frameworks:
- pytorch
---

# AlphaCheckers-Zero

> An implementation of the **AlphaZero** algorithm for Checkers (Brazilian/International rules), featuring a specialized **Battle Arena** against Large Language Models (LLMs) and classical Minimax algorithms.

## About the Project

This repository contains a Deep Reinforcement Learning engine built from scratch using **PyTorch** and **Monte Carlo Tree Search (MCTS)**. The agent learns the game of Checkers solely through self-play, without any prior human knowledge or heuristics.

The project goes beyond a standard implementation by including a unique **Arena Mode**, in which the AlphaZero agent is benchmarked against:

1. **Classical algorithms:** Minimax with alpha-beta pruning.
2. **Generative AI:** state-of-the-art LLMs (via the Groq API), testing tree-search logic against probabilistic text generation.

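The classical opponent relies on Minimax with alpha-beta pruning. As a rough sketch of that idea over a toy game tree (the node names, scores, and helper functions below are hypothetical illustrations, not the engine from `evalminimax.py`):

```python
# Minimal negamax with alpha-beta pruning over a toy game tree.
# Purely illustrative: this is not the project's actual checkers engine.

def alphabeta(node, depth, alpha, beta, evaluate, children):
    """Best achievable score for the side to move at `node`."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    best = float("-inf")
    for child in kids:
        # Negamax convention: negate, since the opponent moves next.
        score = -alphabeta(child, depth - 1, -beta, -alpha, evaluate, children)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cut-off: the opponent will avoid this branch anyway
    return best

# Two-ply toy tree; leaf scores are from the root player's perspective
# (the root player is also the side to move at the leaves).
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
scores = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}

best = alphabeta("root", 2, float("-inf"), float("inf"),
                 lambda n: scores[n], lambda n: tree.get(n, []))
print(best)  # -> 3 (max over the opponent's minimizing replies)
```

The pruning step is what lets the Depth-8 arena opponent search deeply: once a branch is provably worse than an already-found alternative, its remaining children are skipped.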
## Benchmarks & Performance

The trained model (`checkers_master_final.pth`) was subjected to rigorous testing.

| Opponent | Type | Result | Notes |
| :--- | :--- | :--- | :--- |
| **Human Player** | Biological | ✅ **Win** | Surpassed its creator. |
| **Llama 3.3 70b** | LLM (Groq) | ✅ **Win** | Exploited the LLM's lack of spatial board consistency. |
| **Llama-4-maverick-17b-128e** | LLM | ✅ **Win** | Consistent tactical superiority. |
| **Kimi k2** | LLM | ✅ **Win** | The LLM failed to maintain a long-term strategy. |
| **Minimax (Depth 8)** | Classical algorithm | 🤝 **Draw** | **Crucial result:** the neural network converged to a robust, defensive policy that matches a brute-force engine evaluating millions of moves. |
## Technical Architecture

* **Neural network:** a ResNet-like architecture with a dual head:
  * **Policy head:** outputs move probabilities ($p$).
  * **Value head:** estimates the win probability ($v$) of the current state.
* **Inference:** MCTS guided by the neural network to simulate future outcomes.
* **Training:** continuous self-play loops with a replay buffer and data augmentation.
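The dual-head design can be sketched in PyTorch roughly as follows. The channel counts, number of residual blocks, 4-plane 8×8 input encoding, and move-space size are illustrative assumptions, not the exact architecture from `AlphaCheckerTrainer.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Standard pre-activation-free residual block: conv-bn-relu-conv-bn + skip."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class DualHeadNet(nn.Module):
    """AlphaZero-style trunk with a policy head (move logits p)
    and a value head (scalar v in [-1, 1])."""
    def __init__(self, in_planes=4, channels=64, n_blocks=4, n_moves=512):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        self.policy = nn.Sequential(          # p: one logit per encoded move
            nn.Conv2d(channels, 2, 1), nn.Flatten(), nn.Linear(2 * 8 * 8, n_moves))
        self.value = nn.Sequential(           # v: tanh keeps the estimate in [-1, 1]
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(8 * 8, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.body(self.stem(x))
        return self.policy(h), self.value(h)
```

During MCTS, `p` biases which branches get expanded and `v` replaces random rollouts as the leaf evaluation.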
## Project Structure

Rename the source files to match the structure below for better organization:

* `AlphaCheckerTrainer.py`: main training loop, MCTS logic, and network architecture.
* `eval.py`: interface to play against the AI locally.
* `evalLLM.py`: script to battle LLMs via the Groq API.
* `evalminimax.py`: script to battle the Minimax algorithm.
* `checkers_master_final.pth`: the trained model weights.
## Testing

Try the model live on Hugging Face Spaces: https://huggingface.co/spaces/Madras1/AlphaCherckerZero
## Getting Started
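A plausible quickstart, assuming the scripts listed under Project Structure and a standard PyTorch environment (the dependency list below is an assumption; the exact requirements may differ):

```shell
# Assumed dependencies (not an official requirements list)
pip install torch numpy groq

# Train from scratch via self-play
python AlphaCheckerTrainer.py

# Play against the trained model locally
python eval.py

# Benchmark against an LLM (requires a GROQ_API_KEY) or against Minimax
python evalLLM.py
python evalminimax.py
```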
Developed by Gabriel Yogi.

This project is for research purposes in the field of Reinforcement Learning.