🎮 십이장기 (12-Janggi) Reinforcement Learning Models
Deep reinforcement learning agents trained to play 십이장기 (12-Janggi), a simplified Korean chess variant played on an 8×3 board, using self-play and the Maskable PPO algorithm.
📋 Model Overview
| Property | Value |
|---|---|
| Algorithm | Maskable PPO (Proximal Policy Optimization) |
| Framework | Stable-Baselines3-Contrib |
| Training Method | Self-play with adaptive timesteps |
| Board Dimensions | 8 rows × 3 columns |
| Total Actions | 576 (24×24 position combinations) |
| Game Type | 십이장기 (Korean Chess Variant) |
🎯 십이장기 Game Rules
Pieces (기물):
- 🤴 왕 (King): Moves one step in any direction (horizontal, vertical, diagonal)
- 💎 상 (Advisor/Elephant): Moves one step diagonally
- 🐘 장 (General): Moves one step horizontally or vertically
- 🚗 자 (Chariot/Soldier): Moves one step forward; promotes to 후 upon reaching the far row
- ⭐ 후 (Promoted Chariot): Gains enhanced movement after promotion
Victory Conditions:
- Capture opponent's King (왕)
- Move your King into the opponent's back row (row 5 for the UP player, row 2 for the DOWN player)
Game Features:
- Simplified Korean chess with only 12 pieces (6 per player)
- Fast-paced gameplay on compact 8×3 board
- Captured pieces can be placed back on the board
📦 Available Models
The repository provides separate models for the UP player (上家) and the DOWN player (下家), each in a Standard (256-neuron) and a Large (512-neuron, `_512` suffix) variant, e.g. `model_up.zip`.
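To check exactly which checkpoint files are available, you can list the repository contents with `huggingface_hub` (a minimal sketch, assuming the checkpoints are stored as `.zip` files at the repo root; the repo id is taken from the download example below):

```python
from huggingface_hub import list_repo_files

# List every file in the model repository and keep the model checkpoints
files = list_repo_files("SoonchunhyangUniversity/12-RN")
model_files = [f for f in files if f.endswith(".zip")]
print(model_files)  # e.g. ['model_up.zip', ...]
```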
🚀 Quick Start
Installation
```bash
pip install sb3-contrib gymnasium numpy
```
Load and Use Model
```python
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker
import gymnasium as gym

# Load the 십이장기 environment
# (requires the MiniChessEnv from the GitHub repository to be registered as "MiniChess-v0")
env = gym.make("MiniChess-v0")  # Environment name kept for compatibility
env = ActionMasker(env, lambda e: e.unwrapped.get_valid_actions())

# Load the trained model
model = MaskablePPO.load("model_up.zip")

# Play one game
obs, info = env.reset()
done = False
while not done:
    # Get the mask of valid actions for the current position
    action_masks = env.unwrapped.get_valid_actions()
    # Predict with action masking
    action, _states = model.predict(obs, action_masks=action_masks, deterministic=False)
    # Execute the action
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

print(f"Game ended with reward: {reward}")
```
Download from Hub
```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="SoonchunhyangUniversity/12-RN",
    filename="model_up.zip",
)
model = MaskablePPO.load(model_path)
```
🎓 Training Details
Hyperparameters
- Learning Rate: 5e-4 with adaptive adjustment
- Batch Size: 2048
- N-Steps: 2048
- Entropy Coefficient: 0.05 (exploration vs exploitation balance)
- Clip Range: 0.2
- N-Epochs: 10
- Network Architecture: [256, 256] or [512, 512] hidden units for both the policy and value networks (see the constructor sketch below)
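A minimal sketch of how these hyperparameters could be passed to the MaskablePPO constructor (the MultiInputPolicy choice and the 100k-timestep call are assumptions for illustration; the actual training loop lives in new_learn.py):

```python
from sb3_contrib import MaskablePPO

# env: the masked 십이장기 environment from the Quick Start (or a vectorized version of it)
model = MaskablePPO(
    "MultiInputPolicy",              # Dict observation space (board + turn)
    env,
    learning_rate=5e-4,
    n_steps=2048,
    batch_size=2048,
    n_epochs=10,
    ent_coef=0.05,
    clip_range=0.2,
    policy_kwargs=dict(net_arch=[256, 256]),  # or [512, 512] for the large models
    verbose=1,
)
model.learn(total_timesteps=100_000)  # timesteps are adjusted adaptively in the actual training
```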
Training Strategy
- Self-play: Models alternate training against each other
- Adaptive Training: Timesteps adjusted based on win rate performance
- Action Masking: Only legal moves are considered during training
- Parallel Environments: 48 simultaneous game instances for efficient data collection (see the vectorized-environment sketch below)
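A minimal sketch of how 48 action-masked environments might be vectorized with Stable-Baselines3 utilities (the environment id and mask function follow the Quick Start example; the exact setup is in the repository's training script):

```python
import gymnasium as gym
from sb3_contrib.common.wrappers import ActionMasker
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_masked_env():
    # Each worker builds its own masked 십이장기 environment
    env = gym.make("MiniChess-v0")
    return ActionMasker(env, lambda e: e.unwrapped.get_valid_actions())

# 48 game instances run in parallel subprocesses to collect rollouts
vec_env = make_vec_env(make_masked_env, n_envs=48, vec_env_cls=SubprocVecEnv)
```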
Observation Space
Multi-channel board representation (Dict):
- Board: a (10, 8, 3) tensor, with 10 channels encoding the different piece types and players
- Turn: scalar indicating the current player (0 or 1)
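As a rough illustration only (the exact dtypes, bounds, and key names are defined by `MiniChessEnv` and may differ), the Dict space could look like:

```python
import numpy as np
from gymnasium import spaces

# Hypothetical reconstruction of the Dict observation space described above
observation_space = spaces.Dict({
    "board": spaces.Box(low=0, high=1, shape=(10, 8, 3), dtype=np.float32),  # piece/player planes
    "turn": spaces.Discrete(2),  # 0 or 1, the player to move
})
```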
Action Space
- Type: Discrete(576)
- Encoding: `action = start_position * 24 + target_position`
- Masking: invalid moves are filtered out via action masks
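A small sketch of the encoding and its inverse (the 0–23 cell numbering over the 8×3 board is an assumption about the environment's convention):

```python
# Encode a move from board cell `start` to board cell `target` (cells numbered 0..23)
def encode_action(start: int, target: int) -> int:
    return start * 24 + target

# Decode an action index back into (start, target) cell indices
def decode_action(action: int) -> tuple[int, int]:
    return divmod(action, 24)

assert decode_action(encode_action(5, 17)) == (5, 17)
```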
📊 Performance
Models achieve strong performance through self-play:
- Average episode reward: roughly 200-400
- Episode length: 15-25 moves on average
- Win rate stabilizes around 45-55% when evenly matched
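A hedged sketch of how such numbers could be measured for a loaded model; the `evaluate` helper is illustrative, and it assumes the environment plays the opponent's replies internally, whereas the reported win rates come from pitting the UP and DOWN models against each other:

```python
import numpy as np

def evaluate(model, env, n_episodes=100):
    # Count wins and episode lengths over a fixed number of games
    wins, lengths = 0, []
    for _ in range(n_episodes):
        obs, info = env.reset()
        done, steps = False, 0
        while not done:
            masks = env.unwrapped.get_valid_actions()
            action, _ = model.predict(obs, action_masks=masks, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            steps += 1
        wins += reward > 0   # positive terminal reward treated as a win (assumption)
        lengths.append(steps)
    return wins / n_episodes, float(np.mean(lengths))

win_rate, avg_len = evaluate(model, env)
print(f"win rate: {win_rate:.2%}, average episode length: {avg_len:.1f}")
```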
🔗 Resources
- GitHub Repository: LikeLionSCH/12-RN
- Environment Code: see the repository for the `MiniChessEnv` implementation
- Training Script: `new_learn.py` in the repository
📄 Citation
If you use these models in your research, please cite:
```bibtex
@misc{12janggi_rl_2026,
  title={십이장기 (12-Janggi) Reinforcement Learning Models},
  author={Soonchunhyang University},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/SoonchunhyangUniversity/12-RN}
}
```
📜 License
This project is licensed under the MIT License - see the LICENSE file for details.
Developed by: Soonchunhyang University (순천향대학교)
Contact: For questions or collaborations, please open an issue on GitHub