iluman/guandan-ai

Tags: Reinforcement Learning, deep-monte-carlo

Guandan AI (掼蛋AI)

V8 Model — Best Checkpoint

Metric	Value
Win rate vs Champion (V7 Human BC)	58.5%
Win rate vs V1 (83% Rule)	72.0%
Win rate vs Rule Agent	95.0%
State Encoding	716-dim (714 + M-value + Power)
Architecture	LSTM(2-layer) + ResNet MLP
Training Steps	3,810,000 / 5,000,000
Training Method	Deep Monte Carlo (DMC) with N-step TD
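
The table names N-step TD as part of the training method but does not say how the target is formed. As a rough illustration only, a generic n-step return with a bootstrapped Q-value can be sketched as below. The function name `n_step_target`, the `gamma` default, and the `terminal` flag are assumptions for this sketch, not details taken from this model's training code.

```python
def n_step_target(rewards, bootstrap_q, gamma=1.0, terminal=False):
    """Generic n-step TD target (illustrative, not this repo's code).

    rewards:     the n rewards observed after the current step, in order
    bootstrap_q: Q-value estimate at the state reached after n steps
    terminal:    True if the episode ended within these n steps,
                 in which case no bootstrap value is added
    """
    target = 0.0 if terminal else bootstrap_q
    # Fold from the last reward backwards:
    # r_0 + gamma * (r_1 + gamma * (... + gamma * bootstrap_q))
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```

With `terminal=True` and `gamma=1.0` this reduces to the plain Monte Carlo return (the sum of the remaining rewards), which is the target a pure DMC setup would use.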

Architecture

Input 1: History tensor — LSTM branch
Input 2: State+Action — MLP branch
Output: Q-value
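
The two-branch layout above can be sketched in PyTorch as follows. This is a minimal sketch, not the released network: the class names, the history feature size (`hist_dim`), the action encoding size (`action_dim`), the hidden widths, and the number of residual blocks are all assumptions chosen for illustration. Only the 2-layer LSTM, the residual MLP, the 716-dim state, and the scalar Q-value output come from this card.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two linear layers with a skip connection (assumed block design)."""

    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)

    def forward(self, x):
        return torch.relu(x + self.fc2(torch.relu(self.fc1(x))))


class QNetSketch(nn.Module):
    """History -> LSTM branch; state+action -> MLP branch; output: Q-value."""

    def __init__(self, hist_dim=54, state_dim=716, action_dim=54,
                 hidden=128, width=256, n_blocks=2):
        super().__init__()
        # 2-layer LSTM over the history tensor (sizes are hypothetical).
        self.lstm = nn.LSTM(hist_dim, hidden, num_layers=2, batch_first=True)
        # Project LSTM summary + state + action into the residual MLP.
        self.proj = nn.Linear(hidden + state_dim + action_dim, width)
        self.blocks = nn.Sequential(
            *[ResidualBlock(width) for _ in range(n_blocks)]
        )
        self.head = nn.Linear(width, 1)

    def forward(self, history, state_action):
        out, _ = self.lstm(history)        # (batch, time, hidden)
        summary = out[:, -1, :]            # last time step as history summary
        x = torch.cat([summary, state_action], dim=-1)
        x = torch.relu(self.proj(x))
        return self.head(self.blocks(x)).squeeze(-1)  # (batch,)
```

At decision time a network of this shape is typically evaluated once per legal action, with the same history and state repeated along the batch dimension, and the action with the highest Q-value is played.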

State Encoding (716 dims)

V5 base (714 dims): hand, played cards, history, wild cards, etc.
M-value (index 714): minimum number of plays needed to empty the hand, divided by 14
Power (index 715): number of control cards in hand, divided by 8
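
The layout above amounts to appending two normalized scalars to the 714-dim base vector. A minimal sketch, assuming the minimum play count and the control card count have already been computed elsewhere (the function name `extend_state` and its arguments are hypothetical):

```python
BASE_DIM = 714  # size of the V5 base encoding, per this card


def extend_state(base, min_plays, control_count):
    """Append M-value and Power to a 714-dim base vector, giving 716 dims.

    base:          sequence of 714 floats (V5 base encoding)
    min_plays:     minimum number of plays needed to empty the hand
    control_count: number of control cards in the hand
    """
    if len(base) != BASE_DIM:
        raise ValueError(f"expected {BASE_DIM} dims, got {len(base)}")
    m_value = min_plays / 14.0      # lands at index 714
    power = control_count / 8.0     # lands at index 715
    return list(base) + [m_value, power]
```

For example, a hand that needs 7 plays and holds 2 control cards gets 0.5 at index 714 and 0.25 at index 715.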

Files

— V8 best (58.5% vs Champion)
— V7 Human BC (92.6% vs Rule, legacy)

Usage (ONNX Runtime)
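
A minimal inference sketch, assuming the exported model takes two inputs in the order (history tensor, state+action) and returns one Q-value per row. The helper name `best_action_index`, the input order, and the tensor shapes are assumptions; check `session.get_inputs()` against the actual file before relying on them.

```python
import numpy as np


def best_action_index(session, history, state_actions):
    """Score every legal action and return the index of the best one.

    session:       an onnxruntime.InferenceSession for the model file
    history:       array of shape (num_actions, time, hist_dim)
    state_actions: array of shape (num_actions, state_dim + action_dim)
    """
    # Assumed input order: history first, state+action second.
    names = [inp.name for inp in session.get_inputs()]
    feed = {
        names[0]: np.asarray(history, dtype=np.float32),
        names[1]: np.asarray(state_actions, dtype=np.float32),
    }
    q_values = session.run(None, feed)[0]   # first (only) output
    return int(np.argmax(q_values.reshape(-1)))


# Usage (the path is a placeholder, not a file name from this repo):
#   import onnxruntime as ort
#   session = ort.InferenceSession("path/to/model.onnx")
#   idx = best_action_index(session, history, state_actions)
```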
