Guandan AI (掼蛋AI)

V8 Model — Best Checkpoint

Metric                       Value
vs Champion (V7 Human BC)    58.5%
vs V1 (83% Rule)             72.0%
vs Rule Agent                95.0%
State Encoding               716-dim (714 + M-value + Power)
Architecture                 LSTM(2-layer) + ResNet MLP
Training Steps               3,810,000 / 5,000,000
Training Method              Deep Monte Carlo (DMC) with N-step TD

Architecture

  • Input 1: History tensor — LSTM branch
  • Input 2: State+Action — MLP branch
  • Output: Q-value
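
Because the network scores one state and one candidate action at a time, an agent built on it would typically evaluate every legal action and play the one with the highest Q-value. The sketch below illustrates that loop under stated assumptions: `q_fn` is a hypothetical wrapper around the model, and the action representation is illustrative rather than taken from this repository.

```python
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")


def select_action(
    q_fn: Callable[[object, object, A], float],
    history: object,
    state: object,
    legal_actions: Sequence[A],
) -> A:
    """Return the legal action with the highest Q-value.

    q_fn is a hypothetical wrapper around the model: it takes the history
    input, the state, and one candidate action, and returns a scalar score.
    """
    if not legal_actions:
        raise ValueError("legal_actions must not be empty")
    # Score every candidate once, then take the argmax.
    scores = [q_fn(history, state, action) for action in legal_actions]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return legal_actions[best_index]
```

With a real model, `q_fn` would batch the candidates into a single forward pass; the per-action loop here is kept only for readability.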

State Encoding (716 dims)

  • V5 base (714 dims): hand, played cards, history, wild cards, etc.
  • M-value (dim 714): minimum number of plays needed to empty the hand, divided by 14
  • Power (dim 715): count of control cards, divided by 8
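
The layout above can be sketched as a small helper that appends the two extra features to the 714-dim base vector. This is only an illustration: the function name is hypothetical, and how the minimum play count and the control-card count are actually computed, or whether the values are clipped, is not specified here.

```python
from typing import List, Sequence

BASE_DIMS = 714       # V5 base features
M_VALUE_SCALE = 14.0  # M-value is divided by 14
POWER_SCALE = 8.0     # control-card count is divided by 8


def extend_state(base: Sequence[float], min_plays: int, control_cards: int) -> List[float]:
    """Append the normalized M-value and Power features to a 714-dim base vector."""
    if len(base) != BASE_DIMS:
        raise ValueError(f"expected {BASE_DIMS} base features, got {len(base)}")
    state = list(base)
    state.append(min_plays / M_VALUE_SCALE)    # lands at index 714
    state.append(control_cards / POWER_SCALE)  # lands at index 715
    return state
```

For example, `extend_state([0.0] * 714, min_plays=7, control_cards=2)` returns a 716-element list whose last two entries are 0.5 and 0.25.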

Files

  • — V8 best (58.5% vs Champion)
  • — V7 Human BC (92.6% vs Rule, legacy)

Usage (ONNX Runtime)
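
A minimal sketch of loading a checkpoint with ONNX Runtime, assuming the `onnxruntime` package is installed. The exported input names and shapes are not documented here, so the helpers below query them from the session instead of hard-coding them; the function names are illustrative.

```python
from typing import Any, Dict, List


def load_session(model_path: str) -> Any:
    """Open an ONNX model on CPU. model_path is whichever file you downloaded."""
    import onnxruntime as ort  # imported lazily so the helpers below have no hard dependency

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def describe_io(session: Any) -> Dict[str, List[dict]]:
    """List the name, shape, and element type of every model input and output."""
    def as_dict(node: Any) -> dict:
        return {"name": node.name, "shape": list(node.shape), "type": node.type}

    return {
        "inputs": [as_dict(n) for n in session.get_inputs()],
        "outputs": [as_dict(n) for n in session.get_outputs()],
    }


def run_q(session: Any, feed: Dict[str, Any]) -> Any:
    """Run one forward pass; feed maps input names to NumPy arrays."""
    # Passing None as the first argument asks ONNX Runtime for all outputs.
    return session.run(None, feed)[0]
```

Calling `describe_io(load_session(path))` on a downloaded file shows which input takes the history tensor and which takes the state+action vector, so the `feed` dictionary for `run_q` can be built with the right names and shapes.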
