Guandan AI (掼蛋AI)
V8 Model — Best Checkpoint
| Metric | Value |
|---|---|
| Win rate vs Champion (V7 Human BC) | 58.5% |
| Win rate vs V1 (83% Rule) | 72.0% |
| Win rate vs Rule Agent | 95.0% |
| State Encoding | 716-dim (714 + M-value + Power) |
| Architecture | 2-layer LSTM + ResNet MLP |
| Training Steps | 3,810,000 / 5,000,000 |
| Training Method | Deep Monte Carlo (DMC) with N-step TD |
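
The table names the training method but not its details. As a rough illustration only, the sketch below computes N-step TD targets for one episode and falls back to the plain Monte Carlo return once the window reaches the end of the game. The function name `n_step_targets`, the values of `n` and `gamma`, and the reward layout are all assumptions made for this example; they are not taken from the V8 training code.

```python
def n_step_targets(rewards, q_values, n=3, gamma=1.0):
    """Compute N-step TD targets for a single episode (illustrative sketch).

    rewards[t]  : reward received after step t
    q_values[t] : the model's Q estimate for the action taken at step t
    If the N-step window runs past the end of the episode, no bootstrap
    term is added, so the target equals the Monte Carlo return.
    """
    T = len(rewards)
    targets = []
    for t in range(T):
        end = min(t + n, T)
        # Discounted sum of the rewards inside the window [t, end).
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        # Bootstrap from the Q estimate only if the episode continues.
        if end < T:
            g += gamma ** (end - t) * q_values[end]
        targets.append(g)
    return targets


# Hypothetical episode: sparse reward of 1.0 at the final step.
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
q_values = [0.2, 0.3, 0.5, 0.7, 0.9]
print(n_step_targets(rewards, q_values, n=2))
```

With `n` at least as large as the episode length, every target reduces to the final outcome, which is the pure Monte Carlo case.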
Architecture
- Input 1: history tensor, processed by the LSTM branch
- Input 2: state + action vector, processed by the MLP branch
- Output: a single Q-value for the given state-action pair
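
The two-branch layout above could look roughly like the following PyTorch sketch. It is not the V8 implementation: the class names `ResBlock` and `QNet`, the hidden size, the number of residual blocks, the per-step history width `hist_dim`, the action encoding width `action_dim`, and the way the branches are concatenated are all assumptions chosen for illustration. Only the 2-layer LSTM, the residual MLP, the 716-dim state, and the scalar Q output come from this README.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Two linear layers with a skip connection (hypothetical block design)."""

    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(x + self.fc2(torch.relu(self.fc1(x))))


class QNet(nn.Module):
    """LSTM over the play history + residual MLP over state and action."""

    def __init__(self, hist_dim=64, state_dim=716, action_dim=54,
                 hidden=256, n_blocks=2):
        super().__init__()
        # hist_dim, action_dim, hidden and n_blocks are assumed values.
        self.lstm = nn.LSTM(hist_dim, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden + state_dim + action_dim, hidden)
        self.blocks = nn.Sequential(*[ResBlock(hidden) for _ in range(n_blocks)])
        self.head = nn.Linear(hidden, 1)

    def forward(self, history, state_action):
        # history: (batch, steps, hist_dim); keep the last step's output.
        out, _ = self.lstm(history)
        h = out[:, -1, :]
        # Join the history summary with the state+action vector.
        x = torch.relu(self.proj(torch.cat([h, state_action], dim=-1)))
        x = self.blocks(x)
        return self.head(x).squeeze(-1)  # (batch,) Q-values


net = QNet()
q = net(torch.zeros(4, 10, 64), torch.zeros(4, 716 + 54))
print(q.shape)
```

At play time, a model of this shape is typically evaluated once per legal action and the action with the highest Q-value is chosen.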
State Encoding (716 dims)
- V5 base (714 dims, indices 0-713): hand, played cards, history, wild cards, etc.
- M-value (index 714): minimum number of plays needed to empty the hand, divided by 14
- Power (index 715): count of control cards, divided by 8
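
As a small worked example of the layout above, the sketch below appends the two extra features to a 714-dim base vector. The helper name `extend_state` and its arguments are hypothetical; how the minimum play count and the control-card count are actually computed is not described here, so they are taken as plain inputs.

```python
def extend_state(base, min_plays, control_cards):
    """Append M-value and Power to a 714-dim V5 base vector (sketch).

    base          : sequence of 714 floats (the V5 encoding)
    min_plays     : minimum number of plays needed to empty the hand
    control_cards : count of control cards
    """
    if len(base) != 714:
        raise ValueError(f"expected 714 base dims, got {len(base)}")
    m_value = min_plays / 14    # stored at index 714
    power = control_cards / 8   # stored at index 715
    return list(base) + [m_value, power]


state = extend_state([0.0] * 714, min_plays=7, control_cards=2)
print(len(state), state[714], state[715])  # 716 0.5 0.25
```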
Files
- — V8 best (58.5% vs Champion)
- — V7 Human BC (92.6% vs Rule, legacy)