# NanoGo
Train and play a strong Go AI on a single GPU.
Reaches ~3900 Elo trained on a single RTX 4090. Using the raw policy alone, with no search, it beats KataGo b6c64 in 100% of games and plays even with KataGo b6c96 (Elo ~3956).
## Model Files
| File | Description | Size |
|---|---|---|
| `sl_final_ema.pt` | SL policy (EMA), 56.48% top-1 on KGS expert games | ~263 MB |
| `rl_iter9950_policy.pt` | RL policy (iter 9950), 100% vs KataGo b6c64, 46% vs b6c96-s18M | ~263 MB |
| `rl_iter9950_value.pt` | Value network (co-trained with RL policy) | ~261 MB |
## Architecture
- Backbone: 12-layer Transformer, dim=384, 21.6M params
- Policy head: Decoupled pass classifier, a 361-way softmax over board points plus a separate sigmoid pass logit (see the sketch after this list)
- Value head: Same backbone, scalar output in [-1, 1]
- Features: 48-plane AlphaGo-style board encoding (19x19)
- Training: SL on KGS expert games → REINFORCE self-play (10K iters, 3.5 days on RTX 4090)
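The decoupled pass head amounts to a per-point board logit plus an independent pass classifier. Below is a minimal illustrative sketch; the actual module structure, layer names, and pooling used in NanoGo are not shown in this README, so treat every name here as an assumption:

```python
import torch
import torch.nn as nn
from dataclasses import dataclass

@dataclass
class PolicyOutput:
    board_logits: torch.Tensor  # (B, 361) logits over board points
    pass_logit: torch.Tensor    # (B,) logit for the pass move

class DecoupledPolicyHead(nn.Module):
    """Illustrative decoupled pass head (names and pooling are assumptions)."""

    def __init__(self, dim: int = 384):
        super().__init__()
        self.board_proj = nn.Linear(dim, 1)  # per-point logit from each board token
        self.pass_proj = nn.Linear(dim, 1)   # pass logit from a pooled board summary

    def forward(self, tokens: torch.Tensor) -> PolicyOutput:
        # tokens: (B, 361, dim), one transformer token per board point
        board_logits = self.board_proj(tokens).squeeze(-1)  # (B, 361), softmaxed downstream
        pooled = tokens.mean(dim=1)                         # (B, dim) mean-pooled summary
        pass_logit = self.pass_proj(pooled).squeeze(-1)     # (B,), sigmoid downstream
        return PolicyOutput(board_logits=board_logits, pass_logit=pass_logit)
```

Keeping pass out of the board softmax means the 361-way distribution models only over-the-board moves, while passing gets its own independently calibrated probability.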
## Usage
```python
import torch

import config as cfg
from models.factory import build_policy_net
from training.utils import load_checkpoint

# Configure the 12-layer, dim-384 transformer backbone described above.
cfg.BACKBONE = "transformer"
cfg.BACKBONE_DIM = 384
cfg.TRANSFORMER_LAYERS = 12
cfg.NORM_TYPE = "rmsnorm"
cfg.FFN_ACTIVATION = "swiglu"
cfg.POS_EMBED = "learned_2d"

model = build_policy_net(cfg)
load_checkpoint("rl_iter9950_policy.pt", model)
model.eval()

# features: the 48-plane 19x19 board encoding, assumed shape (B, 48, 19, 19)
features = torch.zeros(1, 48, 19, 19)  # dummy input; encode a real position in practice

# Returns PolicyOutput(board_logits=(B, 361), pass_logit=(B,))
with torch.no_grad():
    output = model(features)
```
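The README does not say how `board_logits` and `pass_logit` are meant to be combined at play time; one natural convention (an assumption here, not NanoGo's documented behavior) is to give the pass sigmoid its own probability mass and scale the board softmax by the remainder:

```python
import torch

with torch.no_grad():
    out = model(features)
    p_pass = torch.sigmoid(out.pass_logit)                 # (B,) probability of passing
    board_probs = torch.softmax(out.board_logits, dim=-1)  # (B, 361) over board points
    # Assumed combination: pass keeps p_pass, board moves share the remaining mass.
    full = torch.cat(
        [(1.0 - p_pass).unsqueeze(-1) * board_probs, p_pass.unsqueeze(-1)],
        dim=-1,
    )                                                      # (B, 362); index 361 == pass
    move = full.argmax(dim=-1)                             # greedy move selection
```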
## Links
- Code: github.com/shaochuan/nanogo
- License: MIT