NanoGo

Train and play a strong Go AI on a single GPU.

Reaches ~3900 Elo trained on a single RTX 4090. Using the raw policy alone, with no search, it beats KataGo b6c64 100% of the time and plays even with KataGo b6c96 (~3956 Elo).

Model Files

| File | Description | Size |
| --- | --- | --- |
| sl_final_ema.pt | SL policy (EMA), 56.48% top-1 on KGS expert games | ~263 MB |
| rl_iter9950_policy.pt | RL policy (iter 9950), 100% vs KataGo b6c64, 46% vs b6c96-s18M | ~263 MB |
| rl_iter9950_value.pt | Value network (co-trained with RL policy) | ~261 MB |

Architecture

  • Backbone: 12-layer Transformer, dim=384, 21.6M params
  • Policy head: Decoupled pass classifier — 361-way board softmax + separate sigmoid pass
  • Value head: Same backbone, scalar output in [-1, 1]
  • Features: 48-plane AlphaGo-style board encoding (19x19)
  • Training: SL on KGS expert games → REINFORCE self-play (10K iters, 3.5 days on RTX 4090)
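The decoupled pass head above can be sketched as follows. This is an illustrative PyTorch snippet, not the repo's actual class (`DecoupledPolicyHead` and its layer names are assumptions): a 361-way linear projection for the board-point softmax, plus a separate linear projection producing the sigmoid pass logit.

```python
import torch
import torch.nn as nn

class DecoupledPolicyHead(nn.Module):
    """Sketch of a decoupled pass classifier: 361-way board logits
    plus a separate scalar pass logit (names are illustrative)."""

    def __init__(self, dim: int = 384, board_points: int = 19 * 19):
        super().__init__()
        self.board_proj = nn.Linear(dim, board_points)  # board softmax logits
        self.pass_proj = nn.Linear(dim, 1)              # sigmoid pass logit

    def forward(self, x: torch.Tensor):
        # x: (B, dim) pooled backbone features
        board_logits = self.board_proj(x)               # (B, 361)
        pass_logit = self.pass_proj(x).squeeze(-1)      # (B,)
        return board_logits, pass_logit

head = DecoupledPolicyHead()
feats = torch.randn(2, 384)
board_logits, pass_logit = head(feats)
```

Decoupling pass from the board softmax lets the network decide *whether* to pass independently of *where* it would otherwise play.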

Usage

import torch
from models.factory import build_policy_net
from training.utils import load_checkpoint
import config as cfg

cfg.BACKBONE = "transformer"
cfg.BACKBONE_DIM = 384
cfg.TRANSFORMER_LAYERS = 12
cfg.NORM_TYPE = "rmsnorm"
cfg.FFN_ACTIVATION = "swiglu"
cfg.POS_EMBED = "learned_2d"

model = build_policy_net(cfg)
load_checkpoint("rl_iter9950_policy.pt", model)
model.eval()

# features: (B, 48, 19, 19) board encoding (see Architecture)
# Returns PolicyOutput(board_logits=(B, 361), pass_logit=(B,))
output = model(features)
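To turn the `PolicyOutput` into a single move distribution, the board softmax and the sigmoid pass probability can be combined into one 362-way distribution (index 361 = pass). `move_distribution` is a hypothetical helper for illustration, not part of the repo:

```python
import torch
import torch.nn.functional as F

def move_distribution(board_logits: torch.Tensor, pass_logit: torch.Tensor) -> torch.Tensor:
    """Illustrative helper (assumption, not the repo's API): combine the
    361-way board softmax with the sigmoid pass head into a 362-way
    distribution where index 361 is pass."""
    p_pass = torch.sigmoid(pass_logit)                   # (B,)
    p_board = F.softmax(board_logits, dim=-1)            # (B, 361)
    p_board = p_board * (1.0 - p_pass).unsqueeze(-1)     # rescale board mass
    return torch.cat([p_board, p_pass.unsqueeze(-1)], dim=-1)  # (B, 362)

probs = move_distribution(torch.randn(1, 361), torch.randn(1))
best_move = probs.argmax(dim=-1)  # 0..360 = board point, 361 = pass
```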
