# Training Guide

## Prerequisites

- **Rust** (stable) -- required to build the chess engine native extension
- **uv** -- Python package manager ([install](https://docs.astral.sh/uv/getting-started/installation/))
- **GPU** with ROCm (AMD) or CUDA (NVIDIA). CPU works only for `--variant toy`

## Installation

```bash
# Build the chess engine (one-time, or after engine/ changes)
cd engine && uv run --with maturin maturin develop --release && cd ..

# Install Python dependencies
uv sync --extra rocm   # AMD GPUs (ROCm)
uv sync --extra cu128  # NVIDIA GPUs (CUDA 12.8)
```

Verify the install:

```bash
uv run python -c "import chess_engine; print('engine OK')"
uv run python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"
```

## Pretraining from Scratch

PAWN pretrains on random chess games generated on-the-fly by the Rust engine. No external datasets are needed.

```bash
uv run python scripts/train.py --variant base
```

### Model variants

| Variant | Params | d_model | Layers | Heads | d_ff |
|---------|--------|---------|--------|-------|------|
| `small` | ~9.5M  | 256     | 8      | 4     | 1024 |
| `base`  | ~36M   | 512     | 8      | 8     | 2048 |
| `large` | ~68M   | 640     | 10     | 8     | 2560 |
| `toy`   | tiny   | 64      | 2      | 4     | 256  |

### Default training configuration

- **Total steps**: 100,000
- **Batch size**: 256
- **Optimizer**: [AdamW](https://arxiv.org/abs/1711.05101) (Loshchilov & Hutter, 2017) with lr=3e-4, weight_decay=0.01
- **LR schedule**: [cosine decay](https://arxiv.org/abs/1608.03983) (Loshchilov & Hutter, 2016) with 1,000-step warmup
- **Mixed precision**: fp16 [AMP](https://arxiv.org/abs/1710.03740) (Micikevicius et al., 2017), auto-detected
- **Checkpoints**: saved every 5,000 steps to `checkpoints/`
- **Eval**: every 500 steps on 512 held-out random games

### Common overrides

```bash
# Resume from a checkpoint
uv run python scripts/train.py --variant base --resume checkpoints/step_00050000

# Custom batch size and step count
uv run python scripts/train.py --variant base --batch-size \
    128 --total-steps 200000

# Gradient accumulation (effective batch = batch_size * accumulation_steps)
uv run python scripts/train.py --variant base --accumulation-steps 4

# Enable W&B logging
uv run python scripts/train.py --variant base --wandb
```

## Adapter Training (Behavioral Cloning)

Adapter training freezes the pretrained PAWN backbone and trains lightweight adapter modules on Lichess games to predict human moves.

### Requirements

1. A pretrained PAWN checkpoint (from pretraining above)
2. A Lichess PGN file filtered to an Elo band

Download standard rated game archives from the [Lichess open database](https://database.lichess.org/), filtered to your target Elo band. The scripts expect a single `.pgn` file.

### Available adapters

| Adapter    | Script                          | Key flag             |
|------------|---------------------------------|----------------------|
| Bottleneck | `scripts/train_bottleneck.py`   | `--bottleneck-dim 8` |
| FiLM       | `scripts/train_film.py`         |                      |
| LoRA       | `scripts/train_lora.py`         |                      |
| Sparse     | `scripts/train_sparse.py`       |                      |
| Hybrid     | `scripts/train_hybrid.py`       |                      |

There is also `scripts/train_tiny.py` for a standalone small transformer baseline (no frozen backbone).
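The bottleneck adapter idea is simple enough to sketch in a few lines: a small down-projection, a nonlinearity, and an up-projection wrapped around the frozen backbone's hidden states, added residually. The NumPy sketch below is illustrative only (the real implementation lives in the training scripts); the zero-initialized up-projection is an assumption, chosen so the adapter starts out as an identity function and training begins from the frozen backbone's behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, bottleneck_dim = 512, 32  # dims match the `base` variant + --bottleneck-dim 32

# Only these two small matrices are trainable; the backbone stays frozen.
W_down = rng.normal(0, 0.02, (d_model, bottleneck_dim))
W_up = np.zeros((bottleneck_dim, d_model))  # zero-init (assumed): adapter starts as identity

def adapter(h):
    """Residual bottleneck: h + up(relu(down(h)))."""
    z = np.maximum(h @ W_down, 0.0)  # ReLU in the bottleneck (assumed)
    return h + z @ W_up              # residual connection around the adapter

h = rng.normal(size=(4, d_model))    # a batch of backbone hidden states
out = adapter(h)
print(np.allclose(out, h))  # True: zero-init up-projection leaves h unchanged
```

With `d_model=512` and `bottleneck_dim=32`, each adapter adds only `2 * 512 * 32 ≈ 33K` parameters per insertion point, which is why the whole backbone can stay frozen.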
### Example: bottleneck adapter

```bash
uv run python scripts/train_bottleneck.py \
    --checkpoint checkpoints/pawn-base.pt \
    --pgn data/lichess_1800_1900.pgn \
    --bottleneck-dim 32 \
    --lr 1e-4
```

### Adapter training defaults

- **Epochs**: 50 (with early stopping, patience=10)
- **Batch size**: 64
- **Optimizer**: AdamW (lr=3e-4)
- **LR schedule**: cosine with 5% warmup
- **Min ply**: 10 (games shorter than 10 plies are skipped)
- **Max games**: 12,000 train + 2,000 validation
- **Legal masking**: move legality enforced via the Rust engine at every position

### Resuming adapter training

```bash
uv run python scripts/train_bottleneck.py \
    --checkpoint checkpoints/pawn-base.pt \
    --pgn data/lichess_1800_1900.pgn \
    --resume logs/bottleneck_20260315_120000/checkpoints/best.pt
```

### Selective layer placement

Adapters can target specific layers or sublayer positions:

```bash
# Only FFN adapters on layers 4-7
uv run python scripts/train_bottleneck.py \
    --checkpoint checkpoints/pawn-base.pt \
    --pgn data/lichess_1800_1900.pgn \
    --no-adapt-attn --adapter-layers 4,5,6,7
```

Use `--attn-layers` / `--ffn-layers` for independent control of which layers get attention vs FFN adapters.

## Cloud Deployment (Runpod)

The `deploy/` directory provides scripts for managing GPU pods.

### Pod lifecycle with `pod.sh`

```bash
bash deploy/pod.sh create myexp --gpu a5000                      # Create a pod
bash deploy/pod.sh deploy myexp                                  # Build + transfer + setup
bash deploy/pod.sh launch myexp scripts/train.py --variant base  # Run training
bash deploy/pod.sh ssh myexp                                     # SSH in
bash deploy/pod.sh stop myexp                                    # Stop (preserves volume)
```

GPU shortcuts: `a5000`, `a40`, `a6000`, `4090`, `5090`, `l40s`, `h100`.

### Manual deployment

If you prefer to deploy manually:

```bash
# 1. Build deploy package locally
bash deploy/build.sh --checkpoint checkpoints/pawn-base.pt --data-dir data/

# 2. Transfer to pod
rsync -avz --progress deploy/pawn-deploy/ root@<pod-ip>:/workspace/pawn/

# 3.
# Run setup on the pod (installs Rust, uv, builds engine, syncs deps)
ssh root@<pod-ip> 'cd /workspace/pawn && bash deploy/setup.sh'
```

`setup.sh` handles Rust installation, uv installation, building the chess engine, `uv sync --extra cu128`, and decompressing any zstd-compressed PGN data.

## GPU Auto-Detection

The `pawn.gpu` module auto-detects your GPU and configures:

- **torch.compile**: enabled on CUDA, using the inductor backend
- **AMP**: fp16 automatic mixed precision on CUDA
- **SDPA backend**: flash attention on NVIDIA; MATH backend on AMD (ROCm's flash-attention backward has stride mismatches with torch.compile)

No manual flags are needed in most cases. Override with `--no-compile`, `--no-amp`, or `--sdpa-math` if needed.

## Monitoring

All training scripts log metrics to JSONL files in `logs/`. Each run creates a timestamped directory (e.g., `logs/bottleneck_20260315_120000/metrics.jsonl`).

Every log record includes:

- Training metrics (loss, accuracy, learning rate)
- System resource stats (RAM, GPU VRAM peak/current)
- Timestamps and elapsed time

The JSONL format is one JSON object per line, readable with standard tools:

```bash
# Watch live training progress (--json-lines makes json.tool accept one object per line)
tail -f logs/*/metrics.jsonl | python -m json.tool --json-lines
```
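The logs are just as easy to consume programmatically, e.g. to pull the most recent loss out of a run. A minimal sketch, assuming record fields named `step` and `loss` (check your own `metrics.jsonl` for the actual schema), with an in-memory stand-in for the log file:

```python
import io
import json

# Stand-in for open("logs/.../metrics.jsonl"); the field names are assumptions.
sample = io.StringIO(
    '{"step": 100, "loss": 2.31}\n'
    '{"step": 200, "loss": 1.87}\n'
)

# JSONL: parse each non-empty line as its own JSON object.
records = [json.loads(line) for line in sample if line.strip()]
latest = max(records, key=lambda r: r["step"])
print(latest["loss"])  # 1.87
```

The same pattern works for plotting loss curves or comparing adapter runs side by side, since every run directory carries the identical one-object-per-line format.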