========================================================================
WORKER 0
========================================================================
========================================================================
EXP075 WORKER 0 on GPU 0 (NVIDIA A40)
========================================================================
Timestamp: 2026-03-28T15:19:06.860207+00:00
VRAM: 47.7 GB
Data split: files [0:818) = 818/3275 files, ~208M positions
/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:307: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Model: 204.0M params
Downloading best_model.pt from avewright/chess-transformer-200m-v2...
Loaded v2 weights OK (816 MB)
Downloading eval data: data/test-00000-of-00001.parquet...
Tokenized 488,309 eval candidates, using 5,000
Eval ready: 5000 positions
StreamingHFChessLoader: 818 parquet files (src), est ~208M positions, batch=256, device=cuda:0, rev=a9dfd59e
Training: ~208M positions, ~202,902 opt steps
Batch: 256 x accum=4 (eff=1024), lr=0.0001
Warmup: 2029 steps, sync every 500 steps
Initial evaluation...
Loaded model: acc=16.3% top3=41.8% sf_rank=66.6 val=78.5%
endgame: 18.1%
middlegame: 17.9%
opening: 15.9%
Saved baseline as best_model.pt (acc=16.3%)
------------------------------------------------------------------------
[W0] Training started (~208M positions, ~202,902 opt steps)
------------------------------------------------------------------------
========================================================================
WORKER 1
========================================================================
========================================================================
EXP075 WORKER 1 on GPU 1 (NVIDIA A40)
========================================================================
Timestamp: 2026-03-28T15:19:09.020936+00:00
VRAM: 47.7 GB
Data split: files [818:1636) = 818/3275 files, ~208M positions
/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:307: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Model: 204.0M params
Downloading best_model.pt from avewright/chess-transformer-200m-v2...
Loaded v2 weights OK (816 MB)
StreamingHFChessLoader: 818 parquet files (src), est ~208M positions, batch=256, device=cuda:1, rev=a9dfd59e
Training: ~208M positions, ~202,902 opt steps
Batch: 256 x accum=4 (eff=1024), lr=0.0001
Warmup: 2029 steps, sync every 500 steps
------------------------------------------------------------------------
[W1] Training started (~208M positions, ~202,902 opt steps)
------------------------------------------------------------------------
========================================================================
WORKER 2
========================================================================
========================================================================
EXP075 WORKER 2 on GPU 2 (NVIDIA A40)
========================================================================
Timestamp: 2026-03-28T15:19:11.098770+00:00
VRAM: 47.7 GB
Data split: files [1636:2454) = 818/3275 files, ~208M positions
/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:307: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Model: 204.0M params
Downloading best_model.pt from avewright/chess-transformer-200m-v2...
Loaded v2 weights OK (816 MB)
StreamingHFChessLoader: 818 parquet files (src), est ~208M positions, batch=256, device=cuda:2, rev=a9dfd59e
Training: ~208M positions, ~202,902 opt steps
Batch: 256 x accum=4 (eff=1024), lr=0.0001
Warmup: 2029 steps, sync every 500 steps
------------------------------------------------------------------------
[W2] Training started (~208M positions, ~202,902 opt steps)
------------------------------------------------------------------------
========================================================================
WORKER 3
========================================================================
========================================================================
EXP075 WORKER 3 on GPU 3 (NVIDIA A40)
========================================================================
Timestamp: 2026-03-28T15:19:13.165183+00:00
VRAM: 47.7 GB
Data split: files [2454:3275) = 821/3275 files, ~209M positions
/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:307: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Model: 204.0M params
Downloading best_model.pt from avewright/chess-transformer-200m-v2...
Loaded v2 weights OK (816 MB)
StreamingHFChessLoader: 821 parquet files (src), est ~209M positions, batch=256, device=cuda:3, rev=a9dfd59e
Training: ~209M positions, ~203,646 opt steps
Batch: 256 x accum=4 (eff=1024), lr=0.0001
Warmup: 2036 steps, sync every 500 steps
------------------------------------------------------------------------
[W3] Training started (~209M positions, ~203,646 opt steps)
------------------------------------------------------------------------
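
Note on the per-worker numbers: the log reports the file ranges, step counts, and warmup lengths but not the rules that produced them. The sketch below is one way to reproduce those figures, not the actual EXP075 code. It assumes contiguous file ranges with the last worker absorbing the remainder, an estimate of 254,000 positions per parquet file (inferred from the logged step counts, not stated anywhere in the log), and warmup equal to 1% of optimizer steps. The names `split_files`, `schedule`, and `POSITIONS_PER_FILE` are hypothetical.

```python
# Hypothetical reconstruction of the per-worker arithmetic seen in the log.
# None of these names come from the EXP075 code; they are illustrative only.

POSITIONS_PER_FILE = 254_000  # ASSUMPTION: inferred from the logged step counts


def split_files(n_files, n_workers):
    """Return contiguous half-open (start, end) ranges, one per worker.

    Assumes every worker gets n_files // n_workers files and the last
    worker also takes whatever remains.
    """
    base = n_files // n_workers
    bounds = []
    for w in range(n_workers):
        start = w * base
        end = n_files if w == n_workers - 1 else start + base
        bounds.append((start, end))
    return bounds


def schedule(n_files, batch=256, accum=4):
    """Return (estimated positions, optimizer steps, warmup steps).

    batch and accum match the logged "256 x accum=4 (eff=1024)".
    Warmup as 1% of optimizer steps is an ASSUMPTION that happens to
    match the logged 2029 and 2036.
    """
    effective_batch = batch * accum
    positions = n_files * POSITIONS_PER_FILE
    opt_steps = positions // effective_batch
    warmup_steps = opt_steps // 100
    return positions, opt_steps, warmup_steps


if __name__ == "__main__":
    for worker, (start, end) in enumerate(split_files(3275, 4)):
        positions, opt_steps, warmup = schedule(end - start)
        print(f"W{worker}: files [{start}:{end}) "
              f"positions={positions:,} opt_steps={opt_steps:,} warmup={warmup}")
```

Under these assumptions the ranges and step counts agree with the logged values for all four workers; a different per-file estimate or warmup rule could match the rounded figures equally well, so treat the constants as a consistency check rather than the run's actual configuration.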