========================================================================
WORKER 0
========================================================================
========================================================================
EXP075 WORKER 0 on GPU 0 (NVIDIA A40)
========================================================================
Timestamp: 2026-03-28T15:19:06.860207+00:00
VRAM: 47.7 GB
Data split: files [0:818) = 818/3275 files, ~208M positions
/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:307: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Model: 204.0M params
Downloading best_model.pt from avewright/chess-transformer-200m-v2...
Loaded v2 weights OK (816 MB)
Downloading eval data: data/test-00000-of-00001.parquet...
Tokenized 488,309 eval candidates, using 5,000
Eval ready: 5000 positions
StreamingHFChessLoader: 818 parquet files (src), est ~208M positions, batch=256, device=cuda:0, rev=a9dfd59e
Training: ~208M positions, ~202,902 opt steps
Batch: 256 x accum=4 (eff=1024), lr=0.0001
Warmup: 2029 steps, sync every 500 steps
Initial evaluation...
Loaded model: acc=16.3% top3=41.8% sf_rank=66.6 val=78.5%
endgame: 18.1%
middlegame: 17.9%
opening: 15.9%
Saved baseline as best_model.pt (acc=16.3%)
------------------------------------------------------------------------
[W0] Training started (~208M positions, ~202,902 opt steps)
------------------------------------------------------------------------
========================================================================
WORKER 1
========================================================================
========================================================================
EXP075 WORKER 1 on GPU 1 (NVIDIA A40)
========================================================================
Timestamp: 2026-03-28T15:19:09.020936+00:00
VRAM: 47.7 GB
Data split: files [818:1636) = 818/3275 files, ~208M positions
/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:307: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Model: 204.0M params
Downloading best_model.pt from avewright/chess-transformer-200m-v2...
Loaded v2 weights OK (816 MB)
StreamingHFChessLoader: 818 parquet files (src), est ~208M positions, batch=256, device=cuda:1, rev=a9dfd59e
Training: ~208M positions, ~202,902 opt steps
Batch: 256 x accum=4 (eff=1024), lr=0.0001
Warmup: 2029 steps, sync every 500 steps
------------------------------------------------------------------------
[W1] Training started (~208M positions, ~202,902 opt steps)
------------------------------------------------------------------------
========================================================================
WORKER 2
========================================================================
========================================================================
EXP075 WORKER 2 on GPU 2 (NVIDIA A40)
========================================================================
Timestamp: 2026-03-28T15:19:11.098770+00:00
VRAM: 47.7 GB
Data split: files [1636:2454) = 818/3275 files, ~208M positions
/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:307: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Model: 204.0M params
Downloading best_model.pt from avewright/chess-transformer-200m-v2...
Loaded v2 weights OK (816 MB)
StreamingHFChessLoader: 818 parquet files (src), est ~208M positions, batch=256, device=cuda:2, rev=a9dfd59e
Training: ~208M positions, ~202,902 opt steps
Batch: 256 x accum=4 (eff=1024), lr=0.0001
Warmup: 2029 steps, sync every 500 steps
------------------------------------------------------------------------
[W2] Training started (~208M positions, ~202,902 opt steps)
------------------------------------------------------------------------
========================================================================
WORKER 3
========================================================================
========================================================================
EXP075 WORKER 3 on GPU 3 (NVIDIA A40)
========================================================================
Timestamp: 2026-03-28T15:19:13.165183+00:00
VRAM: 47.7 GB
Data split: files [2454:3275) = 821/3275 files, ~209M positions
/usr/local/lib/python3.11/dist-packages/torch/nn/modules/transformer.py:307: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Model: 204.0M params
Downloading best_model.pt from avewright/chess-transformer-200m-v2...
Loaded v2 weights OK (816 MB)
StreamingHFChessLoader: 821 parquet files (src), est ~209M positions, batch=256, device=cuda:3, rev=a9dfd59e
Training: ~209M positions, ~203,646 opt steps
Batch: 256 x accum=4 (eff=1024), lr=0.0001
Warmup: 2036 steps, sync every 500 steps
------------------------------------------------------------------------
[W3] Training started (~209M positions, ~203,646 opt steps)
------------------------------------------------------------------------
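Note on the figures above: the per-worker file ranges and warmup counts can be reproduced with simple integer arithmetic. The sketch below is an inference from the logged numbers, not the run's actual code. The function names are hypothetical, and the 1% warmup fraction is an assumption that happens to match 2029/202,902 and 2036/203,646.

```python
def split_files(num_files: int, num_workers: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) file ranges, one per worker.

    Assumption: each worker gets num_files // num_workers files and the
    last worker absorbs the remainder, as the logged ranges suggest.
    """
    per_worker = num_files // num_workers
    ranges = []
    for rank in range(num_workers):
        start = rank * per_worker
        end = num_files if rank == num_workers - 1 else start + per_worker
        ranges.append((start, end))
    return ranges


def effective_batch(batch: int, accum: int) -> int:
    """Samples per optimizer step under gradient accumulation."""
    return batch * accum


def warmup_steps(opt_steps: int) -> int:
    """Assumed rule: warmup is 1% of optimizer steps, rounded down."""
    return opt_steps // 100


if __name__ == "__main__":
    for rank, (start, end) in enumerate(split_files(3275, 4)):
        print(f"W{rank}: files [{start}:{end}) = {end - start} files")
    print("effective batch:", effective_batch(256, 4))
    print("warmup:", warmup_steps(202_902), warmup_steps(203_646))
```

Under these assumptions the output matches the logged splits (818, 818, 818, 821 files), the effective batch of 1024, and the warmup values of 2029 and 2036 steps.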