2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_setup.py:_flush():80] Current SDK version is 0.23.0
2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_setup.py:_flush():80] Configure stats pid to 2321
2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_setup.py:_flush():80] Loading settings from /root/.config/wandb/settings
2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_setup.py:_flush():80] Loading settings from /notebooks/toy_models/model_training/pile_llama_grid_dataset_name_PL_SeqObserved_NonSeqObserved_L2/wandb/settings
2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_setup.py:_flush():80] Loading settings from environment variables
2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_init.py:setup_run_log_directory():713] Logging user logs to /notebooks/toy_models/model_training/pile_llama_grid_dataset_name_PL_SeqObserved_NonSeqObserved_L2/wandb/run-20251202_030845-3hxj1bdv/logs/debug.log
2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_init.py:setup_run_log_directory():714] Logging internal logs to /notebooks/toy_models/model_training/pile_llama_grid_dataset_name_PL_SeqObserved_NonSeqObserved_L2/wandb/run-20251202_030845-3hxj1bdv/logs/debug-internal.log
2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_init.py:init():840] calling init triggers
2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_init.py:init():845] wandb.init called with sweep_config: {} config: {'model_name': 'pile_llama_grid', 'n_layers': 2, 'd_model': 512, 'd_mlp': 2048, 'd_head': 64, 'n_heads': 8, 'attn_only': False, 'layer_norm_eps': 1e-05, 'init_range': 0.02, 'n_ctx': 1024, 'd_vocab': 48262, 'dataset_name': 'eoinf/PL_SeqObserved_NonSeqObserved_L2', 'tokenizer_name': '', 'seed': 10, 'device': 'cuda', 'use_bfloat16_matmul': False, 'batch_size_per_device': 32, 'n_devices': 1, 'batches_per_step': 1, 'max_tokens': 200000000, 'lr_hidden': 0.002, 'lr_vector': 0.001, 'lr_schedule': 'constant_with_warmup', 'warmup_tokens': 30000000, 'weight_decay': 0.05, 'grad_norm_clip': 1.0, 'train_loss_moving_average_beta': 0.99, 'log_interval': 25, 'save_checkpoints': True, 'checkpoint_interval': 500, 'checkpoint_interval_ratio': 1.1, 'save_log_checkpoints': True, 'use_wandb': True, 'batch_size': 32, 'tokens_per_step': 32768, 'warmup_steps': 915, 'max_steps': 6103, '_wandb': {}}
2025-12-02 03:08:45,421 INFO MainThread:2321 [wandb_init.py:init():888] starting backend
2025-12-02 03:08:46,046 INFO MainThread:2321 [wandb_init.py:init():891] sending inform_init request
2025-12-02 03:08:46,056 INFO MainThread:2321 [wandb_init.py:init():899] backend started and connected
2025-12-02 03:08:46,057 INFO MainThread:2321 [wandb_init.py:init():969] updated telemetry
2025-12-02 03:08:46,279 INFO MainThread:2321 [wandb_init.py:init():993] communicating run to backend with 90.0 second timeout
2025-12-02 03:08:46,672 INFO MainThread:2321 [wandb_init.py:init():1040] starting run threads in backend
2025-12-02 03:08:47,416 INFO MainThread:2321 [wandb_run.py:_console_start():2504] atexit reg
2025-12-02 03:08:47,416 INFO MainThread:2321 [wandb_run.py:_redirect():2352] redirect: wrap_raw
2025-12-02 03:08:47,416 INFO MainThread:2321 [wandb_run.py:_redirect():2421] Wrapping output streams.
2025-12-02 03:08:47,416 INFO MainThread:2321 [wandb_run.py:_redirect():2444] Redirects installed.
2025-12-02 03:08:47,430 INFO MainThread:2321 [wandb_init.py:init():1080] run started, returning control to user process
2025-12-02 04:13:14,346 INFO MainThread:2321 [wandb_run.py:_finish():2270] finishing run eoin/toy-transformer-replication/3hxj1bdv
2025-12-02 04:13:14,352 INFO MainThread:2321 [wandb_run.py:_atexit_cleanup():2469] got exitcode: 0
2025-12-02 04:13:14,352 INFO MainThread:2321 [wandb_run.py:_restore():2451] restore
2025-12-02 04:13:14,352 INFO MainThread:2321 [wandb_run.py:_restore():2457] restore done
2025-12-02 04:13:14,899 INFO MainThread:2321 [wandb_run.py:_footer_sync_info():3853] logging synced files