[2025-12-26 08:21:18,309] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:5436] Loading raw datasets...
[2025-12-26 08:21:21,409] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:5436] Loading dataset: darwinkernelpanic/luau_corpus_axolotl with base_type: completion and prompt_style: None
Tokenizing Prompts (num_proc=64): 0%| | 0/22633 [00:00<?, ? examples/s]
Dropping Long Sequences (>2048) (num_proc=64): 0%| | 0/22640 [00:00<?, ? examples/s]
Dropping Long Sequences (>2048) (num_proc=64): 2%|▉ | 354/22640 [00:01<01:14, 299.24 examples/s]
Dropping Long Sequences (>2048) (num_proc=64): 6%|███▌ | 1416/22640 [00:01<00:15, 1392.33 examples/s]
Dropping Long Sequences (>2048) (num_proc=64): 100%|███████████████████████████████████████████████████████| 22640/22640 [00:01<00:00, 14654.26 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=64): 0%| | 0/22640 [00:00<?, ? examples/s]
:204: RuntimeWarning: Mean of empty slice
[2025-12-26 08:26:22,565] [WARNING] [py.warnings._showwarnmsg:110] [PID:5436] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:678: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
[2025-12-26 08:26:40,186] [WARNING] [py.warnings._showwarnmsg:110] [PID:5436] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:860: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
  warnings.warn(
[2025-12-26 08:26:40,186] [WARNING] [py.warnings._showwarnmsg:110] [PID:5436] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:904: UserWarning: Multiple backends are registered with this ProcessGroup. We cannot determine which one is the default. Returning cpu. Please consider using other APIs.
  warnings.warn(