[2025-06-30 13:51:22,135] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Warning: Permanently added '[10.82.142.18]:22988' (ECDSA) to the list of known hosts.
[2025-06-30 13:51:24,592] [INFO] [runner.py:463:main] Using IP address of 10.82.142.18 for node 10.82.142.18
[2025-06-30 13:51:24,595] [INFO] [multinode_runner.py:80:get_cmd] Running on the following workers: 10.82.142.18,10.82.142.23,10.82.142.17
[2025-06-30 13:51:24,596] [INFO] [runner.py:568:main] cmd = pdsh -S -f 1024 -w 10.82.142.18,10.82.142.23,10.82.142.17 export PYTHONUNBUFFERED=1; export NCCL_VERSION=2.18.3-1; export NCCL_IB_DISABLE=1; export PYTHONPATH=/share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory; cd /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory; /root/miniconda3/envs/llama_factory/bin/python -u -m deepspeed.launcher.launch --world_info=eyIxMC44Mi4xNDIuMTgiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN10sICIxMC44Mi4xNDIuMjMiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN10sICIxMC44Mi4xNDIuMTciOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --node_rank=%n --master_addr=10.82.142.18 --master_port=11000 /share/liangqingyuan/LLaMA-Factory/src/train.py --deepspeed /share/liangqingyuan/LLaMA-Factory/examples/deepspeed/ds_z2_config.json --stage pt --do_train --max_samples 20000000 --model_name_or_path /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500 --dataset cpt_sft_v5_i1v3 --dataset_dir /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/data --template default --finetuning_type full --output_dir /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3 --overwrite_cache --overwrite_output_dir --cutoff_len 2048 --packing False --preprocessing_num_workers 192 --per_device_train_batch_size 1 --gradient_accumulation_steps 10 --lr_scheduler_type cosine --logging_steps 1 --warmup_steps 20 --report_to wandb --run_name CPT14b_v5_I1v3 --save_steps 100 --save_total_limit 1000 --flash_attn fa2 --learning_rate 1e-4 --num_train_epochs 2 --plot_loss --fp16 --eval_on_start False
10.82.142.17: Warning: Permanently added '[10.82.142.17]:22988' (ECDSA) to the list of known hosts.
10.82.142.18: Warning: Permanently added '[10.82.142.18]:22988' (ECDSA) to the list of known hosts.
10.82.142.23: Warning: Permanently added '[10.82.142.23]:22988' (ECDSA) to the list of known hosts.
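
Annotation: the --world_info payload in the cmd above is base64-encoded JSON mapping each node IP to its local GPU slots; a minimal decode sketch (standard library only; the output matches the WORLD INFO DICT that launch.py logs below):

import base64, json

world_info_b64 = "eyIxMC44Mi4xNDIuMTgiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN10sICIxMC44Mi4xNDIuMjMiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN10sICIxMC44Mi4xNDIuMTciOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119"
world_info = json.loads(base64.b64decode(world_info_b64))
print(world_info)
# {'10.82.142.18': [0, 1, 2, 3, 4, 5, 6, 7],
#  '10.82.142.23': [0, 1, 2, 3, 4, 5, 6, 7],
#  '10.82.142.17': [0, 1, 2, 3, 4, 5, 6, 7]}
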
10.82.142.17: [2025-06-30 13:51:26,616] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
10.82.142.18: [2025-06-30 13:51:26,643] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
10.82.142.23: [2025-06-30 13:51:26,701] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
10.82.142.18: [2025-06-30 13:51:28,158] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.18.3-1
10.82.142.18: [2025-06-30 13:51:28,159] [INFO] [launch.py:138:main] 0 NCCL_IB_DISABLE=1
10.82.142.18: [2025-06-30 13:51:28,159] [INFO] [launch.py:145:main] WORLD INFO DICT: {'10.82.142.18': [0, 1, 2, 3, 4, 5, 6, 7], '10.82.142.23': [0, 1, 2, 3, 4, 5, 6, 7], '10.82.142.17': [0, 1, 2, 3, 4, 5, 6, 7]}
10.82.142.18: [2025-06-30 13:51:28,159] [INFO] [launch.py:151:main] nnodes=3, num_local_procs=8, node_rank=0
10.82.142.18: [2025-06-30 13:51:28,159] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'10.82.142.18': [0, 1, 2, 3, 4, 5, 6, 7], '10.82.142.23': [8, 9, 10, 11, 12, 13, 14, 15], '10.82.142.17': [16, 17, 18, 19, 20, 21, 22, 23]})
10.82.142.18: [2025-06-30 13:51:28,159] [INFO] [launch.py:163:main] dist_world_size=24
10.82.142.18: [2025-06-30 13:51:28,159] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
10.82.142.18: [2025-06-30 13:51:28,159] [INFO] [launch.py:253:main] process 510 spawned with command: ['/root/miniconda3/envs/llama_factory/bin/python', '-u', '/share/liangqingyuan/LLaMA-Factory/src/train.py', '--local_rank=0', '--deepspeed', '/share/liangqingyuan/LLaMA-Factory/examples/deepspeed/ds_z2_config.json', '--stage', 'pt', '--do_train', '--max_samples', '20000000', '--model_name_or_path', '/share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500', '--dataset', 'cpt_sft_v5_i1v3', '--dataset_dir', '/share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/data', '--template', 'default', '--finetuning_type', 'full', '--output_dir', '/share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3', '--overwrite_cache', '--overwrite_output_dir', '--cutoff_len', '2048', '--packing', 'False', '--preprocessing_num_workers', '192', '--per_device_train_batch_size', '1', '--gradient_accumulation_steps', '10', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--warmup_steps', '20', '--report_to', 'wandb', '--run_name', 'CPT14b_v5_I1v3', '--save_steps', '100', '--save_total_limit', '1000', '--flash_attn', 'fa2', '--learning_rate', '1e-4', '--num_train_epochs', '2', '--plot_loss', '--fp16', '--eval_on_start', 'False']
10.82.142.18: [2025-06-30 13:51:28,160–162] [INFO] [launch.py:253:main] processes 511–517 spawned with the same command for '--local_rank=1' through '--local_rank=7' [7 near-identical lines collapsed]
10.82.142.17: [2025-06-30 13:51:28,824] [INFO] [launch.py:138:main] 2 NCCL_VERSION=2.18.3-1
10.82.142.17: [2025-06-30 13:51:28,824] [INFO] [launch.py:138:main] 2 NCCL_IB_DISABLE=1
10.82.142.17: [2025-06-30 13:51:28,824] [INFO] [launch.py:145–165:main] nnodes=3, num_local_procs=8, node_rank=2; WORLD INFO DICT, global_rank_mapping, dist_world_size=24, and CUDA_VISIBLE_DEVICES identical to node_rank 0 above [5 lines collapsed]
10.82.142.17: [2025-06-30 13:51:28,825–827] [INFO] [launch.py:253:main] processes 324–331 spawned with the same command for '--local_rank=0' through '--local_rank=7' [8 near-identical lines collapsed]
10.82.142.23: [2025-06-30 13:51:28,881] [INFO] [launch.py:138:main] 1 NCCL_VERSION=2.18.3-1
10.82.142.23: [2025-06-30 13:51:28,881] [INFO] [launch.py:138:main] 1 NCCL_IB_DISABLE=1
10.82.142.23: [2025-06-30 13:51:28,881] [INFO] [launch.py:145–165:main] nnodes=3, num_local_procs=8, node_rank=1; WORLD INFO DICT, global_rank_mapping, dist_world_size=24, and CUDA_VISIBLE_DEVICES identical to node_rank 0 above [5 lines collapsed]
10.82.142.23: [2025-06-30 13:51:28,882–884] [INFO] [launch.py:253:main] processes 324–331 spawned with the same command for '--local_rank=0' through '--local_rank=7' [8 near-identical lines collapsed]
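
Annotation: a quick sanity check of the effective batch implied by the spawn arguments above; per_device_train_batch_size, gradient_accumulation_steps, and cutoff_len come from the command line, dist_world_size from launch.py. Plain Python, standard library only:

per_device_bs = 1      # --per_device_train_batch_size
grad_accum = 10        # --gradient_accumulation_steps
world_size = 24        # dist_world_size (3 nodes x 8 GPUs)
cutoff_len = 2048      # --cutoff_len; with --packing False, samples may be shorter

global_batch = per_device_bs * grad_accum * world_size
print(global_batch)               # 240 sequences per optimizer step
print(global_batch * cutoff_len)  # at most 491,520 tokens per step
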
10.82.142.18: [2025-06-30 13:51:32,500–817] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [8 lines, one per local rank, collapsed]
10.82.142.17: [2025-06-30 13:51:33,286–608] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [8 lines collapsed]
10.82.142.23: [2025-06-30 13:51:33,480–631] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [8 lines collapsed]
10.82.142.18: [2025-06-30 13:51:33,716–957] [INFO] [comm.py:637:init_distributed] cdb=None [8 lines collapsed]
10.82.142.18: [2025-06-30 13:51:33,883] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
10.82.142.17: [2025-06-30 13:51:34,396–745] [INFO] [comm.py:637:init_distributed] cdb=None [8 lines collapsed]
10.82.142.23: [2025-06-30 13:51:34,603–844] [INFO] [comm.py:637:init_distributed] cdb=None [8 lines collapsed]
10.82.142.18/.17/.23: [INFO|2025-06-30 13:51:34–35] llamafactory.hparams.parser:384 >> Process rank: 0–7, device: cuda:0–cuda:7, n_gpu: 1, distributed training: True, compute dtype: torch.float16 [24 near-identical lines, one per worker across the three nodes, collapsed]
10.82.142.18: [INFO|configuration_utils.py:697] 2025-06-30 13:51:35,309 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/config.json
10.82.142.18: [INFO|configuration_utils.py:771] 2025-06-30 13:51:35,310 >> Model config Qwen2Config {
10.82.142.18:   "_name_or_path": "/share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500",
10.82.142.18:   "architectures": [
10.82.142.18:     "Qwen2ForCausalLM"
10.82.142.18:   ],
10.82.142.18:   "attention_dropout": 0.0,
10.82.142.18:   "bos_token_id": 151643,
10.82.142.18:   "eos_token_id": 151643,
10.82.142.18:   "hidden_act": "silu",
10.82.142.18:   "hidden_size": 5120,
10.82.142.18:   "initializer_range": 0.02,
10.82.142.18:   "intermediate_size": 13824,
10.82.142.18:   "max_position_embeddings": 32768,
10.82.142.18:   "max_window_layers": 48,
10.82.142.18:   "model_type": "qwen2",
10.82.142.18:   "num_attention_heads": 40,
10.82.142.18:   "num_hidden_layers": 48,
10.82.142.18:   "num_key_value_heads": 8,
10.82.142.18:   "rms_norm_eps": 1e-06,
10.82.142.18:   "rope_scaling": null,
10.82.142.18:   "rope_theta": 1000000.0,
10.82.142.18:   "sliding_window": 131072,
10.82.142.18:   "tie_word_embeddings": false,
10.82.142.18:   "torch_dtype": "float16",
10.82.142.18:   "transformers_version": "4.49.0",
10.82.142.18:   "use_cache": true,
10.82.142.18:   "use_sliding_window": false,
10.82.142.18:   "vocab_size": 153078
10.82.142.18: }
10.82.142.18: [INFO|tokenization_utils_base.py:2048] 2025-06-30 13:51:35,314 >> loading file vocab.json, merges.txt, tokenizer.json, added_tokens.json, special_tokens_map.json, tokenizer_config.json, chat_template.jinja [7 lines collapsed]
10.82.142.17: [INFO|configuration_utils.py:697–771] 2025-06-30 13:51:35,371–381 >> same configuration-file load, identical Qwen2Config dump, and same seven tokenizer-file lines [duplicate block collapsed]
10.82.142.23: [INFO|configuration_utils.py:697–771] 2025-06-30 13:51:35,462–471 >> same configuration-file load, identical Qwen2Config dump, and same seven tokenizer-file lines [duplicate block collapsed]
10.82.142.18: [INFO|tokenization_utils_base.py:2313] 2025-06-30 13:51:35,651 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10.82.142.18: [INFO|configuration_utils.py:697] 2025-06-30 13:51:35,654 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/config.json [repeated load; identical Qwen2Config dump and seven tokenizer-file lines collapsed]
10.82.142.17: [INFO|tokenization_utils_base.py:2313] 2025-06-30 13:51:35,784 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10.82.142.17: [INFO|configuration_utils.py:697] 2025-06-30 13:51:35,787 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/config.json [repeated load; identical Qwen2Config dump and seven tokenizer-file lines collapsed]
10.82.142.23: [INFO|tokenization_utils_base.py:2313] 2025-06-30 13:51:35,812 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
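
Annotation: the "Special tokens have been added ..." notices mean the tokenizer carries added tokens on top of the base vocabulary; a minimal, hypothetical sanity check (standard transformers API; the assert and variable names are illustrative) that the checkpoint's embedding table already covers them, which matters here since --finetuning_type full trains the embeddings:

from transformers import AutoConfig, AutoTokenizer

ckpt = "/share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500"
tok = AutoTokenizer.from_pretrained(ckpt)
cfg = AutoConfig.from_pretrained(ckpt)

# If len(tok) exceeded cfg.vocab_size, the model would need
# model.resize_token_embeddings(len(tok)) before training; the config dump
# above shows vocab_size=153078, which should already cover the added tokens.
assert len(tok) <= cfg.vocab_size, (len(tok), cfg.vocab_size)
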
10.82.142.23: [INFO|configuration_utils.py:697] 2025-06-30 13:51:35,816 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/config.json
10.82.142.23: [INFO|configuration_utils.py:771] 2025-06-30 13:51:35,817 >> Model config Qwen2Config (identical to the dump from 10.82.142.18 above)
10.82.142.23: [INFO|tokenization_utils_base.py:2048] 2025-06-30 13:51:35,820 >> loading file vocab.json
10.82.142.23: [INFO|tokenization_utils_base.py:2048] 2025-06-30 13:51:35,820 >> loading file merges.txt
10.82.142.23: [INFO|tokenization_utils_base.py:2048] 2025-06-30 13:51:35,820 >> loading file tokenizer.json
10.82.142.23: [INFO|tokenization_utils_base.py:2048] 2025-06-30 13:51:35,820 >> loading file added_tokens.json
10.82.142.23: [INFO|tokenization_utils_base.py:2048] 2025-06-30 13:51:35,820 >> loading file special_tokens_map.json
10.82.142.23: [INFO|tokenization_utils_base.py:2048] 2025-06-30 13:51:35,820 >> loading file tokenizer_config.json
10.82.142.23: [INFO|tokenization_utils_base.py:2048] 2025-06-30 13:51:35,820 >> loading file chat_template.jinja
10.82.142.18: [INFO|tokenization_utils_base.py:2313] 2025-06-30 13:51:36,003 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10.82.142.18: [INFO|2025-06-30 13:51:36] llamafactory.data.loader:157 >> Loading dataset /share/liangqingyuan/GrammarCoder14B/data/processed/DataFilter/data_process/sft_data_i1_ResV3_05_V2.json...
10.82.142.17: [INFO|tokenization_utils_base.py:2313] 2025-06-30 13:51:36,137 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10.82.142.23: [INFO|tokenization_utils_base.py:2313] 2025-06-30 13:51:36,157 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10.82.142.17: [INFO|2025-06-30 13:51:36] llamafactory.data.loader:157 >> Loading dataset /share/liangqingyuan/GrammarCoder14B/data/processed/DataFilter/data_process/sft_data_i1_ResV3_05_V2.json...
10.82.142.23: [INFO|2025-06-30 13:51:36] llamafactory.data.loader:157 >> Loading dataset /share/liangqingyuan/GrammarCoder14B/data/processed/DataFilter/data_process/sft_data_i1_ResV3_05_V2.json...
10.82.142.18: Setting num_proc from 192 back to 1 for the train split to disable multiprocessing as it only contains one shard.
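
Note: the "Setting num_proc from 192 back to 1" messages here are expected behavior, not an error: the dataset is a single JSON file, which datasets treats as one shard, so split generation cannot be parallelized regardless of the 192 configured workers. A minimal reproduction sketch, assuming only the datasets library and the path from the log:

    from datasets import load_dataset

    data_file = ("/share/liangqingyuan/GrammarCoder14B/data/processed/"
                 "DataFilter/data_process/sft_data_i1_ResV3_05_V2.json")
    # num_proc is silently reduced to 1 because one file means one shard
    ds = load_dataset("json", data_files=data_file, split="train", num_proc=192)
    print(ds.num_rows)  # 4499718, matching the counts reported below
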
10.82.142.23: Setting num_proc from 192 back to 1 for the train split to disable multiprocessing as it only contains one shard.
10.82.142.17: Setting num_proc from 192 back to 1 for the train split to disable multiprocessing as it only contains one shard.
10.82.142.17: Generating train split: 4499718 examples [00:47, 95355.23 examples/s]
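
Note: if the roughly 47-second generation step above mattered, splitting the source file into several shards would let datasets use more than one process. A sketch, assuming the file is JSON Lines (one record per line); the shard count and file names are illustrative:

    from itertools import cycle

    n_shards = 8  # illustrative choice
    outs = [open(f"sft_shard_{i}.jsonl", "w") for i in range(n_shards)]
    with open("sft_data_i1_ResV3_05_V2.json") as src:
        for line, out in zip(src, cycle(outs)):  # round-robin lines to shards
            out.write(line)
    for out in outs:
        out.close()
    # load_dataset("json", data_files=list_of_shards, num_proc=n_shards)
    # can then generate the train split in parallel.
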
10.82.142.23: Generating train split: 4499718 examples [00:47, 95325.46 examples/s]
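
Note: nodes 10.82.142.17 and 10.82.142.23 each regenerate the same 4,499,718-example split (~47 s apiece), and 10.82.142.18 starts the identical generation below. Warming the datasets cache once before launch would let the other ranks reuse the Arrow cache instead of regenerating it. This is a sketch only; the shared cache path is an assumption about the deployment, not something shown in this log.

    import os
    # Must be set before datasets is imported; hypothetical shared path.
    os.environ["HF_DATASETS_CACHE"] = "/share/hf_datasets_cache"

    from datasets import load_dataset

    load_dataset(
        "json",
        data_files="/share/liangqingyuan/GrammarCoder14B/data/processed/"
                   "DataFilter/data_process/sft_data_i1_ResV3_05_V2.json",
        split="train",
    )
    # Later loads with the same fingerprint skip "Generating train split"
    # and read the cached Arrow files directly.
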
examples/s] Generating train split: 8514 examples [00:00, 79633.36 examples/s] Generating train split: 18554 examples [00:00, 86274.09 examples/s] Generating train split: 28476 examples [00:00, 86851.88 examples/s] Generating train split: 39695 examples [00:00, 88989.34 examples/s] Generating train split: 51037 examples [00:00, 91389.74 examples/s] Generating train split: 62285 examples [00:00, 92150.16 examples/s] Generating train split: 73811 examples [00:00, 89481.56 examples/s] Generating train split: 85362 examples [00:00, 90022.66 examples/s] Generating train split: 96696 examples [00:01, 93350.95 examples/s] Generating train split: 108166 examples [00:01, 91676.38 examples/s] Generating train split: 120927 examples [00:01, 97535.97 examples/s] Generating train split: 130894 examples [00:01, 95686.53 examples/s] Generating train split: 142278 examples [00:01, 96437.65 examples/s] Generating train split: 153757 examples [00:01, 97853.76 examples/s] Generating train split: 163787 examples [00:01, 97218.89 examples/s] Generating train split: 173670 examples [00:01, 95580.26 examples/s] Generating train split: 185192 examples [00:01, 93579.89 examples/s] Generating train split: 195216 examples [00:02, 88438.31 examples/s] Generating train split: 205082 examples [00:02, 89464.58 examples/s] Generating train split: 216556 examples [00:02, 88339.30 examples/s] Generating train split: 227821 examples [00:02, 89301.66 examples/s] Generating train split: 237730 examples [00:02, 89693.77 examples/s] Generating train split: 247647 examples [00:02, 85700.33 examples/s] Generating train split: 259017 examples [00:02, 88272.19 examples/s] Generating train split: 268881 examples [00:02, 87650.66 examples/s] Generating train split: 278797 examples [00:03, 88524.75 examples/s] Generating train split: 290252 examples [00:03, 89360.32 examples/s] Generating train split: 301683 examples [00:03, 87654.91 examples/s] Generating train split: 313229 examples [00:03, 90305.07 examples/s] Generating train split: 326275 examples [00:03, 96414.92 examples/s] Generating train split: 336328 examples [00:03, 94500.31 examples/s] Generating train split: 347683 examples [00:03, 95301.40 examples/s] Generating train split: 360346 examples [00:03, 97336.75 examples/s] Generating train split: 370373 examples [00:04, 93732.51 examples/s] Generating train split: 380301 examples [00:04, 89446.26 examples/s] Generating train split: 392994 examples [00:04, 91868.10 examples/s] Generating train split: 404447 examples [00:04, 95224.52 examples/s] Generating train split: 414474 examples [00:04, 91809.93 examples/s] Generating train split: 424554 examples [00:04, 91089.61 examples/s] Generating train split: 434536 examples [00:04, 90286.13 examples/s] Generating train split: 445839 examples [00:04, 94001.72 examples/s] Generating train split: 457200 examples [00:04, 96214.83 examples/s] Generating train split: 467024 examples [00:05, 93078.08 examples/s] Generating train split: 477075 examples [00:05, 91463.93 examples/s] Generating train split: 488474 examples [00:05, 92444.03 examples/s] Generating train split: 500148 examples [00:05, 93244.76 examples/s] Generating train split: 511621 examples [00:05, 92910.31 examples/s] Generating train split: 522977 examples [00:05, 90077.61 examples/s] Generating train split: 534117 examples [00:05, 89384.32 examples/s] Generating train split: 545524 examples [00:05, 91096.91 examples/s] Generating train split: 556816 examples [00:06, 93915.89 examples/s] Generating train split: 568247 
examples [00:06, 93385.94 examples/s] Generating train split: 579940 examples [00:06, 94077.16 examples/s] Generating train split: 591219 examples [00:06, 89165.55 examples/s] Generating train split: 602506 examples [00:06, 91959.94 examples/s] Generating train split: 613656 examples [00:06, 90101.45 examples/s] Generating train split: 625085 examples [00:06, 93392.93 examples/s] Generating train split: 636543 examples [00:06, 90752.75 examples/s] Generating train split: 647846 examples [00:07, 91910.76 examples/s] Generating train split: 659144 examples [00:07, 94936.97 examples/s] Generating train split: 672217 examples [00:07, 98313.01 examples/s] Generating train split: 687746 examples [00:07, 96130.22 examples/s] Generating train split: 698982 examples [00:07, 98759.03 examples/s] Generating train split: 710300 examples [00:07, 95859.58 examples/s] Generating train split: 721779 examples [00:07, 96073.44 examples/s] Generating train split: 733172 examples [00:07, 95797.20 examples/s] Generating train split: 744509 examples [00:08, 98208.15 examples/s] Generating train split: 755921 examples [00:08, 98733.66 examples/s] Generating train split: 767355 examples [00:08, 97769.98 examples/s] Generating train split: 777306 examples [00:08, 96304.81 examples/s] Generating train split: 788515 examples [00:08, 98326.02 examples/s] Generating train split: 800060 examples [00:08, 99493.10 examples/s] Generating train split: 811402 examples [00:08, 97801.41 examples/s] Generating train split: 824412 examples [00:08, 98947.68 examples/s] Generating train split: 835626 examples [00:08, 92730.05 examples/s] Generating train split: 847004 examples [00:09, 92833.58 examples/s] Generating train split: 858477 examples [00:09, 93859.33 examples/s] Generating train split: 869879 examples [00:09, 93766.65 examples/s] Generating train split: 879731 examples [00:09, 92837.71 examples/s] Generating train split: 891127 examples [00:09, 97696.71 examples/s] Generating train split: 902637 examples [00:09, 96520.31 examples/s] Generating train split: 913960 examples [00:09, 97563.47 examples/s] Generating train split: 925302 examples [00:09, 95857.02 examples/s] Generating train split: 936817 examples [00:10, 97126.99 examples/s] Generating train split: 949323 examples [00:10, 100241.60 examples/s] Generating train split: 965052 examples [00:10, 97733.17 examples/s] Generating train split: 976473 examples [00:10, 98416.17 examples/s] Generating train split: 989208 examples [00:10, 97655.85 examples/s] Generating train split: 999100 examples [00:10, 94095.56 examples/s] Generating train split: 1010573 examples [00:10, 94439.15 examples/s] Generating train split: 1021932 examples [00:10, 96775.93 examples/s] Generating train split: 1034787 examples [00:11, 98332.07 examples/s] Generating train split: 1046138 examples [00:11, 98732.95 examples/s] Generating train split: 1063105 examples [00:11, 96309.72 examples/s] Generating train split: 1073244 examples [00:11, 92735.35 examples/s] Generating train split: 1086176 examples [00:11, 96893.38 examples/s] Generating train split: 1097408 examples [00:11, 99047.70 examples/s] Generating train split: 1108862 examples [00:11, 99646.57 examples/s] Generating train split: 1120185 examples [00:11, 96758.98 examples/s] Generating train split: 1130122 examples [00:12, 95216.78 examples/s] Generating train split: 1141548 examples [00:12, 98118.43 examples/s] Generating train split: 1151434 examples [00:12, 96283.43 examples/s] Generating train split: 1164307 examples [00:12, 
101972.04 examples/s] Generating train split: 1175810 examples [00:12, 104384.46 examples/s] Generating train split: 1191555 examples [00:12, 96154.25 examples/s] Generating train split: 1204368 examples [00:12, 99469.88 examples/s] Generating train split: 1219815 examples [00:12, 96008.41 examples/s] Generating train split: 1231074 examples [00:13, 94727.30 examples/s] Generating train split: 1240912 examples [00:13, 92765.70 examples/s] Generating train split: 1256406 examples [00:13, 91414.13 examples/s] Generating train split: 1269229 examples [00:13, 96628.44 examples/s] Generating train split: 1280598 examples [00:13, 97780.35 examples/s] Generating train split: 1291989 examples [00:13, 97429.65 examples/s] Generating train split: 1303545 examples [00:13, 100106.79 examples/s] Generating train split: 1319316 examples [00:13, 96655.57 examples/s] Generating train split: 1330730 examples [00:14, 95022.39 examples/s] Generating train split: 1342059 examples [00:14, 95598.71 examples/s] Generating train split: 1353414 examples [00:14, 95622.43 examples/s] Generating train split: 1364786 examples [00:14, 96546.44 examples/s] Generating train split: 1376183 examples [00:14, 99816.32 examples/s] Generating train split: 1387659 examples [00:14, 99530.21 examples/s] Generating train split: 1400558 examples [00:14, 96629.75 examples/s] Generating train split: 1411920 examples [00:14, 98457.98 examples/s] Generating train split: 1423262 examples [00:15, 101136.42 examples/s] Generating train split: 1434543 examples [00:15, 97946.87 examples/s] Generating train split: 1446050 examples [00:15, 100014.84 examples/s] Generating train split: 1458940 examples [00:15, 99095.73 examples/s] Generating train split: 1471697 examples [00:15, 102733.04 examples/s] Generating train split: 1483036 examples [00:15, 99399.97 examples/s] Generating train split: 1494387 examples [00:15, 99144.92 examples/s] Generating train split: 1508767 examples [00:15, 93180.56 examples/s] Generating train split: 1520525 examples [00:16, 96740.99 examples/s] Generating train split: 1533476 examples [00:16, 98217.96 examples/s] Generating train split: 1544897 examples [00:16, 99891.80 examples/s] Generating train split: 1555075 examples [00:16, 96356.17 examples/s] Generating train split: 1567844 examples [00:16, 98215.17 examples/s] Generating train split: 1583533 examples [00:16, 95029.01 examples/s] Generating train split: 1594833 examples [00:16, 92583.03 examples/s] Generating train split: 1606061 examples [00:16, 90684.20 examples/s] Generating train split: 1617409 examples [00:17, 90114.54 examples/s] Generating train split: 1628582 examples [00:17, 90039.47 examples/s] Generating train split: 1640181 examples [00:17, 88339.73 examples/s] Generating train split: 1651638 examples [00:17, 90600.77 examples/s] Generating train split: 1663149 examples [00:17, 93200.94 examples/s] Generating train split: 1674557 examples [00:17, 92873.99 examples/s] Generating train split: 1686111 examples [00:17, 96365.70 examples/s] Generating train split: 1697468 examples [00:17, 97241.30 examples/s] Generating train split: 1708913 examples [00:18, 97647.50 examples/s] Generating train split: 1720362 examples [00:18, 99355.99 examples/s] Generating train split: 1733457 examples [00:18, 102051.11 examples/s] Generating train split: 1744951 examples [00:18, 102841.46 examples/s] Generating train split: 1757795 examples [00:18, 100599.89 examples/s] Generating train split: 1769264 examples [00:18, 99768.59 examples/s] Generating train split: 
1779317 examples [00:18, 95796.27 examples/s] Generating train split: 1790696 examples [00:18, 97743.65 examples/s] Generating train split: 1802210 examples [00:18, 97131.46 examples/s] Generating train split: 1812237 examples [00:19, 95824.47 examples/s] Generating train split: 1823578 examples [00:19, 98122.67 examples/s] Generating train split: 1833677 examples [00:19, 94202.48 examples/s] Generating train split: 1845195 examples [00:19, 93512.15 examples/s] Generating train split: 1856326 examples [00:19, 93820.07 examples/s] Generating train split: 1867638 examples [00:19, 91855.23 examples/s] Generating train split: 1879030 examples [00:19, 94440.96 examples/s] Generating train split: 1890386 examples [00:19, 93076.28 examples/s] Generating train split: 1901801 examples [00:20, 94472.29 examples/s] Generating train split: 1914551 examples [00:20, 98063.16 examples/s] Generating train split: 1924597 examples [00:20, 94378.98 examples/s] Generating train split: 1935778 examples [00:20, 92585.47 examples/s] Generating train split: 1946943 examples [00:20, 90339.16 examples/s] Generating train split: 1958529 examples [00:20, 95202.60 examples/s] Generating train split: 1971321 examples [00:20, 95414.29 examples/s] Generating train split: 1982811 examples [00:20, 98478.54 examples/s] Generating train split: 1998387 examples [00:21, 93174.38 examples/s] Generating train split: 2009664 examples [00:21, 96428.63 examples/s] Generating train split: 2020985 examples [00:21, 96162.85 examples/s] Generating train split: 2032295 examples [00:21, 96372.87 examples/s] Generating train split: 2043622 examples [00:21, 97373.35 examples/s] Generating train split: 2056343 examples [00:21, 96836.57 examples/s] Generating train split: 2066374 examples [00:21, 91800.80 examples/s] Generating train split: 2077835 examples [00:21, 95073.65 examples/s] Generating train split: 2090661 examples [00:21, 99877.55 examples/s] Generating train split: 2102069 examples [00:22, 99833.58 examples/s] Generating train split: 2113287 examples [00:22, 97337.84 examples/s] Generating train split: 2124774 examples [00:22, 97748.77 examples/s] Generating train split: 2136204 examples [00:22, 99507.31 examples/s] Generating train split: 2146218 examples [00:22, 95496.70 examples/s] Generating train split: 2157767 examples [00:22, 97152.42 examples/s] Generating train split: 2169082 examples [00:22, 98007.73 examples/s] Generating train split: 2178929 examples [00:22, 96735.67 examples/s] Generating train split: 2190102 examples [00:23, 92550.13 examples/s] Generating train split: 2201406 examples [00:23, 92350.67 examples/s] Generating train split: 2212719 examples [00:23, 95859.16 examples/s] Generating train split: 2224152 examples [00:23, 91901.98 examples/s] Generating train split: 2235713 examples [00:23, 92049.37 examples/s] Generating train split: 2247172 examples [00:23, 96733.09 examples/s] Generating train split: 2258507 examples [00:23, 93211.87 examples/s] Generating train split: 2269830 examples [00:23, 97208.65 examples/s] Generating train split: 2281256 examples [00:23, 98288.41 examples/s] Generating train split: 2292580 examples [00:24, 101289.99 examples/s] Generating train split: 2303991 examples [00:24, 98070.34 examples/s] Generating train split: 2315495 examples [00:24, 97898.46 examples/s] Generating train split: 2328344 examples [00:24, 99754.91 examples/s] Generating train split: 2341147 examples [00:24, 100882.42 examples/s] Generating train split: 2352471 examples [00:24, 100256.86 examples/s] 
Generating train split: 2363723 examples [00:24, 100810.01 examples/s] Generating train split: 2374907 examples [00:24, 100157.05 examples/s] Generating train split: 2387436 examples [00:25, 98389.55 examples/s] Generating train split: 2398853 examples [00:25, 98874.60 examples/s] Generating train split: 2410239 examples [00:25, 100514.60 examples/s] Generating train split: 2421488 examples [00:25, 98314.19 examples/s] Generating train split: 2431450 examples [00:25, 96692.20 examples/s] Generating train split: 2445604 examples [00:25, 94378.92 examples/s] Generating train split: 2456965 examples [00:25, 93298.34 examples/s] Generating train split: 2468300 examples [00:25, 94486.61 examples/s] Generating train split: 2479512 examples [00:26, 93509.67 examples/s] Generating train split: 2492483 examples [00:26, 96210.44 examples/s] Generating train split: 2503954 examples [00:26, 98407.02 examples/s] Generating train split: 2513894 examples [00:26, 98093.55 examples/s] Generating train split: 2526802 examples [00:26, 97915.00 examples/s] Generating train split: 2538240 examples [00:26, 97813.09 examples/s] Generating train split: 2551093 examples [00:26, 102861.37 examples/s] Generating train split: 2562453 examples [00:26, 97301.65 examples/s] Generating train split: 2575394 examples [00:26, 97128.37 examples/s] Generating train split: 2588091 examples [00:27, 99740.15 examples/s] Generating train split: 2601004 examples [00:27, 103955.84 examples/s] Generating train split: 2612356 examples [00:27, 102209.30 examples/s] Generating train split: 2623684 examples [00:27, 103113.70 examples/s] Generating train split: 2638077 examples [00:27, 98008.23 examples/s] Generating train split: 2648113 examples [00:27, 95501.21 examples/s] Generating train split: 2659584 examples [00:27, 96925.21 examples/s] Generating train split: 2670863 examples [00:27, 97173.50 examples/s] Generating train split: 2684935 examples [00:28, 92329.20 examples/s] Generating train split: 2696238 examples [00:28, 94461.20 examples/s] Generating train split: 2706085 examples [00:28, 94366.29 examples/s] Generating train split: 2717495 examples [00:28, 94872.22 examples/s] Generating train split: 2729018 examples [00:28, 93481.38 examples/s] Generating train split: 2740218 examples [00:28, 93571.34 examples/s] Generating train split: 2751700 examples [00:28, 92115.48 examples/s] Generating train split: 2763073 examples [00:28, 92125.23 examples/s] Generating train split: 2777347 examples [00:29, 102229.30 examples/s] Generating train split: 2788722 examples [00:29, 100932.71 examples/s] Generating train split: 2799929 examples [00:29, 101319.41 examples/s] Generating train split: 2815354 examples [00:29, 100834.71 examples/s] Generating train split: 2826820 examples [00:29, 100610.82 examples/s] Generating train split: 2839616 examples [00:29, 102110.32 examples/s] Generating train split: 2850898 examples [00:29, 101095.77 examples/s] Generating train split: 2865152 examples [00:29, 93694.46 examples/s] Generating train split: 2876666 examples [00:30, 97212.44 examples/s] Generating train split: 2892290 examples [00:30, 95233.52 examples/s] Generating train split: 2903684 examples [00:30, 96122.36 examples/s] Generating train split: 2914961 examples [00:30, 94966.58 examples/s] Generating train split: 2926464 examples [00:30, 96630.02 examples/s] Generating train split: 2937624 examples [00:30, 96120.09 examples/s] Generating train split: 2949139 examples [00:30, 97411.08 examples/s] Generating train split: 2960517 examples 
[00:30, 96900.38 examples/s] Generating train split: 2971905 examples [00:31, 99235.73 examples/s] Generating train split: 2983384 examples [00:31, 95519.54 examples/s] Generating train split: 2994836 examples [00:31, 97801.72 examples/s] Generating train split: 3007712 examples [00:31, 102462.19 examples/s] Generating train split: 3020328 examples [00:31, 102951.67 examples/s] Generating train split: 3034476 examples [00:31, 93834.18 examples/s] Generating train split: 3045818 examples [00:31, 93279.49 examples/s] Generating train split: 3057176 examples [00:31, 95421.44 examples/s] Generating train split: 3068493 examples [00:32, 95097.23 examples/s] Generating train split: 3079981 examples [00:32, 97259.66 examples/s] Generating train split: 3091505 examples [00:32, 95887.81 examples/s] Generating train split: 3102834 examples [00:32, 96070.19 examples/s] Generating train split: 3114046 examples [00:32, 96879.80 examples/s] Generating train split: 3125348 examples [00:32, 94419.76 examples/s] Generating train split: 3136801 examples [00:32, 96918.33 examples/s] Generating train split: 3148286 examples [00:32, 99320.28 examples/s] Generating train split: 3159316 examples [00:33, 90926.30 examples/s] Generating train split: 3170988 examples [00:33, 93415.49 examples/s] Generating train split: 3182185 examples [00:33, 92823.69 examples/s] Generating train split: 3193544 examples [00:33, 95781.37 examples/s] Generating train split: 3205025 examples [00:33, 96077.00 examples/s] Generating train split: 3216225 examples [00:33, 95401.86 examples/s] Generating train split: 3227488 examples [00:33, 96583.62 examples/s] Generating train split: 3237327 examples [00:33, 94913.75 examples/s] Generating train split: 3248671 examples [00:33, 94379.46 examples/s] Generating train split: 3259963 examples [00:34, 92763.91 examples/s] Generating train split: 3272683 examples [00:34, 98073.93 examples/s] Generating train split: 3284343 examples [00:34, 97550.71 examples/s] Generating train split: 3294328 examples [00:34, 96184.15 examples/s] Generating train split: 3305758 examples [00:34, 97756.03 examples/s] Generating train split: 3318348 examples [00:34, 100407.90 examples/s] Generating train split: 3328472 examples [00:34, 97009.45 examples/s] Generating train split: 3339819 examples [00:34, 93707.08 examples/s] Generating train split: 3351196 examples [00:35, 98251.30 examples/s] Generating train split: 3362773 examples [00:35, 101415.94 examples/s] Generating train split: 3374161 examples [00:35, 98264.49 examples/s] Generating train split: 3385511 examples [00:35, 93277.27 examples/s] Generating train split: 3396761 examples [00:35, 92695.60 examples/s] Generating train split: 3408167 examples [00:35, 93930.46 examples/s] Generating train split: 3419704 examples [00:35, 91387.13 examples/s] Generating train split: 3430806 examples [00:35, 94025.35 examples/s] Generating train split: 3442074 examples [00:35, 89652.48 examples/s] Generating train split: 3453610 examples [00:36, 91801.68 examples/s] Generating train split: 3465093 examples [00:36, 87374.66 examples/s] Generating train split: 3476314 examples [00:36, 92643.71 examples/s] Generating train split: 3487698 examples [00:36, 97195.93 examples/s] Generating train split: 3498926 examples [00:36, 98484.99 examples/s] Generating train split: 3510335 examples [00:36, 92533.84 examples/s] Generating train split: 3521618 examples [00:36, 90897.78 examples/s] Generating train split: 3532923 examples [00:36, 93444.33 examples/s] Generating train 
split: 3544409 examples [00:37, 95026.99 examples/s] Generating train split: 3555818 examples [00:37, 94458.90 examples/s] Generating train split: 3567006 examples [00:37, 95639.18 examples/s] Generating train split: 3576687 examples [00:37, 92865.93 examples/s] Generating train split: 3586545 examples [00:37, 91349.57 examples/s] Generating train split: 3596429 examples [00:37, 89456.74 examples/s] Generating train split: 3607815 examples [00:37, 93151.11 examples/s] Generating train split: 3617605 examples [00:37, 91522.59 examples/s] Generating train split: 3629004 examples [00:37, 92181.79 examples/s] Generating train split: 3640274 examples [00:38, 93646.67 examples/s] Generating train split: 3651695 examples [00:38, 94769.17 examples/s] Generating train split: 3661659 examples [00:38, 93662.04 examples/s] Generating train split: 3673032 examples [00:38, 90588.27 examples/s] Generating train split: 3684433 examples [00:38, 93869.26 examples/s] Generating train split: 3697469 examples [00:38, 97592.97 examples/s] Generating train split: 3708646 examples [00:38, 96866.33 examples/s] Generating train split: 3718542 examples [00:38, 90942.53 examples/s] Generating train split: 3730036 examples [00:39, 91519.92 examples/s] Generating train split: 3741576 examples [00:39, 91962.00 examples/s] Generating train split: 3753048 examples [00:39, 92410.07 examples/s] Generating train split: 3764486 examples [00:39, 92565.38 examples/s] Generating train split: 3776244 examples [00:39, 92406.00 examples/s] Generating train split: 3787508 examples [00:39, 90255.12 examples/s] Generating train split: 3798948 examples [00:39, 92261.32 examples/s] Generating train split: 3810178 examples [00:39, 93894.57 examples/s] Generating train split: 3821637 examples [00:40, 93229.31 examples/s] Generating train split: 3832990 examples [00:40, 92822.01 examples/s] Generating train split: 3844270 examples [00:40, 90188.53 examples/s] Generating train split: 3855643 examples [00:40, 90795.51 examples/s] Generating train split: 3866915 examples [00:40, 92201.01 examples/s] Generating train split: 3878305 examples [00:40, 93466.81 examples/s] Generating train split: 3889732 examples [00:40, 92732.56 examples/s] Generating train split: 3900990 examples [00:40, 92556.08 examples/s] Generating train split: 3912102 examples [00:41, 90394.26 examples/s] Generating train split: 3923329 examples [00:41, 90437.27 examples/s] Generating train split: 3934452 examples [00:41, 87842.40 examples/s] Generating train split: 3945882 examples [00:41, 90264.07 examples/s] Generating train split: 3957302 examples [00:41, 92358.61 examples/s] Generating train split: 3968380 examples [00:41, 94036.52 examples/s] Generating train split: 3978338 examples [00:41, 92301.83 examples/s] Generating train split: 3992540 examples [00:41, 89305.54 examples/s] Generating train split: 4002642 examples [00:42, 89904.94 examples/s] Generating train split: 4012808 examples [00:42, 86940.98 examples/s] Generating train split: 4024270 examples [00:42, 86198.33 examples/s] Generating train split: 4035840 examples [00:42, 90214.51 examples/s] Generating train split: 4047280 examples [00:42, 93611.61 examples/s] Generating train split: 4058700 examples [00:42, 94874.73 examples/s] Generating train split: 4070223 examples [00:42, 95760.49 examples/s] Generating train split: 4082003 examples [00:42, 94848.45 examples/s] Generating train split: 4093630 examples [00:43, 94443.07 examples/s] Generating train split: 4105043 examples [00:43, 93237.39 examples/s] 
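The records above show each node materializing the same 4,499,718-example train split before tokenization. As a rough sketch of what this "Generating train split" step corresponds to in the datasets library (the data file name below is hypothetical; the log does not show it):

```python
# Minimal sketch, assuming the corpus is a local JSON file; the file name
# here is hypothetical and should be replaced with the real dataset file.
from datasets import load_dataset

train = load_dataset(
    "json",
    data_files="data/cpt_corpus.json",  # hypothetical path
    split="train",
)
print(len(train))  # expected: 4499718, matching the counts above
```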
10.82.142.23: Converting format of dataset (num_proc=192): 0%| | 0/4499718 [00:00<?, ? examples/s]
10.82.142.23: training example:
10.82.142.23: inputs:
10.82.142.23: Please write a script to delete an element from an array given its index. The index should be verified to be within the bounds of the array before performing the deletion. If the index is out of bounds, an appropriate error message should be displayed. The remaining elements in the array should be shifted to fill the empty space left by the deleted element.
10.82.142.23:
10.82.142.23: Given array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
10.82.142.23: Index of element to delete: 4
10.82.142.23:
10.82.142.23: Expected output: [1, 2, 3, 4, 6, 7, 8, 9, 10]
10.82.142.23:
10.82.142.23: Additional Constraints:
10.82.142.23: - The script should handle both positive and negative indices.
10.82.142.23: - The script should handle empty arrays.
10.82.142.23: - The script should also handle the case where the index is a floating-point number.
10.82.142.23: - The script should have a time complexity of O(n), where n is the length of the array.
10.82.142.23: - The script should have a space complexity of O(1).
10.82.142.23: (You may start with either python or pygrammar)
10.82.142.23: Here is a script that fulfills the given requirements:
10.82.142.23:
10.82.142.23: ```python
10.82.142.23: def delete_element(arr, index):
10.82.142.23:     # Check if the array is empty
10.82.142.23:     if len(arr) == 0:
10.82.142.23:         print("Error: Array is empty.")
10.82.142.23:         return arr
10.82.142.23:
10.82.142.23:     # Check if the index is out of bounds
10.82.142.23:     if index < 0 or index >= len(arr):
10.82.142.23:         print("Error: Index is out of bounds.")
10.82.142.23:         return arr
10.82.142.23:
10.82.142.23:     # Shift the remaining elements to fill the empty space
10.82.142.23:     for i in range(index, len(arr) - 1):
10.82.142.23:         arr[i] = arr[i + 1]
10.82.142.23:
10.82.142.23:     # Delete the last element
10.82.142.23:     arr.pop()
10.82.142.23:
10.82.142.23:     return arr
10.82.142.23:
10.82.142.23:
10.82.142.23: # Test the script
10.82.142.23: arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
10.82.142.23: index = 4
10.82.142.23: result = delete_element(arr, index)
10.82.142.23: print(result)
10.82.142.23: ```
10.82.142.23:
10.82.142.23: This script checks if the array is empty and if the index is out of bounds. If any of these conditions are true, it displays an appropriate error message and returns the original array.
If the index is valid, the script shifts the remaining elements to fill the empty space left by the deleted element and then removes the last element. Finally, it returns the updated array. 10.82.142.23: 10.82.142.23: The time complexity of this script is O(n), where n is the length of the array, because shifting the remaining elements requires iterating through the array once. The space complexity is O(1) because the script only uses a constant amount of additional memory to store the index and temporary variables.<|endoftext|> 10.82.142.17: training example: 10.82.142.17: input_ids: 10.82.142.17: [5501, 3270, 264, 5316, 311, 3698, 458, 2392, 504, 458, 1334, 2661, 1181, 1922, 13, 576, 1922, 1265, 387, 23933, 311, 387, 2878, 279, 14262, 315, 279, 1334, 1573, 16380, 279, 36066, 13, 1416, 279, 1922, 374, 700, 315, 14262, 11, 458, 8311, 1465, 1943, 1265, 387, 12596, 13, 576, 9664, 5424, 304, 279, 1334, 1265, 387, 28973, 311, 5155, 279, 4287, 3550, 2115, 553, 279, 11062, 2392, 382, 22043, 1334, 25, 508, 16, 11, 220, 17, 11, 220, 18, 11, 220, 19, 11, 220, 20, 11, 220, 21, 11, 220, 22, 11, 220, 23, 11, 220, 24, 11, 220, 16, 15, 921, 1552, 315, 2392, 311, 3698, 25, 220, 19, 271, 18896, 2550, 25, 508, 16, 11, 220, 17, 11, 220, 18, 11, 220, 19, 11, 220, 21, 11, 220, 22, 11, 220, 23, 11, 220, 24, 11, 220, 16, 15, 2533, 29019, 87473, 510, 12, 576, 5316, 1265, 3705, 2176, 6785, 323, 8225, 14937, 624, 12, 576, 5316, 1265, 3705, 4287, 18386, 624, 12, 576, 5316, 1265, 1083, 3705, 279, 1142, 1380, 279, 1922, 374, 264, 19057, 16574, 1372, 624, 12, 576, 5316, 1265, 614, 264, 882, 23094, 315, 506, 1445, 701, 1380, 308, 374, 279, 3084, 315, 279, 1334, 624, 12, 576, 5316, 1265, 614, 264, 3550, 23094, 315, 506, 7, 16, 4292, 7, 2610, 1231, 1191, 448, 2987, 10135, 476, 4510, 41094, 340, 8420, 374, 264, 5316, 429, 5599, 66167, 279, 2661, 8502, 1447, 73594, 12669, 198, 750, 3698, 7894, 10939, 11, 1922, 982, 262, 671, 4248, 421, 279, 1334, 374, 4287, 198, 262, 421, 2422, 10939, 8, 621, 220, 15, 510, 286, 1173, 445, 1454, 25, 2910, 374, 4287, 13053, 286, 470, 2890, 271, 262, 671, 4248, 421, 279, 1922, 374, 700, 315, 14262, 198, 262, 421, 1922, 366, 220, 15, 476, 1922, 2604, 2422, 10939, 982, 286, 1173, 445, 1454, 25, 8008, 374, 700, 315, 14262, 13053, 286, 470, 2890, 271, 262, 671, 26510, 279, 9664, 5424, 311, 5155, 279, 4287, 3550, 198, 262, 369, 600, 304, 2088, 7195, 11, 2422, 10939, 8, 481, 220, 16, 982, 286, 2890, 989, 60, 284, 2890, 989, 488, 220, 16, 2533, 262, 671, 10428, 279, 1537, 2392, 198, 262, 2890, 8288, 2822, 262, 470, 2890, 1406, 2, 3393, 279, 5316, 198, 1118, 284, 508, 16, 11, 220, 17, 11, 220, 18, 11, 220, 19, 11, 220, 20, 11, 220, 21, 11, 220, 22, 11, 220, 23, 11, 220, 24, 11, 220, 16, 15, 921, 1252, 284, 220, 19, 198, 1382, 284, 3698, 7894, 10939, 11, 1922, 340, 1350, 4456, 340, 13874, 19324, 1986, 5316, 12341, 421, 279, 1334, 374, 4287, 323, 421, 279, 1922, 374, 700, 315, 14262, 13, 1416, 894, 315, 1493, 4682, 525, 830, 11, 432, 18689, 458, 8311, 1465, 1943, 323, 4675, 279, 4024, 1334, 13, 1416, 279, 1922, 374, 2697, 11, 279, 5316, 28635, 279, 9664, 5424, 311, 5155, 279, 4287, 3550, 2115, 553, 279, 11062, 2392, 323, 1221, 28160, 279, 1537, 2392, 13, 17375, 11, 432, 4675, 279, 6049, 1334, 382, 785, 882, 23094, 315, 419, 5316, 374, 506, 1445, 701, 1380, 308, 374, 279, 3084, 315, 279, 1334, 11, 1576, 31831, 279, 9664, 5424, 7460, 87235, 1526, 279, 1334, 3055, 13, 576, 3550, 23094, 374, 506, 7, 16, 8, 1576, 279, 5316, 1172, 5711, 264, 6783, 3311, 315, 5107, 4938, 311, 3553, 279, 1922, 323, 13340, 
7332, 13, 151643]
10.82.142.17: inputs:
10.82.142.17: Please write a script to delete an element from an array given its index. The index should be verified to be within the bounds of the array before performing the deletion. If the index is out of bounds, an appropriate error message should be displayed. The remaining elements in the array should be shifted to fill the empty space left by the deleted element.
10.82.142.17:
10.82.142.17: Given array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
10.82.142.17: Index of element to delete: 4
10.82.142.17:
10.82.142.17: Expected output: [1, 2, 3, 4, 6, 7, 8, 9, 10]
10.82.142.17:
10.82.142.17: Additional Constraints:
10.82.142.17: - The script should handle both positive and negative indices.
10.82.142.17: - The script should handle empty arrays.
10.82.142.17: - The script should also handle the case where the index is a floating-point number.
10.82.142.17: - The script should have a time complexity of O(n), where n is the length of the array.
10.82.142.17: - The script should have a space complexity of O(1).
10.82.142.17: (You may start with either python or pygrammar)
10.82.142.17: Here is a script that fulfills the given requirements:
10.82.142.17:
10.82.142.17: ```python
10.82.142.17: def delete_element(arr, index):
10.82.142.17:     # Check if the array is empty
10.82.142.17:     if len(arr) == 0:
10.82.142.17:         print("Error: Array is empty.")
10.82.142.17:         return arr
10.82.142.17:
10.82.142.17:     # Check if the index is out of bounds
10.82.142.17:     if index < 0 or index >= len(arr):
10.82.142.17:         print("Error: Index is out of bounds.")
10.82.142.17:         return arr
10.82.142.17:
10.82.142.17:     # Shift the remaining elements to fill the empty space
10.82.142.17:     for i in range(index, len(arr) - 1):
10.82.142.17:         arr[i] = arr[i + 1]
10.82.142.17:
10.82.142.17:     # Delete the last element
10.82.142.17:     arr.pop()
10.82.142.17:
10.82.142.17:     return arr
10.82.142.17:
10.82.142.17:
10.82.142.17: # Test the script
10.82.142.17: arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
10.82.142.17: index = 4
10.82.142.17: result = delete_element(arr, index)
10.82.142.17: print(result)
10.82.142.17: ```
10.82.142.17:
10.82.142.17: This script checks if the array is empty and if the index is out of bounds. If any of these conditions are true, it displays an appropriate error message and returns the original array. If the index is valid, the script shifts the remaining elements to fill the empty space left by the deleted element and then removes the last element. Finally, it returns the updated array.
10.82.142.17:
10.82.142.17: The time complexity of this script is O(n), where n is the length of the array, because shifting the remaining elements requires iterating through the array once.
The space complexity is O(1) because the script only uses a constant amount of additional memory to store the index and temporary variables.<|endoftext|> 10.82.142.18: training example: 10.82.142.18: input_ids: 10.82.142.18: [5501, 3270, 264, 5316, 311, 3698, 458, 2392, 504, 458, 1334, 2661, 1181, 1922, 13, 576, 1922, 1265, 387, 23933, 311, 387, 2878, 279, 14262, 315, 279, 1334, 1573, 16380, 279, 36066, 13, 1416, 279, 1922, 374, 700, 315, 14262, 11, 458, 8311, 1465, 1943, 1265, 387, 12596, 13, 576, 9664, 5424, 304, 279, 1334, 1265, 387, 28973, 311, 5155, 279, 4287, 3550, 2115, 553, 279, 11062, 2392, 382, 22043, 1334, 25, 508, 16, 11, 220, 17, 11, 220, 18, 11, 220, 19, 11, 220, 20, 11, 220, 21, 11, 220, 22, 11, 220, 23, 11, 220, 24, 11, 220, 16, 15, 921, 1552, 315, 2392, 311, 3698, 25, 220, 19, 271, 18896, 2550, 25, 508, 16, 11, 220, 17, 11, 220, 18, 11, 220, 19, 11, 220, 21, 11, 220, 22, 11, 220, 23, 11, 220, 24, 11, 220, 16, 15, 2533, 29019, 87473, 510, 12, 576, 5316, 1265, 3705, 2176, 6785, 323, 8225, 14937, 624, 12, 576, 5316, 1265, 3705, 4287, 18386, 624, 12, 576, 5316, 1265, 1083, 3705, 279, 1142, 1380, 279, 1922, 374, 264, 19057, 16574, 1372, 624, 12, 576, 5316, 1265, 614, 264, 882, 23094, 315, 506, 1445, 701, 1380, 308, 374, 279, 3084, 315, 279, 1334, 624, 12, 576, 5316, 1265, 614, 264, 3550, 23094, 315, 506, 7, 16, 4292, 7, 2610, 1231, 1191, 448, 2987, 10135, 476, 4510, 41094, 340, 8420, 374, 264, 5316, 429, 5599, 66167, 279, 2661, 8502, 1447, 73594, 12669, 198, 750, 3698, 7894, 10939, 11, 1922, 982, 262, 671, 4248, 421, 279, 1334, 374, 4287, 198, 262, 421, 2422, 10939, 8, 621, 220, 15, 510, 286, 1173, 445, 1454, 25, 2910, 374, 4287, 13053, 286, 470, 2890, 271, 262, 671, 4248, 421, 279, 1922, 374, 700, 315, 14262, 198, 262, 421, 1922, 366, 220, 15, 476, 1922, 2604, 2422, 10939, 982, 286, 1173, 445, 1454, 25, 8008, 374, 700, 315, 14262, 13053, 286, 470, 2890, 271, 262, 671, 26510, 279, 9664, 5424, 311, 5155, 279, 4287, 3550, 198, 262, 369, 600, 304, 2088, 7195, 11, 2422, 10939, 8, 481, 220, 16, 982, 286, 2890, 989, 60, 284, 2890, 989, 488, 220, 16, 2533, 262, 671, 10428, 279, 1537, 2392, 198, 262, 2890, 8288, 2822, 262, 470, 2890, 1406, 2, 3393, 279, 5316, 198, 1118, 284, 508, 16, 11, 220, 17, 11, 220, 18, 11, 220, 19, 11, 220, 20, 11, 220, 21, 11, 220, 22, 11, 220, 23, 11, 220, 24, 11, 220, 16, 15, 921, 1252, 284, 220, 19, 198, 1382, 284, 3698, 7894, 10939, 11, 1922, 340, 1350, 4456, 340, 13874, 19324, 1986, 5316, 12341, 421, 279, 1334, 374, 4287, 323, 421, 279, 1922, 374, 700, 315, 14262, 13, 1416, 894, 315, 1493, 4682, 525, 830, 11, 432, 18689, 458, 8311, 1465, 1943, 323, 4675, 279, 4024, 1334, 13, 1416, 279, 1922, 374, 2697, 11, 279, 5316, 28635, 279, 9664, 5424, 311, 5155, 279, 4287, 3550, 2115, 553, 279, 11062, 2392, 323, 1221, 28160, 279, 1537, 2392, 13, 17375, 11, 432, 4675, 279, 6049, 1334, 382, 785, 882, 23094, 315, 419, 5316, 374, 506, 1445, 701, 1380, 308, 374, 279, 3084, 315, 279, 1334, 11, 1576, 31831, 279, 9664, 5424, 7460, 87235, 1526, 279, 1334, 3055, 13, 576, 3550, 23094, 374, 506, 7, 16, 8, 1576, 279, 5316, 1172, 5711, 264, 6783, 3311, 315, 5107, 4938, 311, 3553, 279, 1922, 323, 13340, 7332, 13, 151643] 10.82.142.18: inputs: 10.82.142.18: Please write a script to delete an element from an array given its index. The index should be verified to be within the bounds of the array before performing the deletion. If the index is out of bounds, an appropriate error message should be displayed. 
The remaining elements in the array should be shifted to fill the empty space left by the deleted element.
10.82.142.18:
10.82.142.18: Given array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
10.82.142.18: Index of element to delete: 4
10.82.142.18:
10.82.142.18: Expected output: [1, 2, 3, 4, 6, 7, 8, 9, 10]
10.82.142.18:
10.82.142.18: Additional Constraints:
10.82.142.18: - The script should handle both positive and negative indices.
10.82.142.18: - The script should handle empty arrays.
10.82.142.18: - The script should also handle the case where the index is a floating-point number.
10.82.142.18: - The script should have a time complexity of O(n), where n is the length of the array.
10.82.142.18: - The script should have a space complexity of O(1).
10.82.142.18: (You may start with either python or pygrammar)
10.82.142.18: Here is a script that fulfills the given requirements:
10.82.142.18:
10.82.142.18: ```python
10.82.142.18: def delete_element(arr, index):
10.82.142.18:     # Check if the array is empty
10.82.142.18:     if len(arr) == 0:
10.82.142.18:         print("Error: Array is empty.")
10.82.142.18:         return arr
10.82.142.18:
10.82.142.18:     # Check if the index is out of bounds
10.82.142.18:     if index < 0 or index >= len(arr):
10.82.142.18:         print("Error: Index is out of bounds.")
10.82.142.18:         return arr
10.82.142.18:
10.82.142.18:     # Shift the remaining elements to fill the empty space
10.82.142.18:     for i in range(index, len(arr) - 1):
10.82.142.18:         arr[i] = arr[i + 1]
10.82.142.18:
10.82.142.18:     # Delete the last element
10.82.142.18:     arr.pop()
10.82.142.18:
10.82.142.18:     return arr
10.82.142.18:
10.82.142.18:
10.82.142.18: # Test the script
10.82.142.18: arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
10.82.142.18: index = 4
10.82.142.18: result = delete_element(arr, index)
10.82.142.18: print(result)
10.82.142.18: ```
10.82.142.18:
10.82.142.18: This script checks if the array is empty and if the index is out of bounds. If any of these conditions are true, it displays an appropriate error message and returns the original array. If the index is valid, the script shifts the remaining elements to fill the empty space left by the deleted element and then removes the last element. Finally, it returns the updated array.
10.82.142.18:
10.82.142.18: The time complexity of this script is O(n), where n is the length of the array, because shifting the remaining elements requires iterating through the array once. The space complexity is O(1) because the script only uses a constant amount of additional memory to store the index and temporary variables.<|endoftext|>
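Each node then dumps one decoded training example: the same packed sequence shown both as input_ids and as text. A hedged round-trip check in the same spirit (standard transformers tokenizer API, assuming the tokenizer files sit next to the weights; the ids are copied from the dump above):

```python
# Sketch: decode a few of the logged input_ids with the checkpoint's tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "/share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500"
)
head = [5501, 3270, 264, 5316]  # first ids of the dump above
print(tok.decode(head))         # should read "Please write a script", per the dump
print(tok.decode([151643]))     # the trailing id, "<|endoftext|>"
```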
10.82.142.18: [INFO|configuration_utils.py:697] 2025-06-30 13:54:26,328 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/config.json
10.82.142.18: [INFO|configuration_utils.py:771] 2025-06-30 13:54:26,329 >> Model config Qwen2Config {
10.82.142.18:   "_name_or_path": "/share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500",
10.82.142.18:   "architectures": [
10.82.142.18:     "Qwen2ForCausalLM"
10.82.142.18:   ],
10.82.142.18:   "attention_dropout": 0.0,
10.82.142.18:   "bos_token_id": 151643,
10.82.142.18:   "eos_token_id": 151643,
10.82.142.18:   "hidden_act": "silu",
10.82.142.18:   "hidden_size": 5120,
10.82.142.18:   "initializer_range": 0.02,
10.82.142.18:   "intermediate_size": 13824,
10.82.142.18:   "max_position_embeddings": 32768,
10.82.142.18:   "max_window_layers": 48,
10.82.142.18:   "model_type": "qwen2",
10.82.142.18:   "num_attention_heads": 40,
10.82.142.18:   "num_hidden_layers": 48,
10.82.142.18:   "num_key_value_heads": 8,
10.82.142.18:   "rms_norm_eps": 1e-06,
10.82.142.18:   "rope_scaling": null,
10.82.142.18:   "rope_theta": 1000000.0,
10.82.142.18:   "sliding_window": 131072,
10.82.142.18:   "tie_word_embeddings": false,
10.82.142.18:   "torch_dtype": "float16",
10.82.142.18:   "transformers_version": "4.49.0",
10.82.142.18:   "use_cache": true,
10.82.142.18:   "use_sliding_window": false,
10.82.142.18:   "vocab_size": 153078
10.82.142.18: }
10.82.142.18:
10.82.142.17: [INFO|configuration_utils.py:697] 2025-06-30 13:54:26,339 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/config.json
10.82.142.17: [INFO|configuration_utils.py:771] 2025-06-30 13:54:26,340 >> Model config Qwen2Config {
10.82.142.17:   "_name_or_path": "/share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500",
10.82.142.17:   "architectures": [
10.82.142.17:     "Qwen2ForCausalLM"
10.82.142.17:   ],
10.82.142.17:   "attention_dropout": 0.0,
10.82.142.17:   "bos_token_id": 151643,
10.82.142.17:   "eos_token_id": 151643,
10.82.142.17:   "hidden_act": "silu",
10.82.142.17:   "hidden_size": 5120,
10.82.142.17:   "initializer_range": 0.02,
10.82.142.17:   "intermediate_size": 13824,
10.82.142.17:   "max_position_embeddings": 32768,
10.82.142.17:   "max_window_layers": 48,
10.82.142.17:   "model_type": "qwen2",
10.82.142.17:   "num_attention_heads": 40,
10.82.142.17:   "num_hidden_layers": 48,
10.82.142.17:   "num_key_value_heads": 8,
10.82.142.17:   "rms_norm_eps": 1e-06,
10.82.142.17:   "rope_scaling": null,
10.82.142.17:   "rope_theta": 1000000.0,
10.82.142.17:   "sliding_window": 131072,
10.82.142.17:   "tie_word_embeddings": false,
10.82.142.17:   "torch_dtype": "float16",
10.82.142.17:   "transformers_version": "4.49.0",
10.82.142.17:   "use_cache": true,
10.82.142.17:   "use_sliding_window": false,
10.82.142.17:   "vocab_size": 153078
10.82.142.17: }
10.82.142.17:
10.82.142.23: [INFO|configuration_utils.py:697] 2025-06-30 13:54:26,345 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/config.json
10.82.142.23: [INFO|configuration_utils.py:771] 2025-06-30 13:54:26,346 >> Model config Qwen2Config {
10.82.142.23:   "_name_or_path": "/share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500",
10.82.142.23:   "architectures": [
10.82.142.23:     "Qwen2ForCausalLM"
10.82.142.23:   ],
10.82.142.23:   "attention_dropout": 0.0,
10.82.142.23:   "bos_token_id": 151643,
10.82.142.23:   "eos_token_id": 151643,
10.82.142.23:   "hidden_act": "silu",
10.82.142.23:   "hidden_size": 5120,
10.82.142.23:   "initializer_range": 0.02,
10.82.142.23:   "intermediate_size": 13824,
10.82.142.23:   "max_position_embeddings": 32768,
10.82.142.23:   "max_window_layers": 48,
10.82.142.23:   "model_type": "qwen2",
10.82.142.23:   "num_attention_heads": 40,
10.82.142.23:   "num_hidden_layers": 48,
10.82.142.23:   "num_key_value_heads": 8,
10.82.142.23:   "rms_norm_eps": 1e-06,
10.82.142.23:   "rope_scaling": null,
10.82.142.23:   "rope_theta": 1000000.0,
10.82.142.23:   "sliding_window": 131072,
10.82.142.23:   "tie_word_embeddings": false,
10.82.142.23:   "torch_dtype": "float16",
10.82.142.23:   "transformers_version": "4.49.0",
10.82.142.23:   "use_cache": true,
10.82.142.23:   "use_sliding_window": false,
10.82.142.23:   "vocab_size": 153078
10.82.142.23: }
10.82.142.23:
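The Qwen2Config printed identically by every node pins down the architecture: 48 layers, hidden size 5120, 40 attention heads over 8 KV heads (grouped-query attention), a 153,078-token vocabulary, fp16 weights. A short spot-check sketch using the standard transformers API:

```python
# Sketch: reload the config printed above and verify a few fields.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "/share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500"
)
assert cfg.model_type == "qwen2"
assert cfg.hidden_size == 5120 and cfg.num_hidden_layers == 48
assert cfg.num_attention_heads == 40 and cfg.num_key_value_heads == 8
print(cfg.vocab_size)  # 153078
```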
10.82.142.18: [INFO|modeling_utils.py:3979] 2025-06-30 13:54:26,369 >> loading weights file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/pytorch_model.bin.index.json
10.82.142.18: [INFO|modeling_utils.py:1633] 2025-06-30 13:54:26,372 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
10.82.142.18: [INFO|configuration_utils.py:1140] 2025-06-30 13:54:26,374 >> Generate config GenerationConfig {
10.82.142.18:   "bos_token_id": 151643,
10.82.142.18:   "eos_token_id": 151643,
10.82.142.18:   "use_cache": false
10.82.142.18: }
10.82.142.18:
10.82.142.17: [INFO|modeling_utils.py:3979] 2025-06-30 13:54:26,377 >> loading weights file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/pytorch_model.bin.index.json
10.82.142.17: [INFO|modeling_utils.py:1633] 2025-06-30 13:54:26,379 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
10.82.142.17: [INFO|configuration_utils.py:1140] 2025-06-30 13:54:26,381 >> Generate config GenerationConfig {
10.82.142.17:   "bos_token_id": 151643,
10.82.142.17:   "eos_token_id": 151643,
10.82.142.17:   "use_cache": false
10.82.142.17: }
10.82.142.17:
10.82.142.23: [INFO|modeling_utils.py:3979] 2025-06-30 13:54:26,385 >> loading weights file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/pytorch_model.bin.index.json
10.82.142.23: [INFO|modeling_utils.py:1633] 2025-06-30 13:54:26,387 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
10.82.142.23: [INFO|configuration_utils.py:1140] 2025-06-30 13:54:26,390 >> Generate config GenerationConfig {
10.82.142.23:   "bos_token_id": 151643,
10.82.142.23:   "eos_token_id": 151643,
10.82.142.23:   "use_cache": false
10.82.142.23: }
10.82.142.23:
10.82.142.18: Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
10.82.142.18: >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
10.82.142.18:
10.82.142.18: [INFO|modeling_utils.py:4978] 2025-06-30 13:57:08,367 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500.
10.82.142.18: If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
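These records describe a plain sharded fp16 load: the pytorch_model.bin index is read, Qwen2ForCausalLM is instantiated under torch.float16, and the KV cache is disabled for training. A hedged equivalent in plain transformers, not the actual LLaMA-Factory loader (FlashAttention-2 and gradient checkpointing per the llamafactory records that follow):

```python
# Sketch: load the sharded fp16 checkpoint roughly as the records above describe.
# attn_implementation="flash_attention_2" requires the flash-attn package.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500",
    torch_dtype=torch.float16,                # "default dtype torch.float16"
    attn_implementation="flash_attention_2",  # "Using FlashAttention-2"
)
model.config.use_cache = False         # matches the GenerationConfig above
model.gradient_checkpointing_enable()  # "Gradient checkpointing enabled."
```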
10.82.142.18: Loading checkpoint shards: 100%|██████████| 3/3 [02:41<00:00, 52.98s/it] Loading checkpoint shards: 100%|██████████| 3/3 [02:41<00:00, 53.70s/it]
10.82.142.18: [INFO|configuration_utils.py:1093] 2025-06-30 13:57:08,599 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/generation_config.json
10.82.142.18: [INFO|configuration_utils.py:1140] 2025-06-30 13:57:08,600 >> Generate config GenerationConfig {
10.82.142.18:   "bos_token_id": 151643,
10.82.142.18:   "eos_token_id": 151643,
10.82.142.18:   "max_new_tokens": 2048
10.82.142.18: }
10.82.142.18:
10.82.142.18: [INFO|2025-06-30 13:57:08] llamafactory.model.model_utils.checkpointing:157 >> Gradient checkpointing enabled.
10.82.142.18: [INFO|2025-06-30 13:57:08] llamafactory.model.model_utils.attention:157 >> Using FlashAttention-2 for faster training and inference.
10.82.142.18: [INFO|2025-06-30 13:57:08] llamafactory.model.adapter:157 >> Upcasting trainable params to float32.
10.82.142.18: [INFO|2025-06-30 13:57:08] llamafactory.model.adapter:157 >> Fine-tuning method: Full
10.82.142.18: [INFO|2025-06-30 13:57:08] llamafactory.model.loader:157 >> trainable params: 14,780,417,024 || all params: 14,780,417,024 || trainable%: 100.0000
10.82.142.18: Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
10.82.142.18: [INFO|trainer.py:746] 2025-06-30 13:57:08,678 >> Using auto half precision backend
10.82.142.18: [2025-06-30 13:57:08,922] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.0, git-hash=unknown, git-branch=unknown
10.82.142.17: Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
10.82.142.17: >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
10.82.142.17:
10.82.142.17: [INFO|modeling_utils.py:4978] 2025-06-30 13:57:21,558 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500.
10.82.142.17: If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
10.82.142.17: Loading checkpoint shards: 100%|██████████| 3/3 [02:54<00:00, 57.76s/it] Loading checkpoint shards: 100%|██████████| 3/3 [02:54<00:00, 58.13s/it]
10.82.142.17: Loading checkpoint shards: 100%|██████████| 3/3 [02:54<00:00, 57.90s/it] Loading checkpoint shards: 100%|██████████| 3/3 [02:54<00:00, 58.23s/it]
10.82.142.17: [INFO|configuration_utils.py:1093] 2025-06-30 13:57:21,998 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/generation_config.json
10.82.142.17: [INFO|configuration_utils.py:1140] 2025-06-30 13:57:21,998 >> Generate config GenerationConfig {
10.82.142.17:   "bos_token_id": 151643,
10.82.142.17:   "eos_token_id": 151643,
10.82.142.17:   "max_new_tokens": 2048
10.82.142.17: }
10.82.142.17:
10.82.142.17: [INFO|2025-06-30 13:57:22] llamafactory.model.model_utils.checkpointing:157 >> Gradient checkpointing enabled.
10.82.142.17: [INFO|2025-06-30 13:57:22] llamafactory.model.model_utils.attention:157 >> Using FlashAttention-2 for faster training and inference.
10.82.142.17: [INFO|2025-06-30 13:57:22] llamafactory.model.adapter:157 >> Upcasting trainable params to float32.
10.82.142.17: [INFO|2025-06-30 13:57:22] llamafactory.model.adapter:157 >> Fine-tuning method: Full
10.82.142.17: [INFO|2025-06-30 13:57:22] llamafactory.model.loader:157 >> trainable params: 14,780,417,024 || all params: 14,780,417,024 || trainable%: 100.0000
10.82.142.17: [INFO|trainer.py:746] 2025-06-30 13:57:22,135 >> Using auto half precision backend
10.82.142.23: Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
10.82.142.23: >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
10.82.142.23:
10.82.142.23: [INFO|modeling_utils.py:4978] 2025-06-30 13:57:23,279 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500.
10.82.142.23: If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
10.82.142.23: Loading checkpoint shards: 100%|██████████| 3/3 [02:56<00:00, 59.60s/it] Loading checkpoint shards: 100%|██████████| 3/3 [02:56<00:00, 58.76s/it]
10.82.142.23: Loading checkpoint shards: 100%|██████████| 3/3 [02:56<00:00, 59.65s/it] Loading checkpoint shards: 100%|██████████| 3/3 [02:56<00:00, 58.80s/it]
10.82.142.23: [INFO|configuration_utils.py:1093] 2025-06-30 13:57:24,042 >> loading configuration file /share/liangqingyuan/GrammarCoder14B/Ablation/HFmodels14B/checkpoints-2500/generation_config.json
10.82.142.23: [INFO|configuration_utils.py:1140] 2025-06-30 13:57:24,043 >> Generate config GenerationConfig {
10.82.142.23:   "bos_token_id": 151643,
10.82.142.23:   "eos_token_id": 151643,
10.82.142.23:   "max_new_tokens": 2048
10.82.142.23: }
10.82.142.23:
10.82.142.23: [INFO|2025-06-30 13:57:24] llamafactory.model.model_utils.checkpointing:157 >> Gradient checkpointing enabled.
10.82.142.23: [INFO|2025-06-30 13:57:24] llamafactory.model.model_utils.attention:157 >> Using FlashAttention-2 for faster training and inference.
10.82.142.23: [INFO|2025-06-30 13:57:24] llamafactory.model.adapter:157 >> Upcasting trainable params to float32.
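Every node reports identical trainable and total parameter counts, as expected for full-parameter tuning. As a small illustrative helper, not LLaMA-Factory's code, the same report can be computed for any torch module:

```python
# Illustrative helper: reproduce the "trainable params || all params ||
# trainable%" line for an arbitrary module.
import torch.nn as nn

def param_report(model: nn.Module) -> str:
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return (f"trainable params: {trainable:,} || all params: {total:,} "
            f"|| trainable%: {100 * trainable / total:.4f}")

print(param_report(nn.Linear(4, 4)))  # toy module; the run above reports 14,780,417,024
```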
10.82.142.23: [INFO|2025-06-30 13:57:24] llamafactory.model.adapter:157 >> Fine-tuning method: Full
10.82.142.23: [INFO|2025-06-30 13:57:24] llamafactory.model.loader:157 >> trainable params: 14,780,417,024 || all params: 14,780,417,024 || trainable%: 100.0000
10.82.142.23: [INFO|trainer.py:746] 2025-06-30 13:57:24,198 >> Using auto half precision backend
10.82.142.18: [2025-06-30 13:57:49,151] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
10.82.142.18: [2025-06-30 13:57:49,153] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
10.82.142.18: [2025-06-30 13:57:49,153] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
10.82.142.18: [2025-06-30 13:57:49,181] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
10.82.142.18: [2025-06-30 13:57:49,181] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
10.82.142.18: [2025-06-30 13:57:49,181] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
10.82.142.18: [2025-06-30 13:57:49,181] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 500000000
10.82.142.18: [2025-06-30 13:57:49,181] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 500000000
10.82.142.18: [2025-06-30 13:57:49,181] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
10.82.142.18: [2025-06-30 13:57:49,181] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: True
10.82.142.18: [2025-06-30 13:58:37,784] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states
10.82.142.18: [2025-06-30 13:58:37,785] [INFO] [utils.py:801:see_memory_usage] MA 29.83 GB Max_MA 29.83 GB CA 29.85 GB Max_CA 30 GB
10.82.142.18: [2025-06-30 13:58:37,785] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 158.96 GB, percent = 7.9%
10.82.142.18: [2025-06-30 13:58:38,119] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states
10.82.142.18: [2025-06-30 13:58:38,120] [INFO] [utils.py:801:see_memory_usage] MA 29.83 GB Max_MA 32.12 GB CA 32.14 GB Max_CA 32 GB
10.82.142.18: [2025-06-30 13:58:38,120] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 158.96 GB, percent = 7.9%
10.82.142.18: [2025-06-30 13:58:38,120] [INFO] [stage_1_and_2.py:539:__init__] optimizer state initialized
10.82.142.18: [2025-06-30 13:58:38,346] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer
10.82.142.18: [2025-06-30 13:58:38,347] [INFO] [utils.py:801:see_memory_usage] MA 29.83 GB Max_MA 29.83 GB CA 32.14 GB Max_CA 32 GB
10.82.142.18: [2025-06-30 13:58:38,347] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 158.95 GB, percent = 7.9%
10.82.142.18: [2025-06-30 13:58:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
10.82.142.18: [2025-06-30 13:58:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
10.82.142.18: [2025-06-30 13:58:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
10.82.142.18: [2025-06-30 13:58:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
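The optimizer records above show what the ZeRO stage 2 setup resolved to: fp16 ZeRO-2 wrapped around the client AdamW, 5e8 reduce/allgather buckets, no CPU offload, round-robin gradient partitioning. The actual ds_z2_config.json is not reproduced in the log; the following is only a hedged sketch of a stage 2 config consistent with these records:

```python
# Hedged sketch of a ZeRO-2 DeepSpeed config matching the records above;
# the real ds_z2_config.json may differ in fields the log does not show.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",  # resolves to 10 in the config print below
    "gradient_clipping": "auto",            # resolves to 1.0
    "fp16": {"enabled": "auto"},            # fp16_enabled=True for this run
    "zero_optimization": {
        "stage": 2,
        "reduce_bucket_size": 5e8,          # "Reduce bucket size 500000000"
        "allgather_bucket_size": 5e8,       # "Allgather bucket size 500000000"
        "offload_optimizer": {"device": "none"},  # "CPU Offload: False"
        "round_robin_gradients": True,      # "Round robin gradient partitioning: True"
    },
}
with open("ds_z2_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```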
10.82.142.18: "contiguous_memory_optimization": false, 10.82.142.18: "cpu_checkpointing": false, 10.82.142.18: "number_checkpoints": null, 10.82.142.18: "synchronize_checkpoint_boundary": false, 10.82.142.18: "profile": false 10.82.142.18: } 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] amp_enabled .................. False 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] amp_params ................... False 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] autotuning_config ............ { 10.82.142.18: "enabled": false, 10.82.142.18: "start_step": null, 10.82.142.18: "end_step": null, 10.82.142.18: "metric_path": null, 10.82.142.18: "arg_mappings": null, 10.82.142.18: "metric": "throughput", 10.82.142.18: "model_info": null, 10.82.142.18: "results_dir": "autotuning_results", 10.82.142.18: "exps_dir": "autotuning_exps", 10.82.142.18: "overwrite": true, 10.82.142.18: "fast": true, 10.82.142.18: "start_profile_step": 3, 10.82.142.18: "end_profile_step": 5, 10.82.142.18: "tuner_type": "gridsearch", 10.82.142.18: "tuner_early_stopping": 5, 10.82.142.18: "tuner_num_trials": 50, 10.82.142.18: "model_info_path": null, 10.82.142.18: "mp_size": 1, 10.82.142.18: "max_train_batch_size": null, 10.82.142.18: "min_train_batch_size": 1, 10.82.142.18: "max_train_micro_batch_size_per_gpu": 1.024000e+03, 10.82.142.18: "min_train_micro_batch_size_per_gpu": 1, 10.82.142.18: "num_tuning_micro_batch_sizes": 3 10.82.142.18: } 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] bfloat16_enabled ............. False 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] bfloat16_immediate_grad_update False 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] checkpoint_parallel_write_pipeline False 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] checkpoint_tag_validation_enabled True 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] checkpoint_tag_validation_fail False 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] comms_config ................. 10.82.142.18: [2025-06-30 13:58:38,352] [INFO] [config.py:1000:print] communication_data_type ...... None 10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] compile_config ............... enabled=False backend='inductor' kwargs={} 10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] compression_config ........... 
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] curriculum_enabled_legacy .... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] curriculum_params_legacy ..... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] data_efficiency_enabled ...... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] dataloader_drop_last ......... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] disable_allgather ............ False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] dump_state ................... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] eigenvalue_enabled ........... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] eigenvalue_gas_boundary_resolution 1
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] eigenvalue_layer_name ........ bert.encoder.layer
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] eigenvalue_layer_num ......... 0
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] eigenvalue_max_iter .......... 100
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] eigenvalue_stability ......... 1e-06
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] eigenvalue_tol ............... 0.01
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] eigenvalue_verbose ........... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] elasticity_enabled ........... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] flops_profiler_config ........ {
10.82.142.18: "enabled": false,
10.82.142.18: "recompute_fwd_factor": 0.0,
10.82.142.18: "profile_step": 1,
10.82.142.18: "module_depth": -1,
10.82.142.18: "top_modules": 1,
10.82.142.18: "detailed": true,
10.82.142.18: "output_file": null
10.82.142.18: }
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] fp16_auto_cast ............... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] fp16_enabled ................. True
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] fp16_master_weights_and_gradients False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] global_rank .................. 0
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] grad_accum_dtype ............. None
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] gradient_accumulation_steps .. 10
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] gradient_clipping ............ 1.0
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] gradient_predivide_factor .... 1.0
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] graph_harvesting ............. False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] initial_dynamic_scale ........ 65536
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] load_universal_checkpoint .... False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] loss_scale ................... 0
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] memory_breakdown ............. False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] mics_hierarchial_params_gather False
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] mics_shard_size .............. -1
10.82.142.18: [2025-06-30 13:58:38,353] [INFO] [config.py:1000:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] nebula_config ................ {
10.82.142.18: "enabled": false,
10.82.142.18: "persistent_storage_path": null,
10.82.142.18: "persistent_time_interval": 100,
10.82.142.18: "num_of_version_in_retention": 2,
10.82.142.18: "enable_nebula_load": true,
10.82.142.18: "load_path": null
10.82.142.18: }
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] optimizer_legacy_fusion ...... False
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] optimizer_name ............... None
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] optimizer_params ............. None
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] pld_enabled .................. False
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] pld_params ................... False
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] prescale_gradients ........... False
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] scheduler_name ............... None
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] scheduler_params ............. None
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] seq_parallel_communication_data_type torch.float32
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] sparse_attention ............. None
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] sparse_gradients_enabled ..... False
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] steps_per_print .............. inf
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] train_batch_size ............. 240
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] train_micro_batch_size_per_gpu 1
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] use_data_before_expert_parallel_ False
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] use_node_local_storage ....... False
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] wall_clock_breakdown ......... False
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] weight_quantization_config ... None
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] world_size ................... 24
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] zero_allow_untested_optimizer True
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=True zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] zero_enabled ................. True
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] zero_force_ds_cpu_optimizer .. True
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:1000:print] zero_optimization_stage ...... 2
10.82.142.18: [2025-06-30 13:58:38,354] [INFO] [config.py:986:print_user_config] json = {
10.82.142.18: "train_batch_size": 240,
10.82.142.18: "train_micro_batch_size_per_gpu": 1,
10.82.142.18: "gradient_accumulation_steps": 10,
10.82.142.18: "gradient_clipping": 1.0,
10.82.142.18: "zero_allow_untested_optimizer": true,
10.82.142.18: "fp16": {
10.82.142.18: "enabled": true,
10.82.142.18: "loss_scale": 0,
10.82.142.18: "loss_scale_window": 1000,
10.82.142.18: "initial_scale_power": 16,
10.82.142.18: "hysteresis": 2,
10.82.142.18: "min_loss_scale": 1
10.82.142.18: },
10.82.142.18: "bf16": {
10.82.142.18: "enabled": false
10.82.142.18: },
10.82.142.18: "zero_optimization": {
10.82.142.18: "stage": 2,
10.82.142.18: "allgather_partitions": true,
10.82.142.18: "allgather_bucket_size": 5.000000e+08,
10.82.142.18: "overlap_comm": true,
10.82.142.18: "reduce_scatter": true,
10.82.142.18: "reduce_bucket_size": 5.000000e+08,
10.82.142.18: "contiguous_gradients": true,
10.82.142.18: "round_robin_gradients": true
10.82.142.18: },
10.82.142.18: "steps_per_print": inf
10.82.142.18: }
10.82.142.18: [INFO|trainer.py:2405] 2025-06-30 13:58:38,356 >> ***** Running training *****
10.82.142.18: [INFO|trainer.py:2406] 2025-06-30 13:58:38,356 >> Num examples = 4,499,718
10.82.142.18: [INFO|trainer.py:2407] 2025-06-30 13:58:38,356 >> Num Epochs = 2
10.82.142.18: [INFO|trainer.py:2408] 2025-06-30 13:58:38,356 >> Instantaneous batch size per device = 1
10.82.142.18: [INFO|trainer.py:2411] 2025-06-30 13:58:38,356 >> Total train batch size (w. parallel, distributed & accumulation) = 240
10.82.142.18: [INFO|trainer.py:2412] 2025-06-30 13:58:38,356 >> Gradient Accumulation steps = 10
10.82.142.18: [INFO|trainer.py:2413] 2025-06-30 13:58:38,356 >> Total optimization steps = 37,496
10.82.142.18: [INFO|trainer.py:2414] 2025-06-30 13:58:38,358 >> Number of trainable parameters = 14,780,417,024
10.82.142.18: [INFO|integration_utils.py:817] 2025-06-30 13:58:38,360 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
10.82.142.18: wandb: Currently logged in as: liangqingyuan. Use `wandb login --relogin` to force relogin
10.82.142.18: wandb: wandb version 0.20.1 is available! To upgrade, please run:
10.82.142.18: wandb: $ pip install wandb --upgrade
10.82.142.18: wandb: Tracking run with wandb version 0.16.6
10.82.142.18: wandb: Run data is saved locally in /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/wandb/run-20250630_135840-68drk2sp
10.82.142.18: wandb: Run `wandb offline` to turn off syncing.
10.82.142.18: wandb: Syncing run CPT14b_v5_I1v3
10.82.142.18: wandb: ⭐️ View project at https://wandb.ai/liangqingyuan/GrammarCoder14B
10.82.142.18: wandb: 🚀 View run at https://wandb.ai/liangqingyuan/GrammarCoder14B/runs/68drk2sp
10.82.142.17: [INFO|trainer.py:2405] 2025-06-30 13:58:49,674 >> ***** Running training *****
10.82.142.17: [INFO|trainer.py:2406] 2025-06-30 13:58:49,674 >> Num examples = 4,499,718
10.82.142.17: [INFO|trainer.py:2407] 2025-06-30 13:58:49,674 >> Num Epochs = 2
10.82.142.17: [INFO|trainer.py:2408] 2025-06-30 13:58:49,674 >> Instantaneous batch size per device = 1
10.82.142.17: [INFO|trainer.py:2411] 2025-06-30 13:58:49,674 >> Total train batch size (w. parallel, distributed & accumulation) = 240
10.82.142.17: [INFO|trainer.py:2412] 2025-06-30 13:58:49,674 >> Gradient Accumulation steps = 10
10.82.142.17: [INFO|trainer.py:2413] 2025-06-30 13:58:49,674 >> Total optimization steps = 37,496
10.82.142.17: [INFO|trainer.py:2414] 2025-06-30 13:58:49,676 >> Number of trainable parameters = 14,780,417,024
10.82.142.23: [INFO|trainer.py:2405] 2025-06-30 13:58:57,126 >> ***** Running training *****
10.82.142.23: [INFO|trainer.py:2406] 2025-06-30 13:58:57,126 >> Num examples = 4,499,718
10.82.142.23: [INFO|trainer.py:2407] 2025-06-30 13:58:57,126 >> Num Epochs = 2
10.82.142.23: [INFO|trainer.py:2408] 2025-06-30 13:58:57,126 >> Instantaneous batch size per device = 1
10.82.142.23: [INFO|trainer.py:2411] 2025-06-30 13:58:57,126 >> Total train batch size (w. parallel, distributed & accumulation) = 240
10.82.142.23: [INFO|trainer.py:2412] 2025-06-30 13:58:57,126 >> Gradient Accumulation steps = 10
10.82.142.23: [INFO|trainer.py:2413] 2025-06-30 13:58:57,126 >> Total optimization steps = 37,496
10.82.142.23: [INFO|trainer.py:2414] 2025-06-30 13:58:57,128 >> Number of trainable parameters = 14,780,417,024
10.82.142.18: {'loss': 0.1308, 'grad_norm': 0.398686021566391, 'learning_rate': 5e-06, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1388, 'grad_norm': 0.46305230259895325, 'learning_rate': 1e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1216, 'grad_norm': 0.3536907136440277, 'learning_rate': 1.5e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.124, 'grad_norm': 0.32838913798332214, 'learning_rate': 2e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1144, 'grad_norm': 0.3058461546897888, 'learning_rate': 2.5e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.109, 'grad_norm': 0.3052152097225189, 'learning_rate': 3e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1233, 'grad_norm': 0.3326779901981354, 'learning_rate': 3.5e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1099, 'grad_norm': 0.3231862187385559, 'learning_rate': 4e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1199, 'grad_norm': 0.32896554470062256, 'learning_rate': 4.5e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1042, 'grad_norm': 0.3078831434249878, 'learning_rate': 5e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1108, 'grad_norm': 0.2900382876396179, 'learning_rate': 5.500000000000001e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1092, 'grad_norm': 0.28514525294303894, 'learning_rate': 6e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1104, 'grad_norm': 0.30242955684661865, 'learning_rate': 6.500000000000001e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1209, 'grad_norm': 0.3053489029407501, 'learning_rate': 7e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1273, 'grad_norm': 0.32302212715148926, 'learning_rate': 7.500000000000001e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.12, 'grad_norm': 0.29552462697029114, 'learning_rate': 8e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.135, 'grad_norm': 0.33688342571258545, 'learning_rate': 8.5e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1224, 'grad_norm': 0.29682204127311707, 'learning_rate': 9e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.143, 'grad_norm': 0.33479437232017517, 'learning_rate': 9.5e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1197, 'grad_norm': 0.3345559537410736, 'learning_rate': 0.0001, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1341, 'grad_norm': 0.34470704197883606, 'learning_rate': 9.999999982431556e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1404, 'grad_norm': 0.3448582887649536, 'learning_rate': 9.999999929726225e-05, 'epoch': 0.0}
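The banner numbers above fit together exactly. A quick sanity check (a sketch, not part of the run; it assumes the Hugging Face Trainer's rounding of steps per epoch, which reproduces the logged 37,496):

import math

num_examples = 4_499_718   # "Num examples" in the banners above
world_size = 24            # 3 nodes x 8 GPUs
micro_batch = 1            # --per_device_train_batch_size
grad_accum = 10            # --gradient_accumulation_steps
epochs = 2                 # --num_train_epochs

print(micro_batch * grad_accum * world_size)  # 240 == "Total train batch size"

# Each rank sees ceil(N / world_size) micro-batches per epoch, and one
# optimizer step happens every grad_accum micro-batches.
steps_per_epoch = math.ceil(num_examples / world_size) // grad_accum
print(steps_per_epoch * epochs)               # 37496 == "Total optimization steps"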
10.82.142.18: {'loss': 0.1399, 'grad_norm': 0.33284544944763184, 'learning_rate': 9.999999841884006e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1353, 'grad_norm': 0.3286615014076233, 'learning_rate': 9.999999718904902e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1527, 'grad_norm': 0.3405831456184387, 'learning_rate': 9.999999560788912e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1469, 'grad_norm': 0.3199956715106964, 'learning_rate': 9.999999367536037e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1536, 'grad_norm': 0.3393600881099701, 'learning_rate': 9.999999139146278e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1498, 'grad_norm': 0.3263189196586609, 'learning_rate': 9.99999887561964e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1391, 'grad_norm': 0.31700044870376587, 'learning_rate': 9.999998576956121e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.159, 'grad_norm': 0.33871933817863464, 'learning_rate': 9.999998243155724e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1543, 'grad_norm': 0.34099170565605164, 'learning_rate': 9.999997874218452e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1566, 'grad_norm': 0.33778145909309387, 'learning_rate': 9.999997470144308e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1688, 'grad_norm': 0.32202789187431335, 'learning_rate': 9.999997030933294e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1645, 'grad_norm': 0.3160647749900818, 'learning_rate': 9.999996556585412e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1764, 'grad_norm': 0.3506312072277069, 'learning_rate': 9.999996047100669e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1817, 'grad_norm': 0.3370567262172699, 'learning_rate': 9.999995502479064e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1558, 'grad_norm': 0.3320629596710205, 'learning_rate': 9.999994922720604e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1645, 'grad_norm': 0.3061317503452301, 'learning_rate': 9.999994307825292e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1795, 'grad_norm': 0.3522307276725769, 'learning_rate': 9.999993657793131e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1695, 'grad_norm': 0.33675846457481384, 'learning_rate': 9.999992972624131e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1698, 'grad_norm': 0.34461385011672974, 'learning_rate': 9.999992252318289e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1857, 'grad_norm': 0.3509563207626343, 'learning_rate': 9.999991496875616e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1855, 'grad_norm': 0.34643200039863586, 'learning_rate': 9.999990706296113e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1767, 'grad_norm': 0.33536601066589355, 'learning_rate': 9.99998988057979e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1922, 'grad_norm': 0.3312433362007141, 'learning_rate': 9.99998901972665e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1777, 'grad_norm': 0.34569883346557617, 'learning_rate': 9.999988123736699e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1669, 'grad_norm': 0.3153592646121979, 'learning_rate': 9.999987192609944e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1779, 'grad_norm': 0.3480944037437439, 'learning_rate': 9.999986226346392e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1863, 'grad_norm': 0.35337212681770325, 'learning_rate': 9.999985224946049e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1983, 'grad_norm': 0.3431400656700134, 'learning_rate': 9.999984188408922e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1838, 'grad_norm': 0.3314003348350525, 'learning_rate': 9.999983116735019e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1826, 'grad_norm': 0.3390146493911743, 'learning_rate': 9.999982009924345e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1967, 'grad_norm': 0.35979676246643066, 'learning_rate': 9.999980867976912e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1893, 'grad_norm': 0.39572709798812866, 'learning_rate': 9.999979690892725e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1905, 'grad_norm': 0.3344902992248535, 'learning_rate': 9.999978478671794e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1765, 'grad_norm': 0.32258379459381104, 'learning_rate': 9.999977231314127e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1855, 'grad_norm': 0.32133936882019043, 'learning_rate': 9.999975948819731e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1881, 'grad_norm': 0.33931997418403625, 'learning_rate': 9.999974631188618e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1937, 'grad_norm': 0.3421045243740082, 'learning_rate': 9.999973278420795e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1947, 'grad_norm': 0.337239146232605, 'learning_rate': 9.999971890516272e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2015, 'grad_norm': 0.3348175585269928, 'learning_rate': 9.999970467475059e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1954, 'grad_norm': 0.33285388350486755, 'learning_rate': 9.999969009297165e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2033, 'grad_norm': 0.33577030897140503, 'learning_rate': 9.999967515982604e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2059, 'grad_norm': 0.3373951315879822, 'learning_rate': 9.999965987531382e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.191, 'grad_norm': 0.3341216742992401, 'learning_rate': 9.99996442394351e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1917, 'grad_norm': 0.3183084726333618, 'learning_rate': 9.999962825219002e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1895, 'grad_norm': 0.33498549461364746, 'learning_rate': 9.999961191357869e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2054, 'grad_norm': 0.3271574079990387, 'learning_rate': 9.999959522360118e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2013, 'grad_norm': 0.3310222923755646, 'learning_rate': 9.999957818225768e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2186, 'grad_norm': 0.34026429057121277, 'learning_rate': 9.999956078954822e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2007, 'grad_norm': 0.3307226300239563, 'learning_rate': 9.999954304547301e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2075, 'grad_norm': 0.3343660533428192, 'learning_rate': 9.999952495003212e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2156, 'grad_norm': 0.3603450357913971, 'learning_rate': 9.999950650322569e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.203, 'grad_norm': 0.335757315158844, 'learning_rate': 9.999948770505386e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1992, 'grad_norm': 0.3302127718925476, 'learning_rate': 9.999946855551675e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2044, 'grad_norm': 0.3287745714187622, 'learning_rate': 9.99994490546145e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2024, 'grad_norm': 0.31489625573158264, 'learning_rate': 9.999942920234725e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2082, 'grad_norm': 0.3128495216369629, 'learning_rate': 9.999940899871513e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2145, 'grad_norm': 0.31686297059059143, 'learning_rate': 9.999938844371829e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2022, 'grad_norm': 0.3330387473106384, 'learning_rate': 9.999936753735687e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2192, 'grad_norm': 0.34814751148223877, 'learning_rate': 9.999934627963103e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2029, 'grad_norm': 0.3250124454498291, 'learning_rate': 9.999932467054089e-05, 'epoch': 0.0}
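The learning-rate column above can be reproduced exactly from the run's flags (--warmup_steps 20, --learning_rate 1e-4, --lr_scheduler_type cosine over 37,496 total steps). A verification sketch of the schedule, matching Hugging Face's get_cosine_schedule_with_warmup (not part of the run):

import math

base_lr, warmup_steps, total_steps = 1e-4, 20, 37_496

def lr_at(step: int) -> float:
    # Linear warmup: 1/20 of the peak per step, i.e. the logged 5e-06 increments.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # Cosine decay from the peak over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(1))   # 5e-06, the first logged value
print(lr_at(20))  # 0.0001, the peak
print(lr_at(21))  # 9.999999982431556e-05, matching the log to the last digit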
10.82.142.18: {'loss': 0.2203, 'grad_norm': 0.3646756410598755, 'learning_rate': 9.999930271008663e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2078, 'grad_norm': 0.3267667889595032, 'learning_rate': 9.99992803982684e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2041, 'grad_norm': 0.32010674476623535, 'learning_rate': 9.999925773508634e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2119, 'grad_norm': 0.3199160695075989, 'learning_rate': 9.999923472054063e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2102, 'grad_norm': 0.3363480269908905, 'learning_rate': 9.99992113546314e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2146, 'grad_norm': 0.3485029935836792, 'learning_rate': 9.999918763735886e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2006, 'grad_norm': 0.3307000994682312, 'learning_rate': 9.999916356872314e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.1934, 'grad_norm': 0.32731881737709045, 'learning_rate': 9.999913914872443e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2149, 'grad_norm': 0.3216370642185211, 'learning_rate': 9.99991143773629e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2066, 'grad_norm': 0.3118319809436798, 'learning_rate': 9.999908925463872e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.2016, 'grad_norm': 0.3115937411785126, 'learning_rate': 9.999906378055205e-05, 'epoch': 0.0}
10.82.142.18: {'loss': 0.227, 'grad_norm': 0.31390756368637085, 'learning_rate': 9.999903795510308e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2139, 'grad_norm': 0.3215806484222412, 'learning_rate': 9.999901177829201e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2119, 'grad_norm': 0.32381314039230347, 'learning_rate': 9.9998985250119e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2036, 'grad_norm': 0.32022613286972046, 'learning_rate': 9.999895837058425e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2058, 'grad_norm': 0.3156028985977173, 'learning_rate': 9.999893113968795e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2064, 'grad_norm': 0.3254660665988922, 'learning_rate': 9.999890355743027e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2081, 'grad_norm': 0.3044165074825287, 'learning_rate': 9.999887562381143e-05, 'epoch': 0.01}
10.82.142.18: 0%| | 0/37496 [00:00> Saving model checkpoint to /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100
10.82.142.18: [INFO|configuration_utils.py:423] 2025-06-30 18:42:09,958 >> Configuration saved in /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/config.json
10.82.142.18: [INFO|configuration_utils.py:909] 2025-06-30 18:42:09,962 >> Configuration saved in /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/generation_config.json
10.82.142.18: [INFO|modeling_utils.py:3048] 2025-06-30 18:42:28,668 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 6 checkpoint shards. You can find where each parameters has been saved in the index located at /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/model.safetensors.index.json.
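The 6-shard figure follows directly from the model size and the 5 GB maximum shard size quoted in the message above. A sketch of the arithmetic (not part of the run):

import math

params = 14_780_417_024          # parameter count reported earlier in the log
fp16_bytes = params * 2          # the saved weights are fp16: 2 bytes/param
print(math.ceil(fp16_bytes / (5 * 10**9)))  # 6 shards, as the log reports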
10.82.142.18: [INFO|tokenization_utils_base.py:2500] 2025-06-30 18:42:28,686 >> tokenizer config file saved in /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/tokenizer_config.json
10.82.142.18: [INFO|tokenization_utils_base.py:2509] 2025-06-30 18:42:28,702 >> Special tokens file saved in /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/special_tokens_map.json
10.82.142.18: [2025-06-30 18:42:30,065] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step100 is about to be saved!
10.82.142.17: /root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py:1877: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
10.82.142.17: warnings.warn(
10.82.142.23: /root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py:1877: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
10.82.142.23: warnings.warn(
10.82.142.18: /root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py:1877: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
10.82.142.18: warnings.warn(
10.82.142.18: [2025-06-30 18:42:30,311] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/mp_rank_00_model_states.pt
10.82.142.18: [2025-06-30 18:42:30,311] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/mp_rank_00_model_states.pt...
10.82.142.18: [2025-06-30 18:42:56,534] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/mp_rank_00_model_states.pt.
10.82.142.17: [2025-06-30 18:42:56,786] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/zero_pp_rank_16_mp_rank_00_optim_states.pt...
10.82.142.18: [2025-06-30 18:42:56,783] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/zero_pp_rank_0_mp_rank_00_optim_states.pt...
10.82.142.23: [2025-06-30 18:42:56,789] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/zero_pp_rank_8_mp_rank_00_optim_states.pt...
10.82.142.17: [2025-06-30 18:43:05,630] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/zero_pp_rank_16_mp_rank_00_optim_states.pt.
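For reference, a ZeRO-2 checkpoint directory like the one being written above normally holds one model-states file plus one partitioned optimizer-states file per rank; the log shows only ranks 0, 8 and 16, presumably the first local rank on each of the three nodes. A sketch of the expected global_step100 layout (an assumption from the filenames in the log, not a listing of the actual directory):

# Hypothetical contents of checkpoint-100/global_step100/ for this 24-rank run.
expected_files = ["mp_rank_00_model_states.pt"] + [
    f"zero_pp_rank_{rank}_mp_rank_00_optim_states.pt" for rank in range(24)
]
print(len(expected_files))  # 25 files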
10.82.142.17: [2025-06-30 18:43:05,630] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/zero_pp_rank_16_mp_rank_00_optim_states.pt
10.82.142.17: [2025-06-30 18:43:05,630] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step100 is ready now!
10.82.142.23: [2025-06-30 18:43:05,866] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/zero_pp_rank_8_mp_rank_00_optim_states.pt.
10.82.142.23: [2025-06-30 18:43:05,866] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/zero_pp_rank_8_mp_rank_00_optim_states.pt
10.82.142.23: [2025-06-30 18:43:05,866] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step100 is ready now!
10.82.142.18: [2025-06-30 18:43:06,529] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/zero_pp_rank_0_mp_rank_00_optim_states.pt.
10.82.142.18: [2025-06-30 18:43:06,531] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved /share/liangqingyuan/GrammarCoder14B/Ablation/llama_factory/models/CPT14b_v5_I1v3/checkpoint-100/global_step100/zero_pp_rank_0_mp_rank_00_optim_states.pt
10.82.142.18: [2025-06-30 18:43:06,532] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step100 is ready now!
10.82.142.18: {'loss': 0.2299, 'grad_norm': 0.3222039043903351, 'learning_rate': 9.999884733883161e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2032, 'grad_norm': 0.3068801164627075, 'learning_rate': 9.999881870249103e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2232, 'grad_norm': 0.32019487023353577, 'learning_rate': 9.999878971478986e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2153, 'grad_norm': 0.3018111288547516, 'learning_rate': 9.999876037572832e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2017, 'grad_norm': 0.2898196876049042, 'learning_rate': 9.999873068530661e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2125, 'grad_norm': 0.3109876215457916, 'learning_rate': 9.999870064352497e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2137, 'grad_norm': 0.3126257061958313, 'learning_rate': 9.999867025038357e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2106, 'grad_norm': 0.303792804479599, 'learning_rate': 9.999863950588263e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2263, 'grad_norm': 0.35248640179634094, 'learning_rate': 9.999860841002238e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2182, 'grad_norm': 0.30670469999313354, 'learning_rate': 9.999857696280304e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2204, 'grad_norm': 0.3283405900001526, 'learning_rate': 9.999854516422483e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2144, 'grad_norm': 0.31946590542793274, 'learning_rate': 9.999851301428795e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2118, 'grad_norm': 0.3177313208580017, 'learning_rate': 9.999848051299265e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2245, 'grad_norm': 0.30677345395088196, 'learning_rate': 9.999844766033916e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.204, 'grad_norm': 0.32525700330734253, 'learning_rate': 9.99984144563277e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2034, 'grad_norm': 0.31468653678894043, 'learning_rate': 9.99983809009585e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2103, 'grad_norm': 0.3079475462436676, 'learning_rate': 9.999834699423181e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2208, 'grad_norm': 0.3062748610973358, 'learning_rate': 9.999831273614786e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2195, 'grad_norm': 0.3155671954154968, 'learning_rate': 9.99982781267069e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2184, 'grad_norm': 0.3279094099998474, 'learning_rate': 9.999824316590916e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2261, 'grad_norm': 0.30267956852912903, 'learning_rate': 9.999820785375487e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2195, 'grad_norm': 0.30401918292045593, 'learning_rate': 9.999817219024432e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2303, 'grad_norm': 0.3452170789241791, 'learning_rate': 9.999813617537772e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2281, 'grad_norm': 0.3254702091217041, 'learning_rate': 9.999809980915537e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.212, 'grad_norm': 0.2880006432533264, 'learning_rate': 9.999806309157748e-05, 'epoch': 0.01}
10.82.142.18: {'loss': 0.2291, 'grad_norm': 0.3074350357055664, 'learning_rate': 9.999802602264433e-05, 'epoch': 0.01}