YWZBrandon committed on May 12, 2025

Commit

94686ca

verified

1 Parent(s): e20eb9b

End of training

Files changed (19)

20250511_092138.log +39 -0
20250511_092208.log +163 -0
20250511_101930.log +163 -0
20250511_102227.log +163 -0
20250511_110511.log +237 -0
20250511_110815.log +83 -0
20250511_111054.log +85 -0
20250511_111333.log +0 -0
20250511_121707.log +0 -0
20250511_225651.log +0 -0
README.md +58 -0
added_tokens.json +102 -0
config.json +32 -0
generation_config.json +7 -0
model.safetensors +3 -0
special_tokens_map.json +125 -0
spiece.model +3 -0
tokenizer_config.json +942 -0
training_args.bin +3 -0

20250511_092138.log ADDED

	@@ -0,0 +1,39 @@

+[2025-05-11 09:21:38] Created output directory: train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 09:21:38] Chat mode disabled
+[2025-05-11 09:21:38] Model size is 3B or smaller (0 B). Using full fine-tuning.
+[2025-05-11 09:21:38] Adjusted parameters for t5 model:
+[2025-05-11 09:21:38]   - LEARNING_RATE: 1e-4
+[2025-05-11 09:21:38]   - BATCH_SIZE: 32
+[2025-05-11 09:21:38]   - GRADIENT_ACCUMULATION_STEPS: 1
+[2025-05-11 09:21:38] No QA format data will be used
+[2025-05-11 09:21:38] =======================================
+[2025-05-11 09:21:38] Starting training for model: google/flan-t5-large
+[2025-05-11 09:21:38] =======================================
+[2025-05-11 09:21:38] CUDA_VISIBLE_DEVICES: 0,1
+[2025-05-11 09:21:38] WANDB_PROJECT: wikidyk-ar
+[2025-05-11 09:21:38] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
+[2025-05-11 09:21:38] Global Batch Size: 64
+[2025-05-11 09:21:38] Data Size: -1
+[2025-05-11 09:21:38] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py     --model_name_or_path "google/flan-t5-large"     --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json"     --output_dir "train_results/google_flan-t5-large_full_upsample3000"     --num_upsample "3000"     --per_device_train_batch_size "32"     --gradient_accumulation_steps "1"     --learning_rate "1e-4"     --num_train_epochs "1"     --model_max_length "32768"     --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 <<<<<<< Updated upstream:scripts/train_full_flan_t5_large_us3000.sh
+=======
+    --resume_from_checkpoint True >>>>>>> Stashed changes:scripts/train_full_flan_t5_large_us_3000.sh
+    --bf16 True --use_flash_attention_2 True     --qa_data_ratio "-1"     --predict_mask "false"
+[2025-05-11 09:21:38] Training started at Sun May 11 09:21:38 UTC 2025
+scripts/train_full_flan_t5_large_us3000.sh: eval: line 272: syntax error near unexpected token `<<<'
+scripts/train_full_flan_t5_large_us3000.sh: eval: line 272: `torchrun --nproc_per_node "2" --master-port 29502 src/train.py --model_name_or_path "google/flan-t5-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_flan-t5-large_full_upsample3000" --num_upsample "3000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 <<<<<<< Updated upstream:scripts/train_full_flan_t5_large_us3000.sh ======= --resume_from_checkpoint True >>>>>>> Stashed changes:scripts/train_full_flan_t5_large_us_3000.sh --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"'
+[2025-05-11 09:21:38] ERROR: Training failed for google/flan-t5-large with exit code 2
+[2025-05-11 09:21:38] ERROR: Training failed for google/flan-t5-large with exit code 2
+[2025-05-11 09:21:38] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_092138.log
+[2025-05-11 09:21:38] Resource usage after training google/flan-t5-large:
+[2025-05-11 09:21:38] GPU memory usage:
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+[2025-05-11 09:21:38] Disk space usage for model outputs:
+27G	train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 09:21:38]
+[2025-05-11 09:21:38] All training runs completed at Sun May 11 09:21:38 UTC 2025
+[2025-05-11 09:21:38] =======================================
+[2025-05-11 09:21:38] Summary of training runs:
+[2025-05-11 09:21:38] Model | Status | Duration | Output Size
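The exit code 2 in this log comes from unresolved merge-conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) left in the launch script, which the shell then tried to evaluate as part of the `torchrun` command. A minimal pre-flight check along the following lines could catch that before launching. This is only a sketch: the function name `find_conflict_markers` and the sample text are hypothetical and not part of the repository.

```python
import re

# Git writes these marker forms at the start of a line when a merge or
# stash-pop leaves a conflict unresolved: exactly seven characters,
# followed by whitespace or the end of the line.
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(script_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like git conflict markers."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        # lstrip() is a lenient choice so indented markers are also reported.
        if CONFLICT_MARKER.match(line.lstrip()):
            hits.append((number, line.strip()))
    return hits


# Hypothetical excerpt shaped like the command shown in the log above.
sample = '''torchrun src/train.py \\
    --save_total_limit 3 \\
<<<<<<< Updated upstream:scripts/train_full_flan_t5_large_us3000.sh
=======
    --resume_from_checkpoint True \\
>>>>>>> Stashed changes:scripts/train_full_flan_t5_large_us_3000.sh
    --bf16 True
'''

markers = find_conflict_markers(sample)
for number, line in markers:
    print(f"line {number}: {line}")
```

Because the pattern requires exactly seven marker characters followed by whitespace or end of line, the long `=======...` separator lines printed by the launcher are not flagged.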

20250511_092208.log ADDED

	@@ -0,0 +1,163 @@

+[2025-05-11 09:22:08] Created output directory: train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 09:22:08] Chat mode disabled
+[2025-05-11 09:22:08] Model size is 3B or smaller (0 B). Using full fine-tuning.
+[2025-05-11 09:22:08] Adjusted parameters for t5 model:
+[2025-05-11 09:22:08]   - LEARNING_RATE: 1e-4
+[2025-05-11 09:22:08]   - BATCH_SIZE: 32
+[2025-05-11 09:22:08]   - GRADIENT_ACCUMULATION_STEPS: 1
+[2025-05-11 09:22:08] No QA format data will be used
+[2025-05-11 09:22:08] =======================================
+[2025-05-11 09:22:08] Starting training for model: google/flan-t5-large
+[2025-05-11 09:22:08] =======================================
+[2025-05-11 09:22:08] CUDA_VISIBLE_DEVICES: 0,1
+[2025-05-11 09:22:08] WANDB_PROJECT: wikidyk-ar
+[2025-05-11 09:22:08] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
+[2025-05-11 09:22:08] Global Batch Size: 64
+[2025-05-11 09:22:08] Data Size: -1
+[2025-05-11 09:22:08] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py     --model_name_or_path "google/flan-t5-large"     --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json"     --output_dir "train_results/google_flan-t5-large_full_upsample3000"     --num_upsample "3000"     --per_device_train_batch_size "32"     --gradient_accumulation_steps "1"     --learning_rate "1e-4"     --num_train_epochs "1"     --model_max_length "32768"     --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3     --resume_from_checkpoint True     --bf16 True --use_flash_attention_2 True     --qa_data_ratio "-1"     --predict_mask "false"
+[2025-05-11 09:22:08] Training started at Sun May 11 09:22:08 UTC 2025
+W0511 09:22:09.428000 255573 site-packages/torch/distributed/run.py:792]
+W0511 09:22:09.428000 255573 site-packages/torch/distributed/run.py:792] *****************************************
+W0511 09:22:09.428000 255573 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0511 09:22:09.428000 255573 site-packages/torch/distributed/run.py:792] *****************************************
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+WARNING:root:Loading data...
+WARNING:root:Loading data...
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
+wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
+wandb: Tracking run with wandb version 0.19.11
+wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_092259-8c7r30qb
+wandb: Run `wandb offline` to turn off syncing.
+wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
+wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
+wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/8c7r30qb
+  0%|          | 0/576094 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+[rank1]: Traceback (most recent call last):
+[rank1]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
+[rank1]:     train()
+[rank1]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
+[rank1]:     trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
+[rank1]:     return inner_training_loop(
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
+[rank1]:     self._load_rng_state(resume_from_checkpoint)
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
+[rank1]:     checkpoint_rng_state = torch.load(rng_file, weights_only=True)
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+[rank1]:     raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+[rank1]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+[rank1]: 	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+[rank1]: 	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+[rank1]: 	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
+[rank1]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+Traceback (most recent call last):
+  File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
+    train()
+  File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
+    trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
+    return inner_training_loop(
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
+    self._load_rng_state(resume_from_checkpoint)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
+    checkpoint_rng_state = torch.load(rng_file, weights_only=True)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[rank0]: Traceback (most recent call last):
+[rank0]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
+[rank0]:     train()
+[rank0]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
+[rank0]:     trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
+[rank0]:     return inner_training_loop(
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
+[rank0]:     self._load_rng_state(resume_from_checkpoint)
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
+[rank0]:     checkpoint_rng_state = torch.load(rng_file, weights_only=True)
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+[rank0]:     raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+[rank0]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+[rank0]: 	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+[rank0]: 	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+[rank0]: 	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
+[rank0]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+wandb:
+wandb: 🚀 View run train_results/google_flan-t5-large_full_upsample3000 at: https://wandb.ai/yuweiz/wikidyk-ar/runs/8c7r30qb
+wandb: Find logs at: wandb/run-20250511_092259-8c7r30qb/logs
+W0511 09:23:12.826000 255573 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 255641 closing signal SIGTERM
+E0511 09:23:13.341000 255573 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 255642) of binary: /root/miniconda3/envs/wikidyk/bin/python
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/bin/torchrun", line 8, in <module>
+    sys.exit(main())
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
+    return f(*args, **kwargs)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
+    run(args)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
+    elastic_launch(
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
+    return launch_agent(self._config, self._entrypoint, list(args))
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
+    raise ChildFailedError(
+torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
+============================================================
+src/train.py FAILED
+------------------------------------------------------------
+Failures:
+  <NO_OTHER_FAILURES>
+------------------------------------------------------------
+Root Cause (first observed failure):
+[0]:
+  time      : 2025-05-11_09:23:12
+  host      : bb9aa167977b
+  rank      : 1 (local_rank: 1)
+  exitcode  : 1 (pid: 255642)
+  error_file: <N/A>
+  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+============================================================
+[2025-05-11 09:23:13] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 09:23:13] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 09:23:13] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_092208.log
+[2025-05-11 09:23:13] Resource usage after training google/flan-t5-large:
+[2025-05-11 09:23:13] GPU memory usage:
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+[2025-05-11 09:23:14] Disk space usage for model outputs:
+27G	train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 09:23:14]
+[2025-05-11 09:23:14] All training runs completed at Sun May 11 09:23:14 UTC 2025
+[2025-05-11 09:23:14] =======================================
+[2025-05-11 09:23:14] Summary of training runs:
+[2025-05-11 09:23:14] Model | Status | Duration | Output Size
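The traceback in this log shows `torch.load(rng_file, weights_only=True)` rejecting `numpy.core.multiarray._reconstruct` because that global is not on the allowlist, and the error text itself points to `torch.serialization.add_safe_globals` as the opt-in. The allowlisting idea can be illustrated with the standard library alone. The sketch below is an analogy built on `pickle.Unpickler.find_class`, not PyTorch's implementation, and `SAFE_GLOBALS`, `AllowlistUnpickler`, and `restricted_loads` are hypothetical names.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Hypothetical allowlist for this sketch: (module, name) pairs the
# unpickler is permitted to resolve while loading.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only globals listed in SAFE_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        # Anything else is refused instead of being imported and called.
        raise pickle.UnpicklingError(f"Unsupported global: {module}.{name}")


def restricted_loads(data: bytes):
    """Load pickled bytes, permitting only allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


# An allowlisted type round-trips normally.
allowed = restricted_loads(pickle.dumps(OrderedDict(a=1)))

# A type that is not on the allowlist is rejected at load time.
try:
    restricted_loads(pickle.dumps(Fraction(1, 2)))
    rejected = None
except pickle.UnpicklingError as exc:
    rejected = str(exc)

print(allowed)
print(rejected)
```

Adding an entry to the allowlist is the safer of the two options named in the error message, since loading with the restriction turned off can execute arbitrary code from an untrusted file.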

20250511_101930.log ADDED

	@@ -0,0 +1,163 @@

+[2025-05-11 10:19:30] Created output directory: train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 10:19:30] Chat mode disabled
+[2025-05-11 10:19:30] Model size is 3B or smaller (0 B). Using full fine-tuning.
+[2025-05-11 10:19:30] Adjusted parameters for t5 model:
+[2025-05-11 10:19:30]   - LEARNING_RATE: 1e-4
+[2025-05-11 10:19:30]   - BATCH_SIZE: 32
+[2025-05-11 10:19:30]   - GRADIENT_ACCUMULATION_STEPS: 1
+[2025-05-11 10:19:30] No QA format data will be used
+[2025-05-11 10:19:30] =======================================
+[2025-05-11 10:19:30] Starting training for model: google/flan-t5-large
+[2025-05-11 10:19:30] =======================================
+[2025-05-11 10:19:30] CUDA_VISIBLE_DEVICES: 0,1
+[2025-05-11 10:19:30] WANDB_PROJECT: wikidyk-ar
+[2025-05-11 10:19:30] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
+[2025-05-11 10:19:30] Global Batch Size: 64
+[2025-05-11 10:19:30] Data Size: -1
+[2025-05-11 10:19:30] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py     --model_name_or_path "google/flan-t5-large"     --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json"     --output_dir "train_results/google_flan-t5-large_full_upsample3000"     --num_upsample "3000"     --per_device_train_batch_size "32"     --gradient_accumulation_steps "1"     --learning_rate "1e-4"     --num_train_epochs "1"     --model_max_length "32768"     --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3     --resume_from_checkpoint True     --bf16 True --use_flash_attention_2 True     --qa_data_ratio "-1"     --predict_mask "false"
+[2025-05-11 10:19:30] Training started at Sun May 11 10:19:30 UTC 2025
+W0511 10:19:31.717000 266021 site-packages/torch/distributed/run.py:792]
+W0511 10:19:31.717000 266021 site-packages/torch/distributed/run.py:792] *****************************************
+W0511 10:19:31.717000 266021 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0511 10:19:31.717000 266021 site-packages/torch/distributed/run.py:792] *****************************************
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+WARNING:root:Loading data...
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+WARNING:root:Loading data...
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
+wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
+wandb: Tracking run with wandb version 0.19.11
+wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_102023-3mrocyhv
+wandb: Run `wandb offline` to turn off syncing.
+wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
+wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
+wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/3mrocyhv
+  0%|          | 0/576094 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+[rank1]: Traceback (most recent call last):
+[rank1]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
+[rank1]:     train()
+[rank1]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
+[rank1]:     trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
+[rank1]:     return inner_training_loop(
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
+[rank1]:     self._load_rng_state(resume_from_checkpoint)
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
+[rank1]:     checkpoint_rng_state = torch.load(rng_file, weights_only=True)
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+[rank1]:     raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+[rank1]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+[rank1]: 	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+[rank1]: 	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+[rank1]: 	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
+[rank1]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+Traceback (most recent call last):
+  File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
+    train()
+  File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
+    trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
+    return inner_training_loop(
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
+    self._load_rng_state(resume_from_checkpoint)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
+    checkpoint_rng_state = torch.load(rng_file, weights_only=True)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[rank0]: Traceback (most recent call last):
+[rank0]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
+[rank0]:     train()
+[rank0]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
+[rank0]:     trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
+[rank0]:     return inner_training_loop(
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
+[rank0]:     self._load_rng_state(resume_from_checkpoint)
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
+[rank0]:     checkpoint_rng_state = torch.load(rng_file, weights_only=True)
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+[rank0]:     raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+[rank0]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+[rank0]: 	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+[rank0]: 	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+[rank0]: 	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
+[rank0]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+wandb:
+wandb: 🚀 View run train_results/google_flan-t5-large_full_upsample3000 at: https://wandb.ai/yuweiz/wikidyk-ar/runs/3mrocyhv
+wandb: Find logs at: wandb/run-20250511_102023-3mrocyhv/logs
+W0511 10:20:35.125000 266021 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 266087 closing signal SIGTERM
+E0511 10:20:36.241000 266021 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 266086) of binary: /root/miniconda3/envs/wikidyk/bin/python
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/bin/torchrun", line 8, in <module>
+    sys.exit(main())
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
+    return f(*args, **kwargs)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
+    run(args)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
+    elastic_launch(
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
+    return launch_agent(self._config, self._entrypoint, list(args))
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
+    raise ChildFailedError(
+torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
+============================================================
+src/train.py FAILED
+------------------------------------------------------------
+Failures:
+  <NO_OTHER_FAILURES>
+------------------------------------------------------------
+Root Cause (first observed failure):
+[0]:
+  time      : 2025-05-11_10:20:35
+  host      : bb9aa167977b
+  rank      : 0 (local_rank: 0)
+  exitcode  : 1 (pid: 266086)
+  error_file: <N/A>
+  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+============================================================
+[2025-05-11 10:20:36] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 10:20:36] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 10:20:36] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_101930.log
+[2025-05-11 10:20:36] Resource usage after training google/flan-t5-large:
+[2025-05-11 10:20:36] GPU memory usage:
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+38923 MiB, 81920 MiB
+39333 MiB, 81920 MiB
+[2025-05-11 10:20:36] Disk space usage for model outputs:
+18G	train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 10:20:36]
+[2025-05-11 10:20:36] All training runs completed at Sun May 11 10:20:36 UTC 2025
+[2025-05-11 10:20:36] =======================================
+[2025-05-11 10:20:36] Summary of training runs:
+[2025-05-11 10:20:36] Model | Status | Duration | Output Size

20250511_102227.log ADDED Viewed

	@@ -0,0 +1,163 @@

+[2025-05-11 10:22:27] Created output directory: train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 10:22:27] Chat mode disabled
+[2025-05-11 10:22:27] Model size is 3B or smaller (0 B). Using full fine-tuning.
+[2025-05-11 10:22:27] Adjusted parameters for t5 model:
+[2025-05-11 10:22:27]   - LEARNING_RATE: 1e-4
+[2025-05-11 10:22:27]   - BATCH_SIZE: 32
+[2025-05-11 10:22:27]   - GRADIENT_ACCUMULATION_STEPS: 2
+[2025-05-11 10:22:27] No QA format data will be used
+[2025-05-11 10:22:27] =======================================
+[2025-05-11 10:22:27] Starting training for model: google/flan-t5-large
+[2025-05-11 10:22:27] =======================================
+[2025-05-11 10:22:27] CUDA_VISIBLE_DEVICES: 0,1
+[2025-05-11 10:22:27] WANDB_PROJECT: wikidyk-ar
+[2025-05-11 10:22:27] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
+[2025-05-11 10:22:27] Global Batch Size: 128
+[2025-05-11 10:22:27] Data Size: -1
+[2025-05-11 10:22:27] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py     --model_name_or_path "google/flan-t5-large"     --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json"     --output_dir "train_results/google_flan-t5-large_full_upsample3000"     --num_upsample "3000"     --per_device_train_batch_size "32"     --gradient_accumulation_steps "2"     --learning_rate "1e-4"     --num_train_epochs "1"     --model_max_length "32768"     --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3     --resume_from_checkpoint True     --bf16 True --use_flash_attention_2 True     --qa_data_ratio "-1"     --predict_mask "false"
+[2025-05-11 10:22:27] Training started at Sun May 11 10:22:27 UTC 2025
+W0511 10:22:28.415000 266988 site-packages/torch/distributed/run.py:792]
+W0511 10:22:28.415000 266988 site-packages/torch/distributed/run.py:792] *****************************************
+W0511 10:22:28.415000 266988 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0511 10:22:28.415000 266988 site-packages/torch/distributed/run.py:792] *****************************************
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+WARNING:root:Loading data...
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+WARNING:root:Loading data...
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
+wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
+wandb: Tracking run with wandb version 0.19.11
+wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_102316-ofl0xql6
+wandb: Run `wandb offline` to turn off syncing.
+wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
+wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
+wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/ofl0xql6
+  0%|          | 0/288047 [00:00<?, ?it/s]
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+[rank1]: Traceback (most recent call last):
+[rank1]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
+[rank1]:     train()
+[rank1]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
+[rank1]:     trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
+[rank1]:     return inner_training_loop(
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
+[rank1]:     self._load_rng_state(resume_from_checkpoint)
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
+[rank1]:     checkpoint_rng_state = torch.load(rng_file, weights_only=True)
+[rank1]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+[rank1]:     raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+[rank1]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+[rank1]: 	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+[rank1]: 	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+[rank1]: 	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
+[rank1]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+Traceback (most recent call last):
+  File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
+    train()
+  File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
+    trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
+    return inner_training_loop(
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
+    self._load_rng_state(resume_from_checkpoint)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
+    checkpoint_rng_state = torch.load(rng_file, weights_only=True)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[rank0]: Traceback (most recent call last):
+[rank0]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
+[rank0]:     train()
+[rank0]:   File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
+[rank0]:     trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
+[rank0]:     return inner_training_loop(
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
+[rank0]:     self._load_rng_state(resume_from_checkpoint)
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
+[rank0]:     checkpoint_rng_state = torch.load(rng_file, weights_only=True)
+[rank0]:   File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+[rank0]:     raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+[rank0]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+[rank0]: 	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+[rank0]: 	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+[rank0]: 	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
+[rank0]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+wandb:
+wandb: 🚀 View run train_results/google_flan-t5-large_full_upsample3000 at: https://wandb.ai/yuweiz/wikidyk-ar/runs/ofl0xql6
+wandb: Find logs at: wandb/run-20250511_102316-ofl0xql6/logs
+W0511 10:23:29.022000 266988 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 267057 closing signal SIGTERM
+E0511 10:23:30.388000 266988 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 267056) of binary: /root/miniconda3/envs/wikidyk/bin/python
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/bin/torchrun", line 8, in <module>
+    sys.exit(main())
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
+    return f(*args, **kwargs)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
+    run(args)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
+    elastic_launch(
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
+    return launch_agent(self._config, self._entrypoint, list(args))
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
+    raise ChildFailedError(
+torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
+============================================================
+src/train.py FAILED
+------------------------------------------------------------
+Failures:
+  <NO_OTHER_FAILURES>
+------------------------------------------------------------
+Root Cause (first observed failure):
+[0]:
+  time      : 2025-05-11_10:23:29
+  host      : bb9aa167977b
+  rank      : 0 (local_rank: 0)
+  exitcode  : 1 (pid: 267056)
+  error_file: <N/A>
+  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+============================================================
+[2025-05-11 10:23:30] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 10:23:30] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 10:23:30] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_102227.log
+[2025-05-11 10:23:30] Resource usage after training google/flan-t5-large:
+[2025-05-11 10:23:30] GPU memory usage:
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+38923 MiB, 81920 MiB
+39333 MiB, 81920 MiB
+[2025-05-11 10:23:30] Disk space usage for model outputs:
+18G	train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 10:23:30]
+[2025-05-11 10:23:30] All training runs completed at Sun May 11 10:23:30 UTC 2025
+[2025-05-11 10:23:30] =======================================
+[2025-05-11 10:23:30] Summary of training runs:
+[2025-05-11 10:23:30] Model | Status | Duration | Output Size

20250511_110511.log ADDED Viewed

	@@ -0,0 +1,237 @@

+[2025-05-11 11:05:11] Created output directory: train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 11:05:11] Chat mode disabled
+[2025-05-11 11:05:11] Model size is 3B or smaller (0 B). Using full fine-tuning.
+[2025-05-11 11:05:11] Adjusted parameters for t5 model:
+[2025-05-11 11:05:11]   - LEARNING_RATE: 1e-4
+[2025-05-11 11:05:11]   - BATCH_SIZE: 32
+[2025-05-11 11:05:11]   - GRADIENT_ACCUMULATION_STEPS: 2
+[2025-05-11 11:05:11] No QA format data will be used
+[2025-05-11 11:05:11] =======================================
+[2025-05-11 11:05:11] Starting training for model: google/flan-t5-large
+[2025-05-11 11:05:11] =======================================
+[2025-05-11 11:05:11] CUDA_VISIBLE_DEVICES: 0,1
+[2025-05-11 11:05:11] WANDB_PROJECT: wikidyk-ar
+[2025-05-11 11:05:11] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
+[2025-05-11 11:05:11] Global Batch Size: 128
+[2025-05-11 11:05:11] Data Size: -1
+[2025-05-11 11:05:11] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py     --model_name_or_path "google/flan-t5-large"     --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json"     --output_dir "train_results/google_flan-t5-large_full_upsample3000"     --num_upsample "3000"     --per_device_train_batch_size "32"     --gradient_accumulation_steps "2"     --learning_rate "1e-4"     --num_train_epochs "1"     --model_max_length "32768"     --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3     --resume_from_checkpoint True     --bf16 True --use_flash_attention_2 True     --qa_data_ratio "-1"     --predict_mask "false"
+[2025-05-11 11:05:11] Training started at Sun May 11 11:05:11 UTC 2025
+W0511 11:05:13.027000 275137 site-packages/torch/distributed/run.py:793]
+W0511 11:05:13.027000 275137 site-packages/torch/distributed/run.py:793] *****************************************
+W0511 11:05:13.027000 275137 site-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0511 11:05:13.027000 275137 site-packages/torch/distributed/run.py:793] *****************************************
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
+    return importlib.import_module("." + module_name, self.__name__)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
+    return _bootstrap._gcd_import(name[level:], package, level)
+  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
+  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
+  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
+  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
+  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
+  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/modeling_utils.py", line 62, in <module>
+    from .integrations.flash_attention import flash_attention_forward
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/integrations/flash_attention.py", line 5, in <module>
+    from ..modeling_flash_attention_utils import _flash_attention_forward, flash_attn_supports_top_left_mask
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 36, in <module>
+    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
+    from flash_attn.flash_attn_interface import (
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
+    import flash_attn_2_cuda as flash_attn_gpu
+ImportError: /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
+The above exception was the direct cause of the following exception:
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
+    return importlib.import_module("." + module_name, self.__name__)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
+    return _bootstrap._gcd_import(name[level:], package, level)
+  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
+  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
+  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
+  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
+  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
+  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 36, in <module>
+    from .. import PreTrainedModel, TFPreTrainedModel
+  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
+    module = self._get_module(self._class_to_module[name])
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
+    raise RuntimeError(
+RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
+The above exception was the direct cause of the following exception:
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
+    return importlib.import_module("." + module_name, self.__name__)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
+    return _bootstrap._gcd_import(name[level:], package, level)
+  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
+  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
+  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
+  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
+  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
+  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 41, in <module>
+    from .integrations import (
+  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
+    module = self._get_module(self._class_to_module[name])
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
+    raise RuntimeError(
+RuntimeError: Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
+Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
+The above exception was the direct cause of the following exception:
+Traceback (most recent call last):
+  File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 5, in <module>
+    from transformers import Trainer, TrainingArguments, PreTrainedTokenizer, HfArgumentParser, AutoTokenizer
+  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
+    module = self._get_module(self._class_to_module[name])
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
+    raise RuntimeError(
+RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
+Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
+Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
+    return importlib.import_module("." + module_name, self.__name__)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
+    return _bootstrap._gcd_import(name[level:], package, level)
+  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
+  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
+  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
+  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
+  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
+  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/modeling_utils.py", line 62, in <module>
+    from .integrations.flash_attention import flash_attention_forward
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/integrations/flash_attention.py", line 5, in <module>
+    from ..modeling_flash_attention_utils import _flash_attention_forward, flash_attn_supports_top_left_mask
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 36, in <module>
+    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
+    from flash_attn.flash_attn_interface import (
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
+    import flash_attn_2_cuda as flash_attn_gpu
+ImportError: /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
+The above exception was the direct cause of the following exception:
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
+    return importlib.import_module("." + module_name, self.__name__)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
+    return _bootstrap._gcd_import(name[level:], package, level)
+  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
+  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
+  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
+  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
+  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
+  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 36, in <module>
+    from .. import PreTrainedModel, TFPreTrainedModel
+  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
+    module = self._get_module(self._class_to_module[name])
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
+    raise RuntimeError(
+RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
+The above exception was the direct cause of the following exception:
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
+    return importlib.import_module("." + module_name, self.__name__)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
+    return _bootstrap._gcd_import(name[level:], package, level)
+  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
+  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
+  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
+  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
+  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
+  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 41, in <module>
+    from .integrations import (
+  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
+    module = self._get_module(self._class_to_module[name])
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
+    raise RuntimeError(
+RuntimeError: Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
+Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
+The above exception was the direct cause of the following exception:
+Traceback (most recent call last):
+  File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 5, in <module>
+    from transformers import Trainer, TrainingArguments, PreTrainedTokenizer, HfArgumentParser, AutoTokenizer
+  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
+    module = self._get_module(self._class_to_module[name])
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
+    raise RuntimeError(
+RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
+Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
+Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
+W0511 11:05:16.241000 275137 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 275202 closing signal SIGTERM
+E0511 11:05:16.506000 275137 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 275203) of binary: /root/miniconda3/envs/wikidyk/bin/python
+Traceback (most recent call last):
+  File "/root/miniconda3/envs/wikidyk/bin/torchrun", line 8, in <module>
+    sys.exit(main())
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
+    return f(*args, **kwargs)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 919, in main
+    run(args)
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
+    elastic_launch(
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
+    return launch_agent(self._config, self._entrypoint, list(args))
+  File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
+    raise ChildFailedError(
+torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
+============================================================
+src/train.py FAILED
+------------------------------------------------------------
+Failures:
+  <NO_OTHER_FAILURES>
+------------------------------------------------------------
+Root Cause (first observed failure):
+[0]:
+  time      : 2025-05-11_11:05:16
+  host      : bb9aa167977b
+  rank      : 1 (local_rank: 1)
+  exitcode  : 1 (pid: 275203)
+  error_file: <N/A>
+  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+============================================================
+[2025-05-11 11:05:16] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 11:05:16] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 11:05:16] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_110511.log
+[2025-05-11 11:05:16] Resource usage after training google/flan-t5-large:
+[2025-05-11 11:05:16] GPU memory usage:
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+40409 MiB, 81920 MiB
+40721 MiB, 81920 MiB
+[2025-05-11 11:05:16] Disk space usage for model outputs:
+18G	train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 11:05:16]
+[2025-05-11 11:05:16] All training runs completed at Sun May 11 11:05:16 UTC 2025
+[2025-05-11 11:05:16] =======================================
+[2025-05-11 11:05:16] Summary of training runs:
+[2025-05-11 11:05:16] Model | Status | Duration | Output Size

20250511_110815.log ADDED

@@ -0,0 +1,83 @@

+[2025-05-11 11:08:15] Created output directory: train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 11:08:15] Chat mode disabled
+[2025-05-11 11:08:15] Model size is 3B or smaller (0 B). Using full fine-tuning.
+[2025-05-11 11:08:15] Adjusted parameters for t5 model:
+[2025-05-11 11:08:15]   - LEARNING_RATE: 1e-4
+[2025-05-11 11:08:15]   - BATCH_SIZE: 32
+[2025-05-11 11:08:15]   - GRADIENT_ACCUMULATION_STEPS: 2
+[2025-05-11 11:08:15] No QA format data will be used
+[2025-05-11 11:08:15] =======================================
+[2025-05-11 11:08:15] Starting training for model: google/flan-t5-large
+[2025-05-11 11:08:15] =======================================
+[2025-05-11 11:08:15] CUDA_VISIBLE_DEVICES: 0,1
+[2025-05-11 11:08:15] WANDB_PROJECT: wikidyk-ar
+[2025-05-11 11:08:15] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
+[2025-05-11 11:08:15] Global Batch Size: 128
+[2025-05-11 11:08:15] Data Size: -1
+[2025-05-11 11:08:15] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py     --model_name_or_path "google/flan-t5-large"     --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json"     --output_dir "train_results/google_flan-t5-large_full_upsample3000"     --num_upsample "3000"     --per_device_train_batch_size "32"     --gradient_accumulation_steps "2"     --learning_rate "1e-4"     --num_train_epochs "1"     --model_max_length "32768"     --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3     --resume_from_checkpoint True     --bf16 True --use_flash_attention_2 True     --qa_data_ratio "-1"     --predict_mask "false"
+[2025-05-11 11:08:15] Training started at Sun May 11 11:08:15 UTC 2025
+W0511 11:08:16.439000 275986 site-packages/torch/distributed/run.py:792]
+W0511 11:08:16.439000 275986 site-packages/torch/distributed/run.py:792] *****************************************
+W0511 11:08:16.439000 275986 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0511 11:08:16.439000 275986 site-packages/torch/distributed/run.py:792] *****************************************
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+WARNING:root:Loading data...
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+WARNING:root:Loading data...
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
+wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
+wandb: Tracking run with wandb version 0.19.11
+wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_110911-cbretwes
+wandb: Run `wandb offline` to turn off syncing.
+wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
+wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
+wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/cbretwes
  0%|          | 0/288047 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+Didn't manage to set back the RNG states of the CUDA because of the following error:
+ tuple index out of range
+This won't yield the same results as if the training had not been interrupted.
+Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
+[rank1]:[W511 11:09:14.166081835 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+Didn't manage to set back the RNG states of the CUDA because of the following error:
+ tuple index out of range
+This won't yield the same results as if the training had not been interrupted.
+Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
+[rank0]:[W511 11:09:17.716570778 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
 62%|██████▏   | 180001/288047 [00:05<00:03, 31170.87it/s]
 62%|██████▏   | 180025/288047 [00:20<00:03, 31170.87it/s]
 62%|██████▏   | 180026/288047 [00:20<00:15, 6839.34it/s]
 62%|██████▏   | 180027/288047 [00:20<00:16, 6580.38it/s]
 63%|██████▎   | 180050/288047 [00:32<00:16, 6580.38it/s]
 63%|██████▎   | 180064/288047 [00:40<00:16, 6580.38it/s]
 63%|██████▎   | 180065/288047 [00:40<00:50, 2126.11it/s]
 63%|██████▎   | 180066/288047 [00:40<00:51, 2079.67it/s][2025-05-11 11:10:04] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 11:10:04] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 11:10:04] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_110815.log
+[2025-05-11 11:10:04] Resource usage after training google/flan-t5-large:
+[2025-05-11 11:10:04] GPU memory usage:
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+40409 MiB, 81920 MiB
+40721 MiB, 81920 MiB
+[2025-05-11 11:10:05] Disk space usage for model outputs:
+18G	train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 11:10:05]
+[2025-05-11 11:10:05] All training runs completed at Sun May 11 11:10:05 UTC 2025
+[2025-05-11 11:10:05] =======================================
+[2025-05-11 11:10:05] Summary of training runs:
+[2025-05-11 11:10:05] Model | Status | Duration | Output Size

20250511_111054.log ADDED

@@ -0,0 +1,85 @@

+[2025-05-11 11:10:54] Created output directory: train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 11:10:54] Chat mode disabled
+[2025-05-11 11:10:54] Model size is 3B or smaller (0 B). Using full fine-tuning.
+[2025-05-11 11:10:54] Adjusted parameters for t5 model:
+[2025-05-11 11:10:54]   - LEARNING_RATE: 1e-4
+[2025-05-11 11:10:54]   - BATCH_SIZE: 32
+[2025-05-11 11:10:54]   - GRADIENT_ACCUMULATION_STEPS: 2
+[2025-05-11 11:10:54] No QA format data will be used
+[2025-05-11 11:10:54] =======================================
+[2025-05-11 11:10:54] Starting training for model: google/flan-t5-large
+[2025-05-11 11:10:54] =======================================
+[2025-05-11 11:10:54] CUDA_VISIBLE_DEVICES: 0,1
+[2025-05-11 11:10:54] WANDB_PROJECT: wikidyk-ar
+[2025-05-11 11:10:54] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
+[2025-05-11 11:10:54] Global Batch Size: 128
+[2025-05-11 11:10:54] Data Size: -1
+[2025-05-11 11:10:54] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py     --model_name_or_path "google/flan-t5-large"     --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json"     --output_dir "train_results/google_flan-t5-large_full_upsample3000"     --num_upsample "3000"     --per_device_train_batch_size "32"     --gradient_accumulation_steps "2"     --learning_rate "1e-4"     --num_train_epochs "1"     --model_max_length "32768"     --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3     --resume_from_checkpoint True     --bf16 True --use_flash_attention_2 True     --qa_data_ratio "-1"     --predict_mask "false"
+[2025-05-11 11:10:54] Training started at Sun May 11 11:10:54 UTC 2025
+W0511 11:10:55.754000 276498 site-packages/torch/distributed/run.py:792]
+W0511 11:10:55.754000 276498 site-packages/torch/distributed/run.py:792] *****************************************
+W0511 11:10:55.754000 276498 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0511 11:10:55.754000 276498 site-packages/torch/distributed/run.py:792] *****************************************
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+WARNING:root:Loading data...
+WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
+You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
+WARNING:root:Loading data...
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+WARNING:root:Dataset initialized with all QA data:
+WARNING:root:  - 0 QA examples
+WARNING:root:  - 12290 fact examples with upsampling factor 3000
+WARNING:root:  - Total examples: 36870000
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
+  trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
+You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
+There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
+wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
+wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
+wandb: Tracking run with wandb version 0.19.11
+wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_111150-2u5am4ts
+wandb: Run `wandb offline` to turn off syncing.
+wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
+wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
+wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/2u5am4ts
  0%|          | 0/288047 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+Didn't manage to set back the RNG states of the CUDA because of the following error:
+ tuple index out of range
+This won't yield the same results as if the training had not been interrupted.
+/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
+  warnings.warn(
+Didn't manage to set back the RNG states of the CUDA because of the following error:
+ tuple index out of range
+This won't yield the same results as if the training had not been interrupted.
+Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
+[rank0]:[W511 11:11:55.768036568 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
+Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
+[rank1]:[W511 11:11:55.383761995 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
 66%|██████▌   | 190001/288047 [00:05<00:02, 33662.28it/s]
 66%|██████▌   | 190025/288047 [00:20<00:02, 33662.28it/s]
 66%|██████▌   | 190026/288047 [00:20<00:13, 7188.28it/s]
 66%|██████▌   | 190027/288047 [00:20<00:14, 6904.23it/s]
                                                         {'loss': 0.0021, 'grad_norm': 0.019677339121699333, 'learning_rate': 3.402153120844862e-05, 'epoch': 0.66}
 66%|██████▌   | 190050/288047 [00:34<00:14, 6904.23it/s]
 66%|██████▌   | 190060/288047 [00:40<00:14, 6904.23it/s]
 66%|██████▌   | 190061/288047 [00:40<00:43, 2257.82it/s]
 66%|██████▌   | 190062/288047 [00:40<00:44, 2192.95it/s]
 66%|██████▌   | 190100/288047 [00:59<00:44, 2192.95it/s]
 66%|██████▌   | 190101/288047 [01:00<00:44, 2192.95it/s]
 66%|██████▌   | 190102/288047 [01:00<01:44, 934.46it/s]
 66%|██████▌   | 190103/288047 [01:00<01:47, 914.55it/s]
 66%|██████▌   | 190137/288047 [01:20<01:47, 914.55it/s]
 66%|██████▌   | 190138/288047 [01:20<03:47, 429.43it/s]
 66%|██████▌   | 190139/288047 [01:20<03:53, 419.89it/s]
 66%|██████▌   | 190150/288047 [01:27<03:53, 419.89it/s]
+[2025-05-11 11:13:24] ERROR: Training failed for google/flan-t5-large with exit code 1
+[2025-05-11 11:13:24] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_111054.log
+[2025-05-11 11:13:24] Resource usage after training google/flan-t5-large:
+[2025-05-11 11:13:24] GPU memory usage:
+1 MiB, 81920 MiB
+1 MiB, 81920 MiB
+40409 MiB, 81920 MiB
+40721 MiB, 81920 MiB
+[2025-05-11 11:13:24] Disk space usage for model outputs:
+27G	train_results/google_flan-t5-large_full_upsample3000
+[2025-05-11 11:13:24]
+[2025-05-11 11:13:24] All training runs completed at Sun May 11 11:13:24 UTC 2025
+[2025-05-11 11:13:24] =======================================
+[2025-05-11 11:13:24] Summary of training runs:
+[2025-05-11 11:13:24] Model | Status | Duration | Output Size
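The metrics line logged near step 190050 in this log can be cross-checked against the learning-rate schedule. The following is a minimal sketch, assuming linear decay to zero with no warmup (the log does not state the warmup setting) and assuming the logged rate corresponds to zero-based step 190049. `linear_lr` is a hypothetical helper written for illustration, not part of the training script.

```python
import ast
import math

# Metrics line copied from the log above.
LOG_LINE = (
    "{'loss': 0.0021, 'grad_norm': 0.019677339121699333, "
    "'learning_rate': 3.402153120844862e-05, 'epoch': 0.66}"
)

BASE_LR = 1e-4        # learning rate reported for this run
TOTAL_STEPS = 288047  # total step count shown in the progress bar


def linear_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Linear decay from base_lr to 0 over total_steps (assumes zero warmup)."""
    return base_lr * max(0, total_steps - step) / total_steps


# The line is a Python dict literal (single quotes), so json.loads would reject it.
metrics = ast.literal_eval(LOG_LINE)

# Assumption: the logged value is the rate applied at zero-based step 190049.
expected = linear_lr(190049)
print(metrics["learning_rate"], expected)
print(math.isclose(metrics["learning_rate"], expected, rel_tol=1e-9))
```

If the two values agree, the run was most likely using the plain linear schedule without warmup; a mismatch would point to a warmup phase or a different step offset.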

20250511_111333.log ADDED Viewed

The diff for this file is too large to render.

20250511_121707.log ADDED Viewed

The diff for this file is too large to render.

20250511_225651.log ADDED Viewed

The diff for this file is too large to render.

README.md ADDED Viewed

	@@ -0,0 +1,58 @@

+---
+library_name: transformers
+license: apache-2.0
+base_model: google/flan-t5-large
+tags:
+- generated_from_trainer
+model-index:
+- name: google_flan-t5-large_full_upsample3000
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# google_flan-t5-large_full_upsample3000
+This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on an unknown dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 32
+- eval_batch_size: 8
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 2
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 128
+- total_eval_batch_size: 16
+- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+- lr_scheduler_type: linear
+- num_epochs: 1.0
+### Training results
+### Framework versions
+- Transformers 4.51.3
+- Pytorch 2.6.0+cu124
+- Datasets 3.6.0
+- Tokenizers 0.21.1
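The `total_train_batch_size` and `total_eval_batch_size` values in the model card follow from the per-device batch size, the device count, and the gradient accumulation steps. A small sketch of that arithmetic is below; the function name is illustrative and not taken from the training code.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of examples that contribute to one optimizer step."""
    return per_device * num_devices * grad_accum_steps


# Values from the hyperparameter list in the model card.
train_total = effective_batch_size(per_device=32, num_devices=2, grad_accum_steps=2)
eval_total = effective_batch_size(per_device=8, num_devices=2)  # no accumulation at eval time

print(train_total, eval_total)
```

With the listed values this reproduces the reported totals of 128 and 16.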

added_tokens.json ADDED Viewed

	@@ -0,0 +1,102 @@

+{
+  "<extra_id_0>": 32099,
+  "<extra_id_10>": 32089,
+  "<extra_id_11>": 32088,
+  "<extra_id_12>": 32087,
+  "<extra_id_13>": 32086,
+  "<extra_id_14>": 32085,
+  "<extra_id_15>": 32084,
+  "<extra_id_16>": 32083,
+  "<extra_id_17>": 32082,
+  "<extra_id_18>": 32081,
+  "<extra_id_19>": 32080,
+  "<extra_id_1>": 32098,
+  "<extra_id_20>": 32079,
+  "<extra_id_21>": 32078,
+  "<extra_id_22>": 32077,
+  "<extra_id_23>": 32076,
+  "<extra_id_24>": 32075,
+  "<extra_id_25>": 32074,
+  "<extra_id_26>": 32073,
+  "<extra_id_27>": 32072,
+  "<extra_id_28>": 32071,
+  "<extra_id_29>": 32070,
+  "<extra_id_2>": 32097,
+  "<extra_id_30>": 32069,
+  "<extra_id_31>": 32068,
+  "<extra_id_32>": 32067,
+  "<extra_id_33>": 32066,
+  "<extra_id_34>": 32065,
+  "<extra_id_35>": 32064,
+  "<extra_id_36>": 32063,
+  "<extra_id_37>": 32062,
+  "<extra_id_38>": 32061,
+  "<extra_id_39>": 32060,
+  "<extra_id_3>": 32096,
+  "<extra_id_40>": 32059,
+  "<extra_id_41>": 32058,
+  "<extra_id_42>": 32057,
+  "<extra_id_43>": 32056,
+  "<extra_id_44>": 32055,
+  "<extra_id_45>": 32054,
+  "<extra_id_46>": 32053,
+  "<extra_id_47>": 32052,
+  "<extra_id_48>": 32051,
+  "<extra_id_49>": 32050,
+  "<extra_id_4>": 32095,
+  "<extra_id_50>": 32049,
+  "<extra_id_51>": 32048,
+  "<extra_id_52>": 32047,
+  "<extra_id_53>": 32046,
+  "<extra_id_54>": 32045,
+  "<extra_id_55>": 32044,
+  "<extra_id_56>": 32043,
+  "<extra_id_57>": 32042,
+  "<extra_id_58>": 32041,
+  "<extra_id_59>": 32040,
+  "<extra_id_5>": 32094,
+  "<extra_id_60>": 32039,
+  "<extra_id_61>": 32038,
+  "<extra_id_62>": 32037,
+  "<extra_id_63>": 32036,
+  "<extra_id_64>": 32035,
+  "<extra_id_65>": 32034,
+  "<extra_id_66>": 32033,
+  "<extra_id_67>": 32032,
+  "<extra_id_68>": 32031,
+  "<extra_id_69>": 32030,
+  "<extra_id_6>": 32093,
+  "<extra_id_70>": 32029,
+  "<extra_id_71>": 32028,
+  "<extra_id_72>": 32027,
+  "<extra_id_73>": 32026,
+  "<extra_id_74>": 32025,
+  "<extra_id_75>": 32024,
+  "<extra_id_76>": 32023,
+  "<extra_id_77>": 32022,
+  "<extra_id_78>": 32021,
+  "<extra_id_79>": 32020,
+  "<extra_id_7>": 32092,
+  "<extra_id_80>": 32019,
+  "<extra_id_81>": 32018,
+  "<extra_id_82>": 32017,
+  "<extra_id_83>": 32016,
+  "<extra_id_84>": 32015,
+  "<extra_id_85>": 32014,
+  "<extra_id_86>": 32013,
+  "<extra_id_87>": 32012,
+  "<extra_id_88>": 32011,
+  "<extra_id_89>": 32010,
+  "<extra_id_8>": 32091,
+  "<extra_id_90>": 32009,
+  "<extra_id_91>": 32008,
+  "<extra_id_92>": 32007,
+  "<extra_id_93>": 32006,
+  "<extra_id_94>": 32005,
+  "<extra_id_95>": 32004,
+  "<extra_id_96>": 32003,
+  "<extra_id_97>": 32002,
+  "<extra_id_98>": 32001,
+  "<extra_id_99>": 32000,
+  "<extra_id_9>": 32090
+}
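The sentinel tokens in `added_tokens.json` are numbered in reverse: `<extra_id_0>` takes the highest ID and `<extra_id_99>` the lowest. A minimal sketch that rebuilds the mapping from this pattern, using only the values visible in the listing above (`sentinel_id` is a hypothetical helper):

```python
NUM_SENTINELS = 100
HIGHEST_SENTINEL_ID = 32099  # ID of <extra_id_0> in added_tokens.json


def sentinel_id(n: int) -> int:
    """Return the vocabulary ID of <extra_id_n>; IDs decrease as n increases."""
    if not 0 <= n < NUM_SENTINELS:
        raise ValueError(f"sentinel index out of range: {n}")
    return HIGHEST_SENTINEL_ID - n


# Rebuild the same mapping that added_tokens.json stores explicitly.
added_tokens = {f"<extra_id_{n}>": sentinel_id(n) for n in range(NUM_SENTINELS)}

print(added_tokens["<extra_id_0>"], added_tokens["<extra_id_42>"], added_tokens["<extra_id_99>"])
```

The JSON file lists keys in lexicographic order, which is why `<extra_id_10>` appears right after `<extra_id_0>`; the ID pattern itself is unaffected by that ordering.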

config.json ADDED Viewed

	@@ -0,0 +1,32 @@

+{
+  "architectures": [
+    "T5ForConditionalGeneration"
+  ],
+  "classifier_dropout": 0.0,
+  "d_ff": 2816,
+  "d_kv": 64,
+  "d_model": 1024,
+  "decoder_start_token_id": 0,
+  "dense_act_fn": "gelu_new",
+  "dropout_rate": 0.1,
+  "eos_token_id": 1,
+  "feed_forward_proj": "gated-gelu",
+  "initializer_factor": 1.0,
+  "is_encoder_decoder": true,
+  "is_gated_act": true,
+  "layer_norm_epsilon": 1e-06,
+  "model_type": "t5",
+  "n_positions": 512,
+  "num_decoder_layers": 24,
+  "num_heads": 16,
+  "num_layers": 24,
+  "output_past": true,
+  "pad_token_id": 0,
+  "relative_attention_max_distance": 128,
+  "relative_attention_num_buckets": 32,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.51.3",
+  "use_cache": true,
+  "vocab_size": 32128
+}

generation_config.json ADDED Viewed

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "decoder_start_token_id": 0,
+  "eos_token_id": 1,
+  "pad_token_id": 0,
+  "transformers_version": "4.51.3"
+}

model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a0f6221b9dcbcdce6374d0958d35cc9f3ce103485e6b9640c7b85bbf68f71ce
+size 3132668808
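The size in the LFS pointer can be sanity-checked against `config.json`. The sketch below estimates the parameter count from the config values, assuming the usual T5 v1.1 layout: bias-free attention and feed-forward projections, scale-only layer norms, one relative-attention bias table per stack, and an `lm_head` separate from the shared embedding because `tie_word_embeddings` is false. `count_params` is a hypothetical helper, not a Transformers API.

```python
# Values copied from config.json and the model.safetensors LFS pointer.
CFG = {
    "vocab_size": 32128,
    "d_model": 1024,
    "d_kv": 64,
    "d_ff": 2816,
    "num_heads": 16,
    "num_layers": 24,
    "num_decoder_layers": 24,
    "relative_attention_num_buckets": 32,
}
SAFETENSORS_BYTES = 3132668808
BYTES_PER_PARAM = 4  # torch_dtype is float32


def count_params(cfg: dict) -> int:
    """Estimate the parameter count under the layout assumptions stated above."""
    d_model = cfg["d_model"]
    inner = cfg["num_heads"] * cfg["d_kv"]
    attn = 4 * d_model * inner        # q, k, v, o projections without biases
    ffn = 3 * d_model * cfg["d_ff"]   # wi_0, wi_1, wo for the gated activation
    norm = d_model                    # layer norm holds a scale vector only
    rel_bias = cfg["relative_attention_num_buckets"] * cfg["num_heads"]
    embed = cfg["vocab_size"] * d_model

    # Encoder block: self-attention + feed-forward, each with its own norm.
    encoder = cfg["num_layers"] * (attn + ffn + 2 * norm) + rel_bias + norm
    # Decoder block: self-attention + cross-attention + feed-forward.
    decoder = cfg["num_decoder_layers"] * (2 * attn + ffn + 3 * norm) + rel_bias + norm
    # Shared input embedding plus the untied lm_head.
    return 2 * embed + encoder + decoder


n_params = count_params(CFG)
overhead = SAFETENSORS_BYTES - n_params * BYTES_PER_PARAM
print(n_params, overhead)
```

Under these assumptions the estimate is 783,150,080 parameters, which at 4 bytes each leaves roughly 68 KB of the file for the safetensors header. That is consistent with an unquantized float32 checkpoint.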

special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,125 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

spiece.model ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
+size 791656

tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,942 @@

+{
+  "add_prefix_space": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32000": {
+      "content": "<extra_id_99>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32001": {
+      "content": "<extra_id_98>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32002": {
+      "content": "<extra_id_97>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32003": {
+      "content": "<extra_id_96>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32004": {
+      "content": "<extra_id_95>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32005": {
+      "content": "<extra_id_94>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32006": {
+      "content": "<extra_id_93>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32007": {
+      "content": "<extra_id_92>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32008": {
+      "content": "<extra_id_91>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32009": {
+      "content": "<extra_id_90>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32010": {
+      "content": "<extra_id_89>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32011": {
+      "content": "<extra_id_88>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32012": {
+      "content": "<extra_id_87>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32013": {
+      "content": "<extra_id_86>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32014": {
+      "content": "<extra_id_85>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32015": {
+      "content": "<extra_id_84>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32016": {
+      "content": "<extra_id_83>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32017": {
+      "content": "<extra_id_82>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32018": {
+      "content": "<extra_id_81>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32019": {
+      "content": "<extra_id_80>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32020": {
+      "content": "<extra_id_79>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32021": {
+      "content": "<extra_id_78>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32022": {
+      "content": "<extra_id_77>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32023": {
+      "content": "<extra_id_76>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32024": {
+      "content": "<extra_id_75>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32025": {
+      "content": "<extra_id_74>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32026": {
+      "content": "<extra_id_73>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32027": {
+      "content": "<extra_id_72>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32028": {
+      "content": "<extra_id_71>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32029": {
+      "content": "<extra_id_70>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32030": {
+      "content": "<extra_id_69>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32031": {
+      "content": "<extra_id_68>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32032": {
+      "content": "<extra_id_67>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32033": {
+      "content": "<extra_id_66>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32034": {
+      "content": "<extra_id_65>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32035": {
+      "content": "<extra_id_64>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32036": {
+      "content": "<extra_id_63>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32037": {
+      "content": "<extra_id_62>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32038": {
+      "content": "<extra_id_61>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32039": {
+      "content": "<extra_id_60>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32040": {
+      "content": "<extra_id_59>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32041": {
+      "content": "<extra_id_58>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32042": {
+      "content": "<extra_id_57>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32043": {
+      "content": "<extra_id_56>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32044": {
+      "content": "<extra_id_55>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32045": {
+      "content": "<extra_id_54>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32046": {
+      "content": "<extra_id_53>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32047": {
+      "content": "<extra_id_52>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32048": {
+      "content": "<extra_id_51>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32049": {
+      "content": "<extra_id_50>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32050": {
+      "content": "<extra_id_49>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32051": {
+      "content": "<extra_id_48>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32052": {
+      "content": "<extra_id_47>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32053": {
+      "content": "<extra_id_46>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32054": {
+      "content": "<extra_id_45>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32055": {
+      "content": "<extra_id_44>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32056": {
+      "content": "<extra_id_43>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32057": {
+      "content": "<extra_id_42>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32058": {
+      "content": "<extra_id_41>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32059": {
+      "content": "<extra_id_40>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32060": {
+      "content": "<extra_id_39>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32061": {
+      "content": "<extra_id_38>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32062": {
+      "content": "<extra_id_37>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32063": {
+      "content": "<extra_id_36>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32064": {
+      "content": "<extra_id_35>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32065": {
+      "content": "<extra_id_34>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32066": {
+      "content": "<extra_id_33>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32067": {
+      "content": "<extra_id_32>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32068": {
+      "content": "<extra_id_31>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32069": {
+      "content": "<extra_id_30>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32070": {
+      "content": "<extra_id_29>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32071": {
+      "content": "<extra_id_28>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32072": {
+      "content": "<extra_id_27>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32073": {
+      "content": "<extra_id_26>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32074": {
+      "content": "<extra_id_25>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32075": {
+      "content": "<extra_id_24>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32076": {
+      "content": "<extra_id_23>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32077": {
+      "content": "<extra_id_22>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32078": {
+      "content": "<extra_id_21>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32079": {
+      "content": "<extra_id_20>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32080": {
+      "content": "<extra_id_19>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32081": {
+      "content": "<extra_id_18>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32082": {
+      "content": "<extra_id_17>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32083": {
+      "content": "<extra_id_16>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32084": {
+      "content": "<extra_id_15>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32085": {
+      "content": "<extra_id_14>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32086": {
+      "content": "<extra_id_13>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32087": {
+      "content": "<extra_id_12>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32088": {
+      "content": "<extra_id_11>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32089": {
+      "content": "<extra_id_10>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32090": {
+      "content": "<extra_id_9>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32091": {
+      "content": "<extra_id_8>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32092": {
+      "content": "<extra_id_7>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32093": {
+      "content": "<extra_id_6>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32094": {
+      "content": "<extra_id_5>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32095": {
+      "content": "<extra_id_4>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32096": {
+      "content": "<extra_id_3>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32097": {
+      "content": "<extra_id_2>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32098": {
+      "content": "<extra_id_1>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32099": {
+      "content": "<extra_id_0>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_ids": 100,
+  "extra_special_tokens": {},
+  "legacy": true,
+  "model_max_length": 32768,
+  "pad_token": "<pad>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "T5Tokenizer",
+  "unk_token": "<unk>"
+}

training_args.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3333cedc941dd98a629e756b246a1bed0370d67ddeedae3add1bcba19419ca60
+size 5368