YWZBrandon committed on
Commit 94686ca · verified · 1 Parent(s): e20eb9b

End of training

20250511_092138.log ADDED
@@ -0,0 +1,39 @@
1
+ [2025-05-11 09:21:38] Created output directory: train_results/google_flan-t5-large_full_upsample3000
2
+ [2025-05-11 09:21:38] Chat mode disabled
3
+ [2025-05-11 09:21:38] Model size is 3B or smaller (0 B). Using full fine-tuning.
4
+ [2025-05-11 09:21:38] Adjusted parameters for t5 model:
5
+ [2025-05-11 09:21:38] - LEARNING_RATE: 1e-4
6
+ [2025-05-11 09:21:38] - BATCH_SIZE: 32
7
+ [2025-05-11 09:21:38] - GRADIENT_ACCUMULATION_STEPS: 1
8
+ [2025-05-11 09:21:38] No QA format data will be used
9
+ [2025-05-11 09:21:38] =======================================
10
+ [2025-05-11 09:21:38] Starting training for model: google/flan-t5-large
11
+ [2025-05-11 09:21:38] =======================================
12
+ [2025-05-11 09:21:38] CUDA_VISIBLE_DEVICES: 0,1
13
+ [2025-05-11 09:21:38] WANDB_PROJECT: wikidyk-ar
14
+ [2025-05-11 09:21:38] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
15
+ [2025-05-11 09:21:38] Global Batch Size: 64
16
+ [2025-05-11 09:21:38] Data Size: -1
17
+ [2025-05-11 09:21:38] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py --model_name_or_path "google/flan-t5-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_flan-t5-large_full_upsample3000" --num_upsample "3000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 <<<<<<< Updated upstream:scripts/train_full_flan_t5_large_us3000.sh
18
+ =======
19
+ --resume_from_checkpoint True >>>>>>> Stashed changes:scripts/train_full_flan_t5_large_us_3000.sh
20
+ --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
21
+ [2025-05-11 09:21:38] Training started at Sun May 11 09:21:38 UTC 2025
22
+ scripts/train_full_flan_t5_large_us3000.sh: eval: line 272: syntax error near unexpected token `<<<'
23
+ scripts/train_full_flan_t5_large_us3000.sh: eval: line 272: `torchrun --nproc_per_node "2" --master-port 29502 src/train.py --model_name_or_path "google/flan-t5-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_flan-t5-large_full_upsample3000" --num_upsample "3000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 <<<<<<< Updated upstream:scripts/train_full_flan_t5_large_us3000.sh ======= --resume_from_checkpoint True >>>>>>> Stashed changes:scripts/train_full_flan_t5_large_us_3000.sh --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"'
24
+ [2025-05-11 09:21:38] ERROR: Training failed for google/flan-t5-large with exit code 2
25
+ [2025-05-11 09:21:38] ERROR: Training failed for google/flan-t5-large with exit code 2
26
+ [2025-05-11 09:21:38] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_092138.log
27
+ [2025-05-11 09:21:38] Resource usage after training google/flan-t5-large:
28
+ [2025-05-11 09:21:38] GPU memory usage:
29
+ 1 MiB, 81920 MiB
30
+ 1 MiB, 81920 MiB
31
+ 1 MiB, 81920 MiB
32
+ 1 MiB, 81920 MiB
33
+ [2025-05-11 09:21:38] Disk space usage for model outputs:
34
+ 27G train_results/google_flan-t5-large_full_upsample3000
35
+ [2025-05-11 09:21:38]
36
+ [2025-05-11 09:21:38] All training runs completed at Sun May 11 09:21:38 UTC 2025
37
+ [2025-05-11 09:21:38] =======================================
38
+ [2025-05-11 09:21:38] Summary of training runs:
39
+ [2025-05-11 09:21:38] Model | Status | Duration | Output Size
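
Note: the exit-code-2 failure in this log is a shell error rather than a training error. The command handed to eval still contains unresolved git conflict markers (<<<<<<< Updated upstream, =======, >>>>>>> Stashed changes) from the training script, which bash cannot parse. The next log, 20250511_092208.log, shows the same run re-issued with the conflict resolved, so that --resume_from_checkpoint True and the remaining flags form a single torchrun invocation.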
20250511_092208.log ADDED
@@ -0,0 +1,163 @@
1
+ [2025-05-11 09:22:08] Created output directory: train_results/google_flan-t5-large_full_upsample3000
2
+ [2025-05-11 09:22:08] Chat mode disabled
3
+ [2025-05-11 09:22:08] Model size is 3B or smaller (0 B). Using full fine-tuning.
4
+ [2025-05-11 09:22:08] Adjusted parameters for t5 model:
5
+ [2025-05-11 09:22:08] - LEARNING_RATE: 1e-4
6
+ [2025-05-11 09:22:08] - BATCH_SIZE: 32
7
+ [2025-05-11 09:22:08] - GRADIENT_ACCUMULATION_STEPS: 1
8
+ [2025-05-11 09:22:08] No QA format data will be used
9
+ [2025-05-11 09:22:08] =======================================
10
+ [2025-05-11 09:22:08] Starting training for model: google/flan-t5-large
11
+ [2025-05-11 09:22:08] =======================================
12
+ [2025-05-11 09:22:08] CUDA_VISIBLE_DEVICES: 0,1
13
+ [2025-05-11 09:22:08] WANDB_PROJECT: wikidyk-ar
14
+ [2025-05-11 09:22:08] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
15
+ [2025-05-11 09:22:08] Global Batch Size: 64
16
+ [2025-05-11 09:22:08] Data Size: -1
17
+ [2025-05-11 09:22:08] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py --model_name_or_path "google/flan-t5-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_flan-t5-large_full_upsample3000" --num_upsample "3000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 --resume_from_checkpoint True --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
18
+ [2025-05-11 09:22:08] Training started at Sun May 11 09:22:08 UTC 2025
19
+ W0511 09:22:09.428000 255573 site-packages/torch/distributed/run.py:792]
20
+ W0511 09:22:09.428000 255573 site-packages/torch/distributed/run.py:792] *****************************************
21
+ W0511 09:22:09.428000 255573 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
22
+ W0511 09:22:09.428000 255573 site-packages/torch/distributed/run.py:792] *****************************************
23
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
24
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
25
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
26
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
27
+ WARNING:root:Loading data...
28
+ WARNING:root:Loading data...
29
+ WARNING:root:Dataset initialized with all QA data:
30
+ WARNING:root: - 0 QA examples
31
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
32
+ WARNING:root: - Total examples: 36870000
33
+ WARNING:root:Dataset initialized with all QA data:
34
+ WARNING:root: - 0 QA examples
35
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
36
+ WARNING:root: - Total examples: 36870000
37
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
38
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
39
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
40
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
41
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
42
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
43
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
44
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
45
+ wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
46
+ wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
47
+ wandb: Tracking run with wandb version 0.19.11
48
+ wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_092259-8c7r30qb
49
+ wandb: Run `wandb offline` to turn off syncing.
50
+ wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
51
+ wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
52
+ wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/8c7r30qb
53
+
54
  0%| | 0/576094 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
55
+ warnings.warn(
56
+ [rank1]: Traceback (most recent call last):
57
+ [rank1]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
58
+ [rank1]: train()
59
+ [rank1]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
60
+ [rank1]: trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
61
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
62
+ [rank1]: return inner_training_loop(
63
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
64
+ [rank1]: self._load_rng_state(resume_from_checkpoint)
65
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
66
+ [rank1]: checkpoint_rng_state = torch.load(rng_file, weights_only=True)
67
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
68
+ [rank1]: raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
69
+ [rank1]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
70
+ [rank1]: (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
71
+ [rank1]: (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
72
+ [rank1]: WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
73
+
74
+ [rank1]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
75
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
76
+ warnings.warn(
77
+ Traceback (most recent call last):
78
+ File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
79
+ train()
80
+ File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
81
+ trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
82
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
83
+ return inner_training_loop(
84
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
85
+ self._load_rng_state(resume_from_checkpoint)
86
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
87
+ checkpoint_rng_state = torch.load(rng_file, weights_only=True)
88
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
89
+ raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
90
+ _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
91
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
92
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
93
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
94
+
95
+ Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
96
+ [rank0]: Traceback (most recent call last):
97
+ [rank0]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
98
+ [rank0]: train()
99
+ [rank0]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
100
+ [rank0]: trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
101
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
102
+ [rank0]: return inner_training_loop(
103
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
104
+ [rank0]: self._load_rng_state(resume_from_checkpoint)
105
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
106
+ [rank0]: checkpoint_rng_state = torch.load(rng_file, weights_only=True)
107
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
108
+ [rank0]: raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
109
+ [rank0]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
110
+ [rank0]: (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
111
+ [rank0]: (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
112
+ [rank0]: WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
113
+
114
+ [rank0]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
115
+ wandb:
116
+ wandb: 🚀 View run train_results/google_flan-t5-large_full_upsample3000 at: https://wandb.ai/yuweiz/wikidyk-ar/runs/8c7r30qb
117
+ wandb: Find logs at: wandb/run-20250511_092259-8c7r30qb/logs
118
+ W0511 09:23:12.826000 255573 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 255641 closing signal SIGTERM
119
+ E0511 09:23:13.341000 255573 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 255642) of binary: /root/miniconda3/envs/wikidyk/bin/python
120
+ Traceback (most recent call last):
121
+ File "/root/miniconda3/envs/wikidyk/bin/torchrun", line 8, in <module>
122
+ sys.exit(main())
123
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
124
+ return f(*args, **kwargs)
125
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
126
+ run(args)
127
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
128
+ elastic_launch(
129
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
130
+ return launch_agent(self._config, self._entrypoint, list(args))
131
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
132
+ raise ChildFailedError(
133
+ torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
134
+ ============================================================
135
+ src/train.py FAILED
136
+ ------------------------------------------------------------
137
+ Failures:
138
+ <NO_OTHER_FAILURES>
139
+ ------------------------------------------------------------
140
+ Root Cause (first observed failure):
141
+ [0]:
142
+ time : 2025-05-11_09:23:12
143
+ host : bb9aa167977b
144
+ rank : 1 (local_rank: 1)
145
+ exitcode : 1 (pid: 255642)
146
+ error_file: <N/A>
147
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
148
+ ============================================================
149
+ [2025-05-11 09:23:13] ERROR: Training failed for google/flan-t5-large with exit code 1
150
+ [2025-05-11 09:23:13] ERROR: Training failed for google/flan-t5-large with exit code 1
151
+ [2025-05-11 09:23:13] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_092208.log
152
+ [2025-05-11 09:23:13] Resource usage after training google/flan-t5-large:
153
+ [2025-05-11 09:23:13] GPU memory usage:
154
+ 1 MiB, 81920 MiB
155
+ 1 MiB, 81920 MiB
156
+ 1 MiB, 81920 MiB
157
+ 1 MiB, 81920 MiB
158
+ [2025-05-11 09:23:14] Disk space usage for model outputs:
159
+ 27G train_results/google_flan-t5-large_full_upsample3000
160
+ [2025-05-11 09:23:14]
161
+ [2025-05-11 09:23:14] All training runs completed at Sun May 11 09:23:14 UTC 2025
162
+ [2025-05-11 09:23:14] =======================================
163
+ [2025-05-11 09:23:14] Summary of training runs:
164
+ [2025-05-11 09:23:14] Model | Status | Duration | Output Size
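
Note: the resume failure in this log comes from Trainer._load_rng_state calling torch.load(rng_file, weights_only=True) on a checkpoint rng_state file that pickles a NumPy object. A minimal sketch of the allowlisting workaround named in the error message itself, assuming the checkpoint under train_results/google_flan-t5-large_full_upsample3000 is trusted and that _reconstruct is the only blocked global (other NumPy globals such as numpy.ndarray or numpy.dtype may also need adding):

from numpy.core.multiarray import _reconstruct  # the global named in the UnpicklingError
import torch

# Sketch only, not the repository's actual fix: allowlist the blocked global
# before trainer.train(resume_from_checkpoint=...) is called, e.g. near the
# top of src/train.py, so both ranks register it.
torch.serialization.add_safe_globals([_reconstruct])

The error message also lists the alternatives: wrapping the load in the torch.serialization.safe_globals([...]) context manager, or falling back to weights_only=False loading for files from a fully trusted source.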
20250511_101930.log ADDED
@@ -0,0 +1,163 @@
1
+ [2025-05-11 10:19:30] Created output directory: train_results/google_flan-t5-large_full_upsample3000
2
+ [2025-05-11 10:19:30] Chat mode disabled
3
+ [2025-05-11 10:19:30] Model size is 3B or smaller (0 B). Using full fine-tuning.
4
+ [2025-05-11 10:19:30] Adjusted parameters for t5 model:
5
+ [2025-05-11 10:19:30] - LEARNING_RATE: 1e-4
6
+ [2025-05-11 10:19:30] - BATCH_SIZE: 32
7
+ [2025-05-11 10:19:30] - GRADIENT_ACCUMULATION_STEPS: 1
8
+ [2025-05-11 10:19:30] No QA format data will be used
9
+ [2025-05-11 10:19:30] =======================================
10
+ [2025-05-11 10:19:30] Starting training for model: google/flan-t5-large
11
+ [2025-05-11 10:19:30] =======================================
12
+ [2025-05-11 10:19:30] CUDA_VISIBLE_DEVICES: 0,1
13
+ [2025-05-11 10:19:30] WANDB_PROJECT: wikidyk-ar
14
+ [2025-05-11 10:19:30] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
15
+ [2025-05-11 10:19:30] Global Batch Size: 64
16
+ [2025-05-11 10:19:30] Data Size: -1
17
+ [2025-05-11 10:19:30] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py --model_name_or_path "google/flan-t5-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_flan-t5-large_full_upsample3000" --num_upsample "3000" --per_device_train_batch_size "32" --gradient_accumulation_steps "1" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 --resume_from_checkpoint True --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
18
+ [2025-05-11 10:19:30] Training started at Sun May 11 10:19:30 UTC 2025
19
+ W0511 10:19:31.717000 266021 site-packages/torch/distributed/run.py:792]
20
+ W0511 10:19:31.717000 266021 site-packages/torch/distributed/run.py:792] *****************************************
21
+ W0511 10:19:31.717000 266021 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
22
+ W0511 10:19:31.717000 266021 site-packages/torch/distributed/run.py:792] *****************************************
23
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
24
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
25
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
26
+ WARNING:root:Loading data...
27
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
28
+ WARNING:root:Loading data...
29
+ WARNING:root:Dataset initialized with all QA data:
30
+ WARNING:root: - 0 QA examples
31
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
32
+ WARNING:root: - Total examples: 36870000
33
+ WARNING:root:Dataset initialized with all QA data:
34
+ WARNING:root: - 0 QA examples
35
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
36
+ WARNING:root: - Total examples: 36870000
37
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
38
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
39
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
40
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
41
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
42
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
43
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
44
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
45
+ wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
46
+ wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
47
+ wandb: Tracking run with wandb version 0.19.11
48
+ wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_102023-3mrocyhv
49
+ wandb: Run `wandb offline` to turn off syncing.
50
+ wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
51
+ wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
52
+ wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/3mrocyhv
53
+
54
  0%| | 0/576094 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
55
+ warnings.warn(
56
+ [rank1]: Traceback (most recent call last):
57
+ [rank1]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
58
+ [rank1]: train()
59
+ [rank1]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
60
+ [rank1]: trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
61
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
62
+ [rank1]: return inner_training_loop(
63
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
64
+ [rank1]: self._load_rng_state(resume_from_checkpoint)
65
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
66
+ [rank1]: checkpoint_rng_state = torch.load(rng_file, weights_only=True)
67
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
68
+ [rank1]: raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
69
+ [rank1]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
70
+ [rank1]: (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
71
+ [rank1]: (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
72
+ [rank1]: WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
73
+
74
+ [rank1]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
75
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
76
+ warnings.warn(
77
+ Traceback (most recent call last):
78
+ File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
79
+ train()
80
+ File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
81
+ trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
82
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
83
+ return inner_training_loop(
84
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
85
+ self._load_rng_state(resume_from_checkpoint)
86
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
87
+ checkpoint_rng_state = torch.load(rng_file, weights_only=True)
88
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
89
+ raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
90
+ _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
91
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
92
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
93
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
94
+
95
+ Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
96
+ [rank0]: Traceback (most recent call last):
97
+ [rank0]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
98
+ [rank0]: train()
99
+ [rank0]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
100
+ [rank0]: trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
101
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
102
+ [rank0]: return inner_training_loop(
103
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
104
+ [rank0]: self._load_rng_state(resume_from_checkpoint)
105
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
106
+ [rank0]: checkpoint_rng_state = torch.load(rng_file, weights_only=True)
107
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
108
+ [rank0]: raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
109
+ [rank0]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
110
+ [rank0]: (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
111
+ [rank0]: (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
112
+ [rank0]: WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
113
+
114
+ [rank0]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
115
+ wandb:
116
+ wandb: 🚀 View run train_results/google_flan-t5-large_full_upsample3000 at: https://wandb.ai/yuweiz/wikidyk-ar/runs/3mrocyhv
117
+ wandb: Find logs at: wandb/run-20250511_102023-3mrocyhv/logs
118
+ W0511 10:20:35.125000 266021 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 266087 closing signal SIGTERM
119
+ E0511 10:20:36.241000 266021 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 266086) of binary: /root/miniconda3/envs/wikidyk/bin/python
120
+ Traceback (most recent call last):
121
+ File "/root/miniconda3/envs/wikidyk/bin/torchrun", line 8, in <module>
122
+ sys.exit(main())
123
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
124
+ return f(*args, **kwargs)
125
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
126
+ run(args)
127
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
128
+ elastic_launch(
129
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
130
+ return launch_agent(self._config, self._entrypoint, list(args))
131
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
132
+ raise ChildFailedError(
133
+ torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
134
+ ============================================================
135
+ src/train.py FAILED
136
+ ------------------------------------------------------------
137
+ Failures:
138
+ <NO_OTHER_FAILURES>
139
+ ------------------------------------------------------------
140
+ Root Cause (first observed failure):
141
+ [0]:
142
+ time : 2025-05-11_10:20:35
143
+ host : bb9aa167977b
144
+ rank : 0 (local_rank: 0)
145
+ exitcode : 1 (pid: 266086)
146
+ error_file: <N/A>
147
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
148
+ ============================================================
149
+ [2025-05-11 10:20:36] ERROR: Training failed for google/flan-t5-large with exit code 1
150
+ [2025-05-11 10:20:36] ERROR: Training failed for google/flan-t5-large with exit code 1
151
+ [2025-05-11 10:20:36] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_101930.log
152
+ [2025-05-11 10:20:36] Resource usage after training google/flan-t5-large:
153
+ [2025-05-11 10:20:36] GPU memory usage:
154
+ 1 MiB, 81920 MiB
155
+ 1 MiB, 81920 MiB
156
+ 38923 MiB, 81920 MiB
157
+ 39333 MiB, 81920 MiB
158
+ [2025-05-11 10:20:36] Disk space usage for model outputs:
159
+ 18G train_results/google_flan-t5-large_full_upsample3000
160
+ [2025-05-11 10:20:36]
161
+ [2025-05-11 10:20:36] All training runs completed at Sun May 11 10:20:36 UTC 2025
162
+ [2025-05-11 10:20:36] =======================================
163
+ [2025-05-11 10:20:36] Summary of training runs:
164
+ [2025-05-11 10:20:36] Model | Status | Duration | Output Size
20250511_102227.log ADDED
@@ -0,0 +1,163 @@
1
+ [2025-05-11 10:22:27] Created output directory: train_results/google_flan-t5-large_full_upsample3000
2
+ [2025-05-11 10:22:27] Chat mode disabled
3
+ [2025-05-11 10:22:27] Model size is 3B or smaller (0 B). Using full fine-tuning.
4
+ [2025-05-11 10:22:27] Adjusted parameters for t5 model:
5
+ [2025-05-11 10:22:27] - LEARNING_RATE: 1e-4
6
+ [2025-05-11 10:22:27] - BATCH_SIZE: 32
7
+ [2025-05-11 10:22:27] - GRADIENT_ACCUMULATION_STEPS: 2
8
+ [2025-05-11 10:22:27] No QA format data will be used
9
+ [2025-05-11 10:22:27] =======================================
10
+ [2025-05-11 10:22:27] Starting training for model: google/flan-t5-large
11
+ [2025-05-11 10:22:27] =======================================
12
+ [2025-05-11 10:22:27] CUDA_VISIBLE_DEVICES: 0,1
13
+ [2025-05-11 10:22:27] WANDB_PROJECT: wikidyk-ar
14
+ [2025-05-11 10:22:27] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
15
+ [2025-05-11 10:22:27] Global Batch Size: 128
16
+ [2025-05-11 10:22:27] Data Size: -1
17
+ [2025-05-11 10:22:27] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py --model_name_or_path "google/flan-t5-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_flan-t5-large_full_upsample3000" --num_upsample "3000" --per_device_train_batch_size "32" --gradient_accumulation_steps "2" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 --resume_from_checkpoint True --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
18
+ [2025-05-11 10:22:27] Training started at Sun May 11 10:22:27 UTC 2025
19
+ W0511 10:22:28.415000 266988 site-packages/torch/distributed/run.py:792]
20
+ W0511 10:22:28.415000 266988 site-packages/torch/distributed/run.py:792] *****************************************
21
+ W0511 10:22:28.415000 266988 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
22
+ W0511 10:22:28.415000 266988 site-packages/torch/distributed/run.py:792] *****************************************
23
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
24
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
25
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
26
+ WARNING:root:Loading data...
27
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
28
+ WARNING:root:Loading data...
29
+ WARNING:root:Dataset initialized with all QA data:
30
+ WARNING:root: - 0 QA examples
31
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
32
+ WARNING:root: - Total examples: 36870000
33
+ WARNING:root:Dataset initialized with all QA data:
34
+ WARNING:root: - 0 QA examples
35
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
36
+ WARNING:root: - Total examples: 36870000
37
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
38
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
39
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
40
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
41
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
42
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
43
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
44
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
45
+ wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
46
+ wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
47
+ wandb: Tracking run with wandb version 0.19.11
48
+ wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_102316-ofl0xql6
49
+ wandb: Run `wandb offline` to turn off syncing.
50
+ wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
51
+ wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
52
+ wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/ofl0xql6
53
+
54
  0%| | 0/288047 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
55
+ warnings.warn(
56
+ [rank1]: Traceback (most recent call last):
57
+ [rank1]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
58
+ [rank1]: train()
59
+ [rank1]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
60
+ [rank1]: trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
61
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
62
+ [rank1]: return inner_training_loop(
63
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
64
+ [rank1]: self._load_rng_state(resume_from_checkpoint)
65
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
66
+ [rank1]: checkpoint_rng_state = torch.load(rng_file, weights_only=True)
67
+ [rank1]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
68
+ [rank1]: raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
69
+ [rank1]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
70
+ [rank1]: (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
71
+ [rank1]: (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
72
+ [rank1]: WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
73
+
74
+ [rank1]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
75
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
76
+ warnings.warn(
77
+ Traceback (most recent call last):
78
+ File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
79
+ train()
80
+ File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
81
+ trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
82
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
83
+ return inner_training_loop(
84
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
85
+ self._load_rng_state(resume_from_checkpoint)
86
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
87
+ checkpoint_rng_state = torch.load(rng_file, weights_only=True)
88
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
89
+ raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
90
+ _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
91
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
92
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
93
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
94
+
95
+ Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
96
+ [rank0]: Traceback (most recent call last):
97
+ [rank0]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 134, in <module>
98
+ [rank0]: train()
99
+ [rank0]: File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 122, in train
100
+ [rank0]: trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
101
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
102
+ [rank0]: return inner_training_loop(
103
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 2534, in _inner_training_loop
104
+ [rank0]: self._load_rng_state(resume_from_checkpoint)
105
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 3130, in _load_rng_state
106
+ [rank0]: checkpoint_rng_state = torch.load(rng_file, weights_only=True)
107
+ [rank0]: File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
108
+ [rank0]: raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
109
+ [rank0]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
110
+ [rank0]: (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
111
+ [rank0]: (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
112
+ [rank0]: WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.
113
+
114
+ [rank0]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
115
+ wandb:
116
+ wandb: 🚀 View run train_results/google_flan-t5-large_full_upsample3000 at: https://wandb.ai/yuweiz/wikidyk-ar/runs/ofl0xql6
117
+ wandb: Find logs at: wandb/run-20250511_102316-ofl0xql6/logs
118
+ W0511 10:23:29.022000 266988 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 267057 closing signal SIGTERM
119
+ E0511 10:23:30.388000 266988 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 267056) of binary: /root/miniconda3/envs/wikidyk/bin/python
120
+ Traceback (most recent call last):
121
+ File "/root/miniconda3/envs/wikidyk/bin/torchrun", line 8, in <module>
122
+ sys.exit(main())
123
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
124
+ return f(*args, **kwargs)
125
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
126
+ run(args)
127
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
128
+ elastic_launch(
129
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
130
+ return launch_agent(self._config, self._entrypoint, list(args))
131
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
132
+ raise ChildFailedError(
133
+ torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
134
+ ============================================================
135
+ src/train.py FAILED
136
+ ------------------------------------------------------------
137
+ Failures:
138
+ <NO_OTHER_FAILURES>
139
+ ------------------------------------------------------------
140
+ Root Cause (first observed failure):
141
+ [0]:
142
+ time : 2025-05-11_10:23:29
143
+ host : bb9aa167977b
144
+ rank : 0 (local_rank: 0)
145
+ exitcode : 1 (pid: 267056)
146
+ error_file: <N/A>
147
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
148
+ ============================================================
149
+ [2025-05-11 10:23:30] ERROR: Training failed for google/flan-t5-large with exit code 1
150
+ [2025-05-11 10:23:30] ERROR: Training failed for google/flan-t5-large with exit code 1
151
+ [2025-05-11 10:23:30] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_102227.log
152
+ [2025-05-11 10:23:30] Resource usage after training google/flan-t5-large:
153
+ [2025-05-11 10:23:30] GPU memory usage:
154
+ 1 MiB, 81920 MiB
155
+ 1 MiB, 81920 MiB
156
+ 38923 MiB, 81920 MiB
157
+ 39333 MiB, 81920 MiB
158
+ [2025-05-11 10:23:30] Disk space usage for model outputs:
159
+ 18G train_results/google_flan-t5-large_full_upsample3000
160
+ [2025-05-11 10:23:30]
161
+ [2025-05-11 10:23:30] All training runs completed at Sun May 11 10:23:30 UTC 2025
162
+ [2025-05-11 10:23:30] =======================================
163
+ [2025-05-11 10:23:30] Summary of training runs:
164
+ [2025-05-11 10:23:30] Model | Status | Duration | Output Size
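
For reference, the progress-bar totals in these logs are consistent with the logged batch settings: 12290 fact examples upsampled 3000x gives 36,870,000 training examples; at a global batch size of 64 (2 GPUs x 32 per device x 1 accumulation step) one epoch is 576,094 optimizer steps, and at the global batch size of 128 used once GRADIENT_ACCUMULATION_STEPS was raised to 2 it is 288,047. A quick check, assuming steps per epoch are rounded up:

import math

examples = 12290 * 3000  # fact examples x upsampling factor = 36,870,000
for gpus, per_device, grad_accum in [(2, 32, 1), (2, 32, 2)]:
    global_batch = gpus * per_device * grad_accum
    print(global_batch, math.ceil(examples / global_batch))  # 64 -> 576094, 128 -> 288047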
20250511_110511.log ADDED
@@ -0,0 +1,237 @@
1
+ [2025-05-11 11:05:11] Created output directory: train_results/google_flan-t5-large_full_upsample3000
2
+ [2025-05-11 11:05:11] Chat mode disabled
3
+ [2025-05-11 11:05:11] Model size is 3B or smaller (0 B). Using full fine-tuning.
4
+ [2025-05-11 11:05:11] Adjusted parameters for t5 model:
5
+ [2025-05-11 11:05:11] - LEARNING_RATE: 1e-4
6
+ [2025-05-11 11:05:11] - BATCH_SIZE: 32
7
+ [2025-05-11 11:05:11] - GRADIENT_ACCUMULATION_STEPS: 2
8
+ [2025-05-11 11:05:11] No QA format data will be used
9
+ [2025-05-11 11:05:11] =======================================
10
+ [2025-05-11 11:05:11] Starting training for model: google/flan-t5-large
11
+ [2025-05-11 11:05:11] =======================================
12
+ [2025-05-11 11:05:11] CUDA_VISIBLE_DEVICES: 0,1
13
+ [2025-05-11 11:05:11] WANDB_PROJECT: wikidyk-ar
14
+ [2025-05-11 11:05:11] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
15
+ [2025-05-11 11:05:11] Global Batch Size: 128
16
+ [2025-05-11 11:05:11] Data Size: -1
17
+ [2025-05-11 11:05:11] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py --model_name_or_path "google/flan-t5-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_flan-t5-large_full_upsample3000" --num_upsample "3000" --per_device_train_batch_size "32" --gradient_accumulation_steps "2" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 --resume_from_checkpoint True --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
18
+ [2025-05-11 11:05:11] Training started at Sun May 11 11:05:11 UTC 2025
19
+ W0511 11:05:13.027000 275137 site-packages/torch/distributed/run.py:793]
20
+ W0511 11:05:13.027000 275137 site-packages/torch/distributed/run.py:793] *****************************************
21
+ W0511 11:05:13.027000 275137 site-packages/torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
22
+ W0511 11:05:13.027000 275137 site-packages/torch/distributed/run.py:793] *****************************************
23
+ Traceback (most recent call last):
24
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
25
+ return importlib.import_module("." + module_name, self.__name__)
26
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
27
+ return _bootstrap._gcd_import(name[level:], package, level)
28
+ File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
29
+ File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
30
+ File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
31
+ File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
32
+ File "<frozen importlib._bootstrap_external>", line 883, in exec_module
33
+ File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
34
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/modeling_utils.py", line 62, in <module>
35
+ from .integrations.flash_attention import flash_attention_forward
36
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/integrations/flash_attention.py", line 5, in <module>
37
+ from ..modeling_flash_attention_utils import _flash_attention_forward, flash_attn_supports_top_left_mask
38
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 36, in <module>
39
+ from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input # noqa
40
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
41
+ from flash_attn.flash_attn_interface import (
42
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
43
+ import flash_attn_2_cuda as flash_attn_gpu
44
+ ImportError: /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
45
+
46
+ The above exception was the direct cause of the following exception:
47
+
48
+ Traceback (most recent call last):
49
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
50
+ return importlib.import_module("." + module_name, self.__name__)
51
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
52
+ return _bootstrap._gcd_import(name[level:], package, level)
53
+ File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
54
+ File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
55
+ File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
56
+ File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
57
+ File "<frozen importlib._bootstrap_external>", line 883, in exec_module
58
+ File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
59
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 36, in <module>
60
+ from .. import PreTrainedModel, TFPreTrainedModel
61
+ File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
62
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
63
+ module = self._get_module(self._class_to_module[name])
64
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
65
+ raise RuntimeError(
66
+ RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
67
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
68
+
69
+ The above exception was the direct cause of the following exception:
70
+
71
+ Traceback (most recent call last):
72
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
73
+ return importlib.import_module("." + module_name, self.__name__)
74
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
75
+ return _bootstrap._gcd_import(name[level:], package, level)
76
+ File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
77
+ File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
78
+ File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
79
+ File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
80
+ File "<frozen importlib._bootstrap_external>", line 883, in exec_module
81
+ File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
82
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 41, in <module>
83
+ from .integrations import (
84
+ File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
85
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
86
+ module = self._get_module(self._class_to_module[name])
87
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
88
+ raise RuntimeError(
89
+ RuntimeError: Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
90
+ Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
91
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
92
+
93
+ The above exception was the direct cause of the following exception:
94
+
95
+ Traceback (most recent call last):
96
+ File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 5, in <module>
97
+ from transformers import Trainer, TrainingArguments, PreTrainedTokenizer, HfArgumentParser, AutoTokenizer
98
+ File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
99
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
100
+ module = self._get_module(self._class_to_module[name])
101
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
102
+ Traceback (most recent call last):
103
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
104
+ raise RuntimeError(
105
+ RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
106
+ Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
107
+ Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
108
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
109
+ return importlib.import_module("." + module_name, self.__name__)
110
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
111
+ return _bootstrap._gcd_import(name[level:], package, level)
112
+ File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
113
+ File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
114
+ File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
115
+ File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
116
+ File "<frozen importlib._bootstrap_external>", line 883, in exec_module
117
+ File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
118
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/modeling_utils.py", line 62, in <module>
119
+ from .integrations.flash_attention import flash_attention_forward
120
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/integrations/flash_attention.py", line 5, in <module>
121
+ from ..modeling_flash_attention_utils import _flash_attention_forward, flash_attn_supports_top_left_mask
122
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 36, in <module>
123
+ from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input # noqa
124
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
125
+ from flash_attn.flash_attn_interface import (
126
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
127
+ import flash_attn_2_cuda as flash_attn_gpu
128
+ ImportError: /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
129
+
130
+ The above exception was the direct cause of the following exception:
131
+
132
+ Traceback (most recent call last):
133
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
134
+ return importlib.import_module("." + module_name, self.__name__)
135
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
136
+ return _bootstrap._gcd_import(name[level:], package, level)
137
+ File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
138
+ File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
139
+ File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
140
+ File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
141
+ File "<frozen importlib._bootstrap_external>", line 883, in exec_module
142
+ File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
143
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 36, in <module>
144
+ from .. import PreTrainedModel, TFPreTrainedModel
145
+ File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
146
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
147
+ module = self._get_module(self._class_to_module[name])
148
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
149
+ raise RuntimeError(
150
+ RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
151
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
152
+
153
+ The above exception was the direct cause of the following exception:
154
+
155
+ Traceback (most recent call last):
156
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
157
+ return importlib.import_module("." + module_name, self.__name__)
158
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/importlib/__init__.py", line 126, in import_module
159
+ return _bootstrap._gcd_import(name[level:], package, level)
160
+ File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
161
+ File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
162
+ File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
163
+ File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
164
+ File "<frozen importlib._bootstrap_external>", line 883, in exec_module
165
+ File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
166
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/trainer.py", line 41, in <module>
167
+ from .integrations import (
168
+ File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
169
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
170
+ module = self._get_module(self._class_to_module[name])
171
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
172
+ raise RuntimeError(
173
+ RuntimeError: Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
174
+ Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
175
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
176
+
177
+ The above exception was the direct cause of the following exception:
178
+
179
+ Traceback (most recent call last):
180
+ File "/root/yuwei/WikiDYKEvalV2/src/train.py", line 5, in <module>
181
+ from transformers import Trainer, TrainingArguments, PreTrainedTokenizer, HfArgumentParser, AutoTokenizer
182
+ File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
183
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
184
+ module = self._get_module(self._class_to_module[name])
185
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
186
+ raise RuntimeError(
187
+ RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
188
+ Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
189
+ Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
190
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
191
+ W0511 11:05:16.241000 275137 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 275202 closing signal SIGTERM
192
+ E0511 11:05:16.506000 275137 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 275203) of binary: /root/miniconda3/envs/wikidyk/bin/python
193
+ Traceback (most recent call last):
194
+ File "/root/miniconda3/envs/wikidyk/bin/torchrun", line 8, in <module>
195
+ sys.exit(main())
196
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
197
+ return f(*args, **kwargs)
198
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 919, in main
199
+ run(args)
200
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
201
+ elastic_launch(
202
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
203
+ return launch_agent(self._config, self._entrypoint, list(args))
204
+ File "/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
205
+ raise ChildFailedError(
206
+ torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
207
+ ============================================================
208
+ src/train.py FAILED
209
+ ------------------------------------------------------------
210
+ Failures:
211
+ <NO_OTHER_FAILURES>
212
+ ------------------------------------------------------------
213
+ Root Cause (first observed failure):
214
+ [0]:
215
+ time : 2025-05-11_11:05:16
216
+ host : bb9aa167977b
217
+ rank : 1 (local_rank: 1)
218
+ exitcode : 1 (pid: 275203)
219
+ error_file: <N/A>
220
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
221
+ ============================================================
222
+ [2025-05-11 11:05:16] ERROR: Training failed for google/flan-t5-large with exit code 1
223
+ [2025-05-11 11:05:16] ERROR: Training failed for google/flan-t5-large with exit code 1
224
+ [2025-05-11 11:05:16] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_110511.log
225
+ [2025-05-11 11:05:16] Resource usage after training google/flan-t5-large:
226
+ [2025-05-11 11:05:16] GPU memory usage:
227
+ 1 MiB, 81920 MiB
228
+ 1 MiB, 81920 MiB
229
+ 40409 MiB, 81920 MiB
230
+ 40721 MiB, 81920 MiB
231
+ [2025-05-11 11:05:16] Disk space usage for model outputs:
232
+ 18G train_results/google_flan-t5-large_full_upsample3000
233
+ [2025-05-11 11:05:16]
234
+ [2025-05-11 11:05:16] All training runs completed at Sun May 11 11:05:16 UTC 2025
235
+ [2025-05-11 11:05:16] =======================================
236
+ [2025-05-11 11:05:16] Summary of training runs:
237
+ [2025-05-11 11:05:16] Model | Status | Duration | Output Size
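The ImportError above (flash_attn_2_cuda ... undefined symbol) is why this run never reached training: it typically means the installed flash-attn wheel was compiled against a different PyTorch ABI than the torch in the wikidyk environment. A minimal diagnostic sketch in Python, assuming only that torch and flash-attn sit in the same environment; the reinstall hint in the comments is an assumption, not taken from the log:

# Check whether flash-attn imports cleanly against the installed torch.
import torch

print("torch:", torch.__version__)  # the README in this commit reports Pytorch 2.6.0+cu124
try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError as err:
    # An undefined C++ symbol at import time usually means flash-attn was built
    # against another torch ABI. Rebuilding it against the current torch, e.g.
    #   pip install --no-build-isolation --force-reinstall flash-attn
    # or dropping --use_flash_attention_2 from the torchrun command are common workarounds.
    print("flash-attn import failed:", err)

The later logs in this commit (20250511_110815.log onward) get past model import with the same flags, which suggests the environment was repaired between attempts.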
20250511_110815.log ADDED
@@ -0,0 +1,83 @@
0
  0%| | 0/288047 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
1
  62%|██████▏ | 180001/288047 [00:05<00:03, 31170.87it/s]
2
  62%|██████▏ | 180025/288047 [00:20<00:03, 31170.87it/s]
3
  62%|██████▏ | 180026/288047 [00:20<00:15, 6839.34it/s]
4
  62%|██████▏ | 180027/288047 [00:20<00:16, 6580.38it/s]
5
 
 
6
  63%|██████▎ | 180050/288047 [00:32<00:16, 6580.38it/s]
7
  63%|██████▎ | 180064/288047 [00:40<00:16, 6580.38it/s]
8
  63%|██████▎ | 180065/288047 [00:40<00:50, 2126.11it/s]
9
  63%|██████▎ | 180066/288047 [00:40<00:51, 2079.67it/s][2025-05-11 11:10:04] ERROR: Training failed for google/flan-t5-large with exit code 1
1
+ [2025-05-11 11:08:15] Created output directory: train_results/google_flan-t5-large_full_upsample3000
2
+ [2025-05-11 11:08:15] Chat mode disabled
3
+ [2025-05-11 11:08:15] Model size is 3B or smaller (0 B). Using full fine-tuning.
4
+ [2025-05-11 11:08:15] Adjusted parameters for t5 model:
5
+ [2025-05-11 11:08:15] - LEARNING_RATE: 1e-4
6
+ [2025-05-11 11:08:15] - BATCH_SIZE: 32
7
+ [2025-05-11 11:08:15] - GRADIENT_ACCUMULATION_STEPS: 2
8
+ [2025-05-11 11:08:15] No QA format data will be used
9
+ [2025-05-11 11:08:15] =======================================
10
+ [2025-05-11 11:08:15] Starting training for model: google/flan-t5-large
11
+ [2025-05-11 11:08:15] =======================================
12
+ [2025-05-11 11:08:15] CUDA_VISIBLE_DEVICES: 0,1
13
+ [2025-05-11 11:08:15] WANDB_PROJECT: wikidyk-ar
14
+ [2025-05-11 11:08:15] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
15
+ [2025-05-11 11:08:15] Global Batch Size: 128
16
+ [2025-05-11 11:08:15] Data Size: -1
17
+ [2025-05-11 11:08:15] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py --model_name_or_path "google/flan-t5-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_flan-t5-large_full_upsample3000" --num_upsample "3000" --per_device_train_batch_size "32" --gradient_accumulation_steps "2" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 --resume_from_checkpoint True --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
18
+ [2025-05-11 11:08:15] Training started at Sun May 11 11:08:15 UTC 2025
19
+ W0511 11:08:16.439000 275986 site-packages/torch/distributed/run.py:792]
20
+ W0511 11:08:16.439000 275986 site-packages/torch/distributed/run.py:792] *****************************************
21
+ W0511 11:08:16.439000 275986 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
22
+ W0511 11:08:16.439000 275986 site-packages/torch/distributed/run.py:792] *****************************************
23
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
24
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
25
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
26
+ WARNING:root:Loading data...
27
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
28
+ WARNING:root:Loading data...
29
+ WARNING:root:Dataset initialized with all QA data:
30
+ WARNING:root: - 0 QA examples
31
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
32
+ WARNING:root: - Total examples: 36870000
33
+ WARNING:root:Dataset initialized with all QA data:
34
+ WARNING:root: - 0 QA examples
35
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
36
+ WARNING:root: - Total examples: 36870000
37
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
38
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
39
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
40
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
41
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
42
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
43
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
44
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
45
+ wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
46
+ wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
47
+ wandb: Tracking run with wandb version 0.19.11
48
+ wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_110911-cbretwes
49
+ wandb: Run `wandb offline` to turn off syncing.
50
+ wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
51
+ wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
52
+ wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/cbretwes
53
+
54
  0%| | 0/288047 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
55
+ warnings.warn(
56
+ Didn't manage to set back the RNG states of the CUDA because of the following error:
57
+ tuple index out of range
58
+ This won't yield the same results as if the training had not been interrupted.
59
+ Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
60
+ [rank1]:[W511 11:09:14.166081835 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
61
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
62
+ warnings.warn(
63
+ Didn't manage to set back the RNG states of the CUDA because of the following error:
64
+ tuple index out of range
65
+ This won't yield the same results as if the training had not been interrupted.
66
+ Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
67
+ [rank0]:[W511 11:09:17.716570778 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
68
+
69
  62%|██████▏ | 180001/288047 [00:05<00:03, 31170.87it/s]
70
  62%|██████▏ | 180025/288047 [00:20<00:03, 31170.87it/s]
71
  62%|██████▏ | 180026/288047 [00:20<00:15, 6839.34it/s]
72
  62%|██████▏ | 180027/288047 [00:20<00:16, 6580.38it/s]
73
 
74
+
75
  63%|██████▎ | 180050/288047 [00:32<00:16, 6580.38it/s]
76
  63%|██████▎ | 180064/288047 [00:40<00:16, 6580.38it/s]
77
  63%|██████▎ | 180065/288047 [00:40<00:50, 2126.11it/s]
78
  63%|██████▎ | 180066/288047 [00:40<00:51, 2079.67it/s][2025-05-11 11:10:04] ERROR: Training failed for google/flan-t5-large with exit code 1
79
+ [2025-05-11 11:10:04] ERROR: Training failed for google/flan-t5-large with exit code 1
80
+ [2025-05-11 11:10:04] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_110815.log
81
+ [2025-05-11 11:10:04] Resource usage after training google/flan-t5-large:
82
+ [2025-05-11 11:10:04] GPU memory usage:
83
+ 1 MiB, 81920 MiB
84
+ 1 MiB, 81920 MiB
85
+ 40409 MiB, 81920 MiB
86
+ 40721 MiB, 81920 MiB
87
+ [2025-05-11 11:10:05] Disk space usage for model outputs:
88
+ 18G train_results/google_flan-t5-large_full_upsample3000
89
+ [2025-05-11 11:10:05]
90
+ [2025-05-11 11:10:05] All training runs completed at Sun May 11 11:10:05 UTC 2025
91
+ [2025-05-11 11:10:05] =======================================
92
+ [2025-05-11 11:10:05] Summary of training runs:
93
+ [2025-05-11 11:10:05] Model | Status | Duration | Output Size
20250511_111054.log ADDED
@@ -0,0 +1,85 @@
0
  0%| | 0/288047 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
1
  66%|██████▌ | 190001/288047 [00:05<00:02, 33662.28it/s]
2
  66%|██████▌ | 190025/288047 [00:20<00:02, 33662.28it/s]
3
  66%|██████▌ | 190026/288047 [00:20<00:13, 7188.28it/s]
4
  66%|██████▌ | 190027/288047 [00:20<00:14, 6904.23it/s]
5
  {'loss': 0.0021, 'grad_norm': 0.019677339121699333, 'learning_rate': 3.402153120844862e-05, 'epoch': 0.66}
 
6
  66%|██████▌ | 190050/288047 [00:34<00:14, 6904.23it/s]
7
  66%|██████▌ | 190060/288047 [00:40<00:14, 6904.23it/s]
8
  66%|██████▌ | 190061/288047 [00:40<00:43, 2257.82it/s]
9
  66%|██████▌ | 190062/288047 [00:40<00:44, 2192.95it/s]
10
 
 
11
  66%|██████▌ | 190100/288047 [00:59<00:44, 2192.95it/s]
12
  66%|██████▌ | 190101/288047 [01:00<00:44, 2192.95it/s]
13
  66%|██████▌ | 190102/288047 [01:00<01:44, 934.46it/s]
14
  66%|██████▌ | 190103/288047 [01:00<01:47, 914.55it/s]
15
  66%|██████▌ | 190137/288047 [01:20<01:47, 914.55it/s]
16
  66%|██████▌ | 190138/288047 [01:20<03:47, 429.43it/s]
17
  66%|██████▌ | 190139/288047 [01:20<03:53, 419.89it/s]
18
 
 
19
  66%|██████▌ | 190150/288047 [01:27<03:53, 419.89it/s][2025-05-11 11:13:24] ERROR: Training failed for google/flan-t5-large with exit code 1
1
+ [2025-05-11 11:10:54] Created output directory: train_results/google_flan-t5-large_full_upsample3000
2
+ [2025-05-11 11:10:54] Chat mode disabled
3
+ [2025-05-11 11:10:54] Model size is 3B or smaller (0 B). Using full fine-tuning.
4
+ [2025-05-11 11:10:54] Adjusted parameters for t5 model:
5
+ [2025-05-11 11:10:54] - LEARNING_RATE: 1e-4
6
+ [2025-05-11 11:10:54] - BATCH_SIZE: 32
7
+ [2025-05-11 11:10:54] - GRADIENT_ACCUMULATION_STEPS: 2
8
+ [2025-05-11 11:10:54] No QA format data will be used
9
+ [2025-05-11 11:10:54] =======================================
10
+ [2025-05-11 11:10:54] Starting training for model: google/flan-t5-large
11
+ [2025-05-11 11:10:54] =======================================
12
+ [2025-05-11 11:10:54] CUDA_VISIBLE_DEVICES: 0,1
13
+ [2025-05-11 11:10:54] WANDB_PROJECT: wikidyk-ar
14
+ [2025-05-11 11:10:54] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
15
+ [2025-05-11 11:10:54] Global Batch Size: 128
16
+ [2025-05-11 11:10:54] Data Size: -1
17
+ [2025-05-11 11:10:54] Executing command: torchrun --nproc_per_node "2" --master-port 29502 src/train.py --model_name_or_path "google/flan-t5-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_flan-t5-large_full_upsample3000" --num_upsample "3000" --per_device_train_batch_size "32" --gradient_accumulation_steps "2" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_steps 10000 --save_total_limit 3 --resume_from_checkpoint True --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false"
18
+ [2025-05-11 11:10:54] Training started at Sun May 11 11:10:54 UTC 2025
19
+ W0511 11:10:55.754000 276498 site-packages/torch/distributed/run.py:792]
20
+ W0511 11:10:55.754000 276498 site-packages/torch/distributed/run.py:792] *****************************************
21
+ W0511 11:10:55.754000 276498 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
22
+ W0511 11:10:55.754000 276498 site-packages/torch/distributed/run.py:792] *****************************************
23
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
24
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
25
+ WARNING:root:Loading data...
26
+ WARNING:root:Output directory: train_results/google_flan-t5-large_full_upsample3000
27
+ You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
28
+ WARNING:root:Loading data...
29
+ WARNING:root:Dataset initialized with all QA data:
30
+ WARNING:root: - 0 QA examples
31
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
32
+ WARNING:root: - Total examples: 36870000
33
+ WARNING:root:Dataset initialized with all QA data:
34
+ WARNING:root: - 0 QA examples
35
+ WARNING:root: - 12290 fact examples with upsampling factor 3000
36
+ WARNING:root: - Total examples: 36870000
37
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
38
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
39
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
40
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
41
+ /root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
42
+ trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
43
+ You are resuming training from a checkpoint trained with 4.51.1 of Transformers but your current version is 4.51.3. This is not recommended and could yield to errors or unwanted behaviors.
44
+ There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].
45
+ wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
46
+ wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
47
+ wandb: Tracking run with wandb version 0.19.11
48
+ wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250511_111150-2u5am4ts
49
+ wandb: Run `wandb offline` to turn off syncing.
50
+ wandb: Syncing run train_results/google_flan-t5-large_full_upsample3000
51
+ wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
52
+ wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/2u5am4ts
53
+
54
  0%| | 0/288047 [00:00<?, ?it/s]/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
55
+ warnings.warn(
56
+ Didn't manage to set back the RNG states of the CUDA because of the following error:
57
+ tuple index out of range
58
+ This won't yield the same results as if the training had not been interrupted.
59
+ /root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
60
+ warnings.warn(
61
+ Didn't manage to set back the RNG states of the CUDA because of the following error:
62
+ tuple index out of range
63
+ This won't yield the same results as if the training had not been interrupted.
64
+ Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
65
+ [rank0]:[W511 11:11:55.768036568 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
66
+ Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
67
+ [rank1]:[W511 11:11:55.383761995 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
68
+
69
  66%|██████▌ | 190001/288047 [00:05<00:02, 33662.28it/s]
70
  66%|██████▌ | 190025/288047 [00:20<00:02, 33662.28it/s]
71
  66%|██████▌ | 190026/288047 [00:20<00:13, 7188.28it/s]
72
  66%|██████▌ | 190027/288047 [00:20<00:14, 6904.23it/s]
73
  {'loss': 0.0021, 'grad_norm': 0.019677339121699333, 'learning_rate': 3.402153120844862e-05, 'epoch': 0.66}
74
+
75
  66%|██████▌ | 190050/288047 [00:34<00:14, 6904.23it/s]
76
  66%|██████▌ | 190060/288047 [00:40<00:14, 6904.23it/s]
77
  66%|██████▌ | 190061/288047 [00:40<00:43, 2257.82it/s]
78
  66%|██████▌ | 190062/288047 [00:40<00:44, 2192.95it/s]
79
 
80
+
81
  66%|██████▌ | 190100/288047 [00:59<00:44, 2192.95it/s]
82
  66%|██████▌ | 190101/288047 [01:00<00:44, 2192.95it/s]
83
  66%|██████▌ | 190102/288047 [01:00<01:44, 934.46it/s]
84
  66%|██████▌ | 190103/288047 [01:00<01:47, 914.55it/s]
85
  66%|██████▌ | 190137/288047 [01:20<01:47, 914.55it/s]
86
  66%|██████▌ | 190138/288047 [01:20<03:47, 429.43it/s]
87
  66%|██████▌ | 190139/288047 [01:20<03:53, 419.89it/s]
88
 
89
+
90
  66%|██████▌ | 190150/288047 [01:27<03:53, 419.89it/s][2025-05-11 11:13:24] ERROR: Training failed for google/flan-t5-large with exit code 1
91
+ [2025-05-11 11:13:24] ERROR: Training failed for google/flan-t5-large with exit code 1
92
+ [2025-05-11 11:13:24] Check error log for details: train_results/google_flan-t5-large_full_upsample3000/20250511_111054.log
93
+ [2025-05-11 11:13:24] Resource usage after training google/flan-t5-large:
94
+ [2025-05-11 11:13:24] GPU memory usage:
95
+ 1 MiB, 81920 MiB
96
+ 1 MiB, 81920 MiB
97
+ 40409 MiB, 81920 MiB
98
+ 40721 MiB, 81920 MiB
99
+ [2025-05-11 11:13:24] Disk space usage for model outputs:
100
+ 27G train_results/google_flan-t5-large_full_upsample3000
101
+ [2025-05-11 11:13:24]
102
+ [2025-05-11 11:13:24] All training runs completed at Sun May 11 11:13:24 UTC 2025
103
+ [2025-05-11 11:13:24] =======================================
104
+ [2025-05-11 11:13:24] Summary of training runs:
105
+ [2025-05-11 11:13:24] Model | Status | Duration | Output Size
20250511_111333.log ADDED
The diff for this file is too large to render. See raw diff
 
20250511_121707.log ADDED
The diff for this file is too large to render. See raw diff
 
20250511_225651.log ADDED
The diff for this file is too large to render. See raw diff
 
README.md ADDED
@@ -0,0 +1,58 @@
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ base_model: google/flan-t5-large
5
+ tags:
6
+ - generated_from_trainer
7
+ model-index:
8
+ - name: google_flan-t5-large_full_upsample3000
9
+ results: []
10
+ ---
11
+
12
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
+ should probably proofread and complete it, then remove this comment. -->
14
+
15
+ # google_flan-t5-large_full_upsample3000
16
+
17
+ This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on an unknown dataset.
18
+
19
+ ## Model description
20
+
21
+ More information needed
22
+
23
+ ## Intended uses & limitations
24
+
25
+ More information needed
26
+
27
+ ## Training and evaluation data
28
+
29
+ More information needed
30
+
31
+ ## Training procedure
32
+
33
+ ### Training hyperparameters
34
+
35
+ The following hyperparameters were used during training:
36
+ - learning_rate: 0.0001
37
+ - train_batch_size: 32
38
+ - eval_batch_size: 8
39
+ - seed: 42
40
+ - distributed_type: multi-GPU
41
+ - num_devices: 2
42
+ - gradient_accumulation_steps: 2
43
+ - total_train_batch_size: 128
44
+ - total_eval_batch_size: 16
45
+ - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
46
+ - lr_scheduler_type: linear
47
+ - num_epochs: 1.0
48
+
49
+ ### Training results
50
+
51
+
52
+
53
+ ### Framework versions
54
+
55
+ - Transformers 4.51.3
56
+ - Pytorch 2.6.0+cu124
57
+ - Datasets 3.6.0
58
+ - Tokenizers 0.21.1
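A minimal usage sketch for the exported checkpoint (not part of the auto-generated card above; the local path is simply the output_dir from the training logs and stands in for wherever the weights are loaded from):

# Load the fine-tuned FLAN-T5 checkpoint and run a single generation.
from transformers import AutoTokenizer, T5ForConditionalGeneration

path = "train_results/google_flan-t5-large_full_upsample3000"  # or the uploaded Hub repo id
tokenizer = AutoTokenizer.from_pretrained(path)
model = T5ForConditionalGeneration.from_pretrained(path)

inputs = tokenizer("Answer the question: what model is this?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))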
added_tokens.json ADDED
@@ -0,0 +1,102 @@
1
+ {
2
+ "<extra_id_0>": 32099,
3
+ "<extra_id_10>": 32089,
4
+ "<extra_id_11>": 32088,
5
+ "<extra_id_12>": 32087,
6
+ "<extra_id_13>": 32086,
7
+ "<extra_id_14>": 32085,
8
+ "<extra_id_15>": 32084,
9
+ "<extra_id_16>": 32083,
10
+ "<extra_id_17>": 32082,
11
+ "<extra_id_18>": 32081,
12
+ "<extra_id_19>": 32080,
13
+ "<extra_id_1>": 32098,
14
+ "<extra_id_20>": 32079,
15
+ "<extra_id_21>": 32078,
16
+ "<extra_id_22>": 32077,
17
+ "<extra_id_23>": 32076,
18
+ "<extra_id_24>": 32075,
19
+ "<extra_id_25>": 32074,
20
+ "<extra_id_26>": 32073,
21
+ "<extra_id_27>": 32072,
22
+ "<extra_id_28>": 32071,
23
+ "<extra_id_29>": 32070,
24
+ "<extra_id_2>": 32097,
25
+ "<extra_id_30>": 32069,
26
+ "<extra_id_31>": 32068,
27
+ "<extra_id_32>": 32067,
28
+ "<extra_id_33>": 32066,
29
+ "<extra_id_34>": 32065,
30
+ "<extra_id_35>": 32064,
31
+ "<extra_id_36>": 32063,
32
+ "<extra_id_37>": 32062,
33
+ "<extra_id_38>": 32061,
34
+ "<extra_id_39>": 32060,
35
+ "<extra_id_3>": 32096,
36
+ "<extra_id_40>": 32059,
37
+ "<extra_id_41>": 32058,
38
+ "<extra_id_42>": 32057,
39
+ "<extra_id_43>": 32056,
40
+ "<extra_id_44>": 32055,
41
+ "<extra_id_45>": 32054,
42
+ "<extra_id_46>": 32053,
43
+ "<extra_id_47>": 32052,
44
+ "<extra_id_48>": 32051,
45
+ "<extra_id_49>": 32050,
46
+ "<extra_id_4>": 32095,
47
+ "<extra_id_50>": 32049,
48
+ "<extra_id_51>": 32048,
49
+ "<extra_id_52>": 32047,
50
+ "<extra_id_53>": 32046,
51
+ "<extra_id_54>": 32045,
52
+ "<extra_id_55>": 32044,
53
+ "<extra_id_56>": 32043,
54
+ "<extra_id_57>": 32042,
55
+ "<extra_id_58>": 32041,
56
+ "<extra_id_59>": 32040,
57
+ "<extra_id_5>": 32094,
58
+ "<extra_id_60>": 32039,
59
+ "<extra_id_61>": 32038,
60
+ "<extra_id_62>": 32037,
61
+ "<extra_id_63>": 32036,
62
+ "<extra_id_64>": 32035,
63
+ "<extra_id_65>": 32034,
64
+ "<extra_id_66>": 32033,
65
+ "<extra_id_67>": 32032,
66
+ "<extra_id_68>": 32031,
67
+ "<extra_id_69>": 32030,
68
+ "<extra_id_6>": 32093,
69
+ "<extra_id_70>": 32029,
70
+ "<extra_id_71>": 32028,
71
+ "<extra_id_72>": 32027,
72
+ "<extra_id_73>": 32026,
73
+ "<extra_id_74>": 32025,
74
+ "<extra_id_75>": 32024,
75
+ "<extra_id_76>": 32023,
76
+ "<extra_id_77>": 32022,
77
+ "<extra_id_78>": 32021,
78
+ "<extra_id_79>": 32020,
79
+ "<extra_id_7>": 32092,
80
+ "<extra_id_80>": 32019,
81
+ "<extra_id_81>": 32018,
82
+ "<extra_id_82>": 32017,
83
+ "<extra_id_83>": 32016,
84
+ "<extra_id_84>": 32015,
85
+ "<extra_id_85>": 32014,
86
+ "<extra_id_86>": 32013,
87
+ "<extra_id_87>": 32012,
88
+ "<extra_id_88>": 32011,
89
+ "<extra_id_89>": 32010,
90
+ "<extra_id_8>": 32091,
91
+ "<extra_id_90>": 32009,
92
+ "<extra_id_91>": 32008,
93
+ "<extra_id_92>": 32007,
94
+ "<extra_id_93>": 32006,
95
+ "<extra_id_94>": 32005,
96
+ "<extra_id_95>": 32004,
97
+ "<extra_id_96>": 32003,
98
+ "<extra_id_97>": 32002,
99
+ "<extra_id_98>": 32001,
100
+ "<extra_id_99>": 32000,
101
+ "<extra_id_9>": 32090
102
+ }
config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "architectures": [
3
+ "T5ForConditionalGeneration"
4
+ ],
5
+ "classifier_dropout": 0.0,
6
+ "d_ff": 2816,
7
+ "d_kv": 64,
8
+ "d_model": 1024,
9
+ "decoder_start_token_id": 0,
10
+ "dense_act_fn": "gelu_new",
11
+ "dropout_rate": 0.1,
12
+ "eos_token_id": 1,
13
+ "feed_forward_proj": "gated-gelu",
14
+ "initializer_factor": 1.0,
15
+ "is_encoder_decoder": true,
16
+ "is_gated_act": true,
17
+ "layer_norm_epsilon": 1e-06,
18
+ "model_type": "t5",
19
+ "n_positions": 512,
20
+ "num_decoder_layers": 24,
21
+ "num_heads": 16,
22
+ "num_layers": 24,
23
+ "output_past": true,
24
+ "pad_token_id": 0,
25
+ "relative_attention_max_distance": 128,
26
+ "relative_attention_num_buckets": 32,
27
+ "tie_word_embeddings": false,
28
+ "torch_dtype": "float32",
29
+ "transformers_version": "4.51.3",
30
+ "use_cache": true,
31
+ "vocab_size": 32128
32
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "decoder_start_token_id": 0,
4
+ "eos_token_id": 1,
5
+ "pad_token_id": 0,
6
+ "transformers_version": "4.51.3"
7
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0a0f6221b9dcbcdce6374d0958d35cc9f3ce103485e6b9640c7b85bbf68f71ce
3
+ size 3132668808
special_tokens_map.json ADDED
@@ -0,0 +1,125 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<extra_id_0>",
4
+ "<extra_id_1>",
5
+ "<extra_id_2>",
6
+ "<extra_id_3>",
7
+ "<extra_id_4>",
8
+ "<extra_id_5>",
9
+ "<extra_id_6>",
10
+ "<extra_id_7>",
11
+ "<extra_id_8>",
12
+ "<extra_id_9>",
13
+ "<extra_id_10>",
14
+ "<extra_id_11>",
15
+ "<extra_id_12>",
16
+ "<extra_id_13>",
17
+ "<extra_id_14>",
18
+ "<extra_id_15>",
19
+ "<extra_id_16>",
20
+ "<extra_id_17>",
21
+ "<extra_id_18>",
22
+ "<extra_id_19>",
23
+ "<extra_id_20>",
24
+ "<extra_id_21>",
25
+ "<extra_id_22>",
26
+ "<extra_id_23>",
27
+ "<extra_id_24>",
28
+ "<extra_id_25>",
29
+ "<extra_id_26>",
30
+ "<extra_id_27>",
31
+ "<extra_id_28>",
32
+ "<extra_id_29>",
33
+ "<extra_id_30>",
34
+ "<extra_id_31>",
35
+ "<extra_id_32>",
36
+ "<extra_id_33>",
37
+ "<extra_id_34>",
38
+ "<extra_id_35>",
39
+ "<extra_id_36>",
40
+ "<extra_id_37>",
41
+ "<extra_id_38>",
42
+ "<extra_id_39>",
43
+ "<extra_id_40>",
44
+ "<extra_id_41>",
45
+ "<extra_id_42>",
46
+ "<extra_id_43>",
47
+ "<extra_id_44>",
48
+ "<extra_id_45>",
49
+ "<extra_id_46>",
50
+ "<extra_id_47>",
51
+ "<extra_id_48>",
52
+ "<extra_id_49>",
53
+ "<extra_id_50>",
54
+ "<extra_id_51>",
55
+ "<extra_id_52>",
56
+ "<extra_id_53>",
57
+ "<extra_id_54>",
58
+ "<extra_id_55>",
59
+ "<extra_id_56>",
60
+ "<extra_id_57>",
61
+ "<extra_id_58>",
62
+ "<extra_id_59>",
63
+ "<extra_id_60>",
64
+ "<extra_id_61>",
65
+ "<extra_id_62>",
66
+ "<extra_id_63>",
67
+ "<extra_id_64>",
68
+ "<extra_id_65>",
69
+ "<extra_id_66>",
70
+ "<extra_id_67>",
71
+ "<extra_id_68>",
72
+ "<extra_id_69>",
73
+ "<extra_id_70>",
74
+ "<extra_id_71>",
75
+ "<extra_id_72>",
76
+ "<extra_id_73>",
77
+ "<extra_id_74>",
78
+ "<extra_id_75>",
79
+ "<extra_id_76>",
80
+ "<extra_id_77>",
81
+ "<extra_id_78>",
82
+ "<extra_id_79>",
83
+ "<extra_id_80>",
84
+ "<extra_id_81>",
85
+ "<extra_id_82>",
86
+ "<extra_id_83>",
87
+ "<extra_id_84>",
88
+ "<extra_id_85>",
89
+ "<extra_id_86>",
90
+ "<extra_id_87>",
91
+ "<extra_id_88>",
92
+ "<extra_id_89>",
93
+ "<extra_id_90>",
94
+ "<extra_id_91>",
95
+ "<extra_id_92>",
96
+ "<extra_id_93>",
97
+ "<extra_id_94>",
98
+ "<extra_id_95>",
99
+ "<extra_id_96>",
100
+ "<extra_id_97>",
101
+ "<extra_id_98>",
102
+ "<extra_id_99>"
103
+ ],
104
+ "eos_token": {
105
+ "content": "</s>",
106
+ "lstrip": false,
107
+ "normalized": false,
108
+ "rstrip": false,
109
+ "single_word": false
110
+ },
111
+ "pad_token": {
112
+ "content": "<pad>",
113
+ "lstrip": false,
114
+ "normalized": false,
115
+ "rstrip": false,
116
+ "single_word": false
117
+ },
118
+ "unk_token": {
119
+ "content": "<unk>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false
124
+ }
125
+ }
spiece.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
3
+ size 791656
tokenizer_config.json ADDED
@@ -0,0 +1,942 @@
1
+ {
2
+ "add_prefix_space": true,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<pad>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "</s>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<unk>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "32000": {
29
+ "content": "<extra_id_99>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "32001": {
37
+ "content": "<extra_id_98>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "32002": {
45
+ "content": "<extra_id_97>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "32003": {
53
+ "content": "<extra_id_96>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "32004": {
61
+ "content": "<extra_id_95>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "32005": {
69
+ "content": "<extra_id_94>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "32006": {
77
+ "content": "<extra_id_93>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "32007": {
85
+ "content": "<extra_id_92>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "32008": {
93
+ "content": "<extra_id_91>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "32009": {
101
+ "content": "<extra_id_90>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "32010": {
109
+ "content": "<extra_id_89>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "32011": {
117
+ "content": "<extra_id_88>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "32012": {
125
+ "content": "<extra_id_87>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "32013": {
133
+ "content": "<extra_id_86>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": true
139
+ },
140
+ "32014": {
141
+ "content": "<extra_id_85>",
142
+ "lstrip": false,
143
+ "normalized": false,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": true
147
+ },
148
+ "32015": {
149
+ "content": "<extra_id_84>",
150
+ "lstrip": false,
151
+ "normalized": false,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": true
155
+ },
156
+ "32016": {
157
+ "content": "<extra_id_83>",
158
+ "lstrip": false,
159
+ "normalized": false,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": true
163
+ },
164
+ "32017": {
165
+ "content": "<extra_id_82>",
166
+ "lstrip": false,
167
+ "normalized": false,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": true
171
+ },
172
+ "32018": {
173
+ "content": "<extra_id_81>",
174
+ "lstrip": false,
175
+ "normalized": false,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": true
179
+ },
180
+ "32019": {
181
+ "content": "<extra_id_80>",
182
+ "lstrip": false,
183
+ "normalized": false,
184
+ "rstrip": false,
185
+ "single_word": false,
186
+ "special": true
187
+ },
188
+ "32020": {
189
+ "content": "<extra_id_79>",
190
+ "lstrip": false,
191
+ "normalized": false,
192
+ "rstrip": false,
193
+ "single_word": false,
194
+ "special": true
195
+ },
196
+ "32021": {
197
+ "content": "<extra_id_78>",
198
+ "lstrip": false,
199
+ "normalized": false,
200
+ "rstrip": false,
201
+ "single_word": false,
202
+ "special": true
203
+ },
204
+ "32022": {
205
+ "content": "<extra_id_77>",
206
+ "lstrip": false,
207
+ "normalized": false,
208
+ "rstrip": false,
209
+ "single_word": false,
210
+ "special": true
211
+ },
212
+ "32023": {
213
+ "content": "<extra_id_76>",
214
+ "lstrip": false,
215
+ "normalized": false,
216
+ "rstrip": false,
217
+ "single_word": false,
218
+ "special": true
219
+ },
220
+ "32024": {
221
+ "content": "<extra_id_75>",
222
+ "lstrip": false,
223
+ "normalized": false,
224
+ "rstrip": false,
225
+ "single_word": false,
226
+ "special": true
227
+ },
228
+ "32025": {
229
+ "content": "<extra_id_74>",
230
+ "lstrip": false,
231
+ "normalized": false,
232
+ "rstrip": false,
233
+ "single_word": false,
234
+ "special": true
235
+ },
236
+ "32026": {
237
+ "content": "<extra_id_73>",
238
+ "lstrip": false,
239
+ "normalized": false,
240
+ "rstrip": false,
241
+ "single_word": false,
242
+ "special": true
243
+ },
244
+ "32027": {
245
+ "content": "<extra_id_72>",
246
+ "lstrip": false,
247
+ "normalized": false,
248
+ "rstrip": false,
249
+ "single_word": false,
250
+ "special": true
251
+ },
252
+ "32028": {
253
+ "content": "<extra_id_71>",
254
+ "lstrip": false,
255
+ "normalized": false,
256
+ "rstrip": false,
257
+ "single_word": false,
258
+ "special": true
259
+ },
260
+ "32029": {
261
+ "content": "<extra_id_70>",
262
+ "lstrip": false,
263
+ "normalized": false,
264
+ "rstrip": false,
265
+ "single_word": false,
266
+ "special": true
267
+ },
268
+ "32030": {
269
+ "content": "<extra_id_69>",
270
+ "lstrip": false,
271
+ "normalized": false,
272
+ "rstrip": false,
273
+ "single_word": false,
274
+ "special": true
275
+ },
276
+ "32031": {
277
+ "content": "<extra_id_68>",
278
+ "lstrip": false,
279
+ "normalized": false,
280
+ "rstrip": false,
281
+ "single_word": false,
282
+ "special": true
283
+ },
284
+ "32032": {
285
+ "content": "<extra_id_67>",
286
+ "lstrip": false,
287
+ "normalized": false,
288
+ "rstrip": false,
289
+ "single_word": false,
290
+ "special": true
291
+ },
292
+ "32033": {
293
+ "content": "<extra_id_66>",
294
+ "lstrip": false,
295
+ "normalized": false,
296
+ "rstrip": false,
297
+ "single_word": false,
298
+ "special": true
299
+ },
300
+ "32034": {
301
+ "content": "<extra_id_65>",
302
+ "lstrip": false,
303
+ "normalized": false,
304
+ "rstrip": false,
305
+ "single_word": false,
306
+ "special": true
307
+ },
308
+ "32035": {
309
+ "content": "<extra_id_64>",
310
+ "lstrip": false,
311
+ "normalized": false,
312
+ "rstrip": false,
313
+ "single_word": false,
314
+ "special": true
315
+ },
316
+ "32036": {
317
+ "content": "<extra_id_63>",
318
+ "lstrip": false,
319
+ "normalized": false,
320
+ "rstrip": false,
321
+ "single_word": false,
322
+ "special": true
323
+ },
324
+ "32037": {
325
+ "content": "<extra_id_62>",
326
+ "lstrip": false,
327
+ "normalized": false,
328
+ "rstrip": false,
329
+ "single_word": false,
330
+ "special": true
331
+ },
332
+ "32038": {
333
+ "content": "<extra_id_61>",
334
+ "lstrip": false,
335
+ "normalized": false,
336
+ "rstrip": false,
337
+ "single_word": false,
338
+ "special": true
339
+ },
340
+ "32039": {
341
+ "content": "<extra_id_60>",
342
+ "lstrip": false,
343
+ "normalized": false,
344
+ "rstrip": false,
345
+ "single_word": false,
346
+ "special": true
347
+ },
348
+ "32040": {
349
+ "content": "<extra_id_59>",
350
+ "lstrip": false,
351
+ "normalized": false,
352
+ "rstrip": false,
353
+ "single_word": false,
354
+ "special": true
355
+ },
356
+ "32041": {
357
+ "content": "<extra_id_58>",
358
+ "lstrip": false,
359
+ "normalized": false,
360
+ "rstrip": false,
361
+ "single_word": false,
362
+ "special": true
363
+ },
364
+ "32042": {
365
+ "content": "<extra_id_57>",
366
+ "lstrip": false,
367
+ "normalized": false,
368
+ "rstrip": false,
369
+ "single_word": false,
370
+ "special": true
371
+ },
372
+ "32043": {
373
+ "content": "<extra_id_56>",
374
+ "lstrip": false,
375
+ "normalized": false,
376
+ "rstrip": false,
377
+ "single_word": false,
378
+ "special": true
379
+ },
380
+ "32044": {
381
+ "content": "<extra_id_55>",
382
+ "lstrip": false,
383
+ "normalized": false,
384
+ "rstrip": false,
385
+ "single_word": false,
386
+ "special": true
387
+ },
388
+ "32045": {
389
+ "content": "<extra_id_54>",
390
+ "lstrip": false,
391
+ "normalized": false,
392
+ "rstrip": false,
393
+ "single_word": false,
394
+ "special": true
395
+ },
396
+ "32046": {
397
+ "content": "<extra_id_53>",
398
+ "lstrip": false,
399
+ "normalized": false,
400
+ "rstrip": false,
401
+ "single_word": false,
402
+ "special": true
403
+ },
404
+ "32047": {
405
+ "content": "<extra_id_52>",
406
+ "lstrip": false,
407
+ "normalized": false,
408
+ "rstrip": false,
409
+ "single_word": false,
410
+ "special": true
411
+ },
412
+ "32048": {
413
+ "content": "<extra_id_51>",
414
+ "lstrip": false,
415
+ "normalized": false,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "32049": {
421
+ "content": "<extra_id_50>",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "32050": {
429
+ "content": "<extra_id_49>",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ },
436
+ "32051": {
437
+ "content": "<extra_id_48>",
438
+ "lstrip": false,
439
+ "normalized": false,
440
+ "rstrip": false,
441
+ "single_word": false,
442
+ "special": true
443
+ },
444
+ "32052": {
445
+ "content": "<extra_id_47>",
446
+ "lstrip": false,
447
+ "normalized": false,
448
+ "rstrip": false,
449
+ "single_word": false,
450
+ "special": true
451
+ },
452
+ "32053": {
453
+ "content": "<extra_id_46>",
454
+ "lstrip": false,
455
+ "normalized": false,
456
+ "rstrip": false,
457
+ "single_word": false,
458
+ "special": true
459
+ },
460
+ "32054": {
461
+ "content": "<extra_id_45>",
462
+ "lstrip": false,
463
+ "normalized": false,
464
+ "rstrip": false,
465
+ "single_word": false,
466
+ "special": true
467
+ },
468
+ "32055": {
469
+ "content": "<extra_id_44>",
470
+ "lstrip": false,
471
+ "normalized": false,
472
+ "rstrip": false,
473
+ "single_word": false,
474
+ "special": true
475
+ },
476
+ "32056": {
477
+ "content": "<extra_id_43>",
478
+ "lstrip": false,
479
+ "normalized": false,
480
+ "rstrip": false,
481
+ "single_word": false,
482
+ "special": true
483
+ },
484
+ "32057": {
485
+ "content": "<extra_id_42>",
486
+ "lstrip": false,
487
+ "normalized": false,
488
+ "rstrip": false,
489
+ "single_word": false,
490
+ "special": true
491
+ },
492
+ "32058": {
493
+ "content": "<extra_id_41>",
494
+ "lstrip": false,
495
+ "normalized": false,
496
+ "rstrip": false,
497
+ "single_word": false,
498
+ "special": true
499
+ },
500
+ "32059": {
501
+ "content": "<extra_id_40>",
502
+ "lstrip": false,
503
+ "normalized": false,
504
+ "rstrip": false,
505
+ "single_word": false,
506
+ "special": true
507
+ },
508
+ "32060": {
509
+ "content": "<extra_id_39>",
510
+ "lstrip": false,
511
+ "normalized": false,
512
+ "rstrip": false,
513
+ "single_word": false,
514
+ "special": true
515
+ },
516
+ "32061": {
517
+ "content": "<extra_id_38>",
518
+ "lstrip": false,
519
+ "normalized": false,
520
+ "rstrip": false,
521
+ "single_word": false,
522
+ "special": true
523
+ },
524
+ "32062": {
525
+ "content": "<extra_id_37>",
526
+ "lstrip": false,
527
+ "normalized": false,
528
+ "rstrip": false,
529
+ "single_word": false,
530
+ "special": true
531
+ },
532
+ "32063": {
533
+ "content": "<extra_id_36>",
534
+ "lstrip": false,
535
+ "normalized": false,
536
+ "rstrip": false,
537
+ "single_word": false,
538
+ "special": true
539
+ },
540
+ "32064": {
541
+ "content": "<extra_id_35>",
542
+ "lstrip": false,
543
+ "normalized": false,
544
+ "rstrip": false,
545
+ "single_word": false,
546
+ "special": true
547
+ },
548
+ "32065": {
549
+ "content": "<extra_id_34>",
550
+ "lstrip": false,
551
+ "normalized": false,
552
+ "rstrip": false,
553
+ "single_word": false,
554
+ "special": true
555
+ },
556
+ "32066": {
557
+ "content": "<extra_id_33>",
558
+ "lstrip": false,
559
+ "normalized": false,
560
+ "rstrip": false,
561
+ "single_word": false,
562
+ "special": true
563
+ },
564
+ "32067": {
565
+ "content": "<extra_id_32>",
566
+ "lstrip": false,
567
+ "normalized": false,
568
+ "rstrip": false,
569
+ "single_word": false,
570
+ "special": true
571
+ },
572
+ "32068": {
573
+ "content": "<extra_id_31>",
574
+ "lstrip": false,
575
+ "normalized": false,
576
+ "rstrip": false,
577
+ "single_word": false,
578
+ "special": true
579
+ },
580
+ "32069": {
581
+ "content": "<extra_id_30>",
582
+ "lstrip": false,
583
+ "normalized": false,
584
+ "rstrip": false,
585
+ "single_word": false,
586
+ "special": true
587
+ },
588
+ "32070": {
589
+ "content": "<extra_id_29>",
590
+ "lstrip": false,
591
+ "normalized": false,
592
+ "rstrip": false,
593
+ "single_word": false,
594
+ "special": true
595
+ },
596
+ "32071": {
597
+ "content": "<extra_id_28>",
598
+ "lstrip": false,
599
+ "normalized": false,
600
+ "rstrip": false,
601
+ "single_word": false,
602
+ "special": true
603
+ },
604
+ "32072": {
605
+ "content": "<extra_id_27>",
606
+ "lstrip": false,
607
+ "normalized": false,
608
+ "rstrip": false,
609
+ "single_word": false,
610
+ "special": true
611
+ },
612
+ "32073": {
613
+ "content": "<extra_id_26>",
614
+ "lstrip": false,
615
+ "normalized": false,
616
+ "rstrip": false,
617
+ "single_word": false,
618
+ "special": true
619
+ },
620
+ "32074": {
621
+ "content": "<extra_id_25>",
622
+ "lstrip": false,
623
+ "normalized": false,
624
+ "rstrip": false,
625
+ "single_word": false,
626
+ "special": true
627
+ },
628
+ "32075": {
629
+ "content": "<extra_id_24>",
630
+ "lstrip": false,
631
+ "normalized": false,
632
+ "rstrip": false,
633
+ "single_word": false,
634
+ "special": true
635
+ },
636
+ "32076": {
637
+ "content": "<extra_id_23>",
638
+ "lstrip": false,
639
+ "normalized": false,
640
+ "rstrip": false,
641
+ "single_word": false,
642
+ "special": true
643
+ },
644
+ "32077": {
645
+ "content": "<extra_id_22>",
646
+ "lstrip": false,
647
+ "normalized": false,
648
+ "rstrip": false,
649
+ "single_word": false,
650
+ "special": true
651
+ },
652
+ "32078": {
653
+ "content": "<extra_id_21>",
654
+ "lstrip": false,
655
+ "normalized": false,
656
+ "rstrip": false,
657
+ "single_word": false,
658
+ "special": true
659
+ },
660
+ "32079": {
661
+ "content": "<extra_id_20>",
662
+ "lstrip": false,
663
+ "normalized": false,
664
+ "rstrip": false,
665
+ "single_word": false,
666
+ "special": true
667
+ },
668
+ "32080": {
669
+ "content": "<extra_id_19>",
670
+ "lstrip": false,
671
+ "normalized": false,
672
+ "rstrip": false,
673
+ "single_word": false,
674
+ "special": true
675
+ },
676
+ "32081": {
677
+ "content": "<extra_id_18>",
678
+ "lstrip": false,
679
+ "normalized": false,
680
+ "rstrip": false,
681
+ "single_word": false,
682
+ "special": true
683
+ },
684
+ "32082": {
685
+ "content": "<extra_id_17>",
686
+ "lstrip": false,
687
+ "normalized": false,
688
+ "rstrip": false,
689
+ "single_word": false,
690
+ "special": true
691
+ },
692
+ "32083": {
693
+ "content": "<extra_id_16>",
694
+ "lstrip": false,
695
+ "normalized": false,
696
+ "rstrip": false,
697
+ "single_word": false,
698
+ "special": true
699
+ },
700
+ "32084": {
701
+ "content": "<extra_id_15>",
702
+ "lstrip": false,
703
+ "normalized": false,
704
+ "rstrip": false,
705
+ "single_word": false,
706
+ "special": true
707
+ },
708
+ "32085": {
709
+ "content": "<extra_id_14>",
710
+ "lstrip": false,
711
+ "normalized": false,
712
+ "rstrip": false,
713
+ "single_word": false,
714
+ "special": true
715
+ },
716
+ "32086": {
717
+ "content": "<extra_id_13>",
718
+ "lstrip": false,
719
+ "normalized": false,
720
+ "rstrip": false,
721
+ "single_word": false,
722
+ "special": true
723
+ },
724
+ "32087": {
725
+ "content": "<extra_id_12>",
726
+ "lstrip": false,
727
+ "normalized": false,
728
+ "rstrip": false,
729
+ "single_word": false,
730
+ "special": true
731
+ },
732
+ "32088": {
733
+ "content": "<extra_id_11>",
734
+ "lstrip": false,
735
+ "normalized": false,
736
+ "rstrip": false,
737
+ "single_word": false,
738
+ "special": true
739
+ },
740
+ "32089": {
741
+ "content": "<extra_id_10>",
742
+ "lstrip": false,
743
+ "normalized": false,
744
+ "rstrip": false,
745
+ "single_word": false,
746
+ "special": true
747
+ },
748
+ "32090": {
749
+ "content": "<extra_id_9>",
750
+ "lstrip": false,
751
+ "normalized": false,
752
+ "rstrip": false,
753
+ "single_word": false,
754
+ "special": true
755
+ },
756
+ "32091": {
757
+ "content": "<extra_id_8>",
758
+ "lstrip": false,
759
+ "normalized": false,
760
+ "rstrip": false,
761
+ "single_word": false,
762
+ "special": true
763
+ },
764
+ "32092": {
765
+ "content": "<extra_id_7>",
766
+ "lstrip": false,
767
+ "normalized": false,
768
+ "rstrip": false,
769
+ "single_word": false,
770
+ "special": true
771
+ },
772
+ "32093": {
773
+ "content": "<extra_id_6>",
774
+ "lstrip": false,
775
+ "normalized": false,
776
+ "rstrip": false,
777
+ "single_word": false,
778
+ "special": true
779
+ },
780
+ "32094": {
781
+ "content": "<extra_id_5>",
782
+ "lstrip": false,
783
+ "normalized": false,
784
+ "rstrip": false,
785
+ "single_word": false,
786
+ "special": true
787
+ },
788
+ "32095": {
789
+ "content": "<extra_id_4>",
790
+ "lstrip": false,
791
+ "normalized": false,
792
+ "rstrip": false,
793
+ "single_word": false,
794
+ "special": true
795
+ },
796
+ "32096": {
797
+ "content": "<extra_id_3>",
798
+ "lstrip": false,
799
+ "normalized": false,
800
+ "rstrip": false,
801
+ "single_word": false,
802
+ "special": true
803
+ },
804
+ "32097": {
805
+ "content": "<extra_id_2>",
806
+ "lstrip": false,
807
+ "normalized": false,
808
+ "rstrip": false,
809
+ "single_word": false,
810
+ "special": true
811
+ },
812
+ "32098": {
813
+ "content": "<extra_id_1>",
814
+ "lstrip": false,
815
+ "normalized": false,
816
+ "rstrip": false,
817
+ "single_word": false,
818
+ "special": true
819
+ },
820
+ "32099": {
821
+ "content": "<extra_id_0>",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ }
828
+ },
829
+ "additional_special_tokens": [
830
+ "<extra_id_0>",
831
+ "<extra_id_1>",
832
+ "<extra_id_2>",
833
+ "<extra_id_3>",
834
+ "<extra_id_4>",
835
+ "<extra_id_5>",
836
+ "<extra_id_6>",
837
+ "<extra_id_7>",
838
+ "<extra_id_8>",
839
+ "<extra_id_9>",
840
+ "<extra_id_10>",
841
+ "<extra_id_11>",
842
+ "<extra_id_12>",
843
+ "<extra_id_13>",
844
+ "<extra_id_14>",
845
+ "<extra_id_15>",
846
+ "<extra_id_16>",
847
+ "<extra_id_17>",
848
+ "<extra_id_18>",
849
+ "<extra_id_19>",
850
+ "<extra_id_20>",
851
+ "<extra_id_21>",
852
+ "<extra_id_22>",
853
+ "<extra_id_23>",
854
+ "<extra_id_24>",
855
+ "<extra_id_25>",
856
+ "<extra_id_26>",
857
+ "<extra_id_27>",
858
+ "<extra_id_28>",
859
+ "<extra_id_29>",
860
+ "<extra_id_30>",
861
+ "<extra_id_31>",
862
+ "<extra_id_32>",
863
+ "<extra_id_33>",
864
+ "<extra_id_34>",
865
+ "<extra_id_35>",
866
+ "<extra_id_36>",
867
+ "<extra_id_37>",
868
+ "<extra_id_38>",
869
+ "<extra_id_39>",
870
+ "<extra_id_40>",
871
+ "<extra_id_41>",
872
+ "<extra_id_42>",
873
+ "<extra_id_43>",
874
+ "<extra_id_44>",
875
+ "<extra_id_45>",
876
+ "<extra_id_46>",
877
+ "<extra_id_47>",
878
+ "<extra_id_48>",
879
+ "<extra_id_49>",
880
+ "<extra_id_50>",
881
+ "<extra_id_51>",
882
+ "<extra_id_52>",
883
+ "<extra_id_53>",
884
+ "<extra_id_54>",
885
+ "<extra_id_55>",
886
+ "<extra_id_56>",
887
+ "<extra_id_57>",
888
+ "<extra_id_58>",
889
+ "<extra_id_59>",
890
+ "<extra_id_60>",
891
+ "<extra_id_61>",
892
+ "<extra_id_62>",
893
+ "<extra_id_63>",
894
+ "<extra_id_64>",
895
+ "<extra_id_65>",
896
+ "<extra_id_66>",
897
+ "<extra_id_67>",
898
+ "<extra_id_68>",
899
+ "<extra_id_69>",
900
+ "<extra_id_70>",
901
+ "<extra_id_71>",
902
+ "<extra_id_72>",
903
+ "<extra_id_73>",
904
+ "<extra_id_74>",
905
+ "<extra_id_75>",
906
+ "<extra_id_76>",
907
+ "<extra_id_77>",
908
+ "<extra_id_78>",
909
+ "<extra_id_79>",
910
+ "<extra_id_80>",
911
+ "<extra_id_81>",
912
+ "<extra_id_82>",
913
+ "<extra_id_83>",
914
+ "<extra_id_84>",
915
+ "<extra_id_85>",
916
+ "<extra_id_86>",
917
+ "<extra_id_87>",
918
+ "<extra_id_88>",
919
+ "<extra_id_89>",
920
+ "<extra_id_90>",
921
+ "<extra_id_91>",
922
+ "<extra_id_92>",
923
+ "<extra_id_93>",
924
+ "<extra_id_94>",
925
+ "<extra_id_95>",
926
+ "<extra_id_96>",
927
+ "<extra_id_97>",
928
+ "<extra_id_98>",
929
+ "<extra_id_99>"
930
+ ],
931
+ "clean_up_tokenization_spaces": false,
932
+ "eos_token": "</s>",
933
+ "extra_ids": 100,
934
+ "extra_special_tokens": {},
935
+ "legacy": true,
936
+ "model_max_length": 32768,
937
+ "pad_token": "<pad>",
938
+ "padding_side": "right",
939
+ "sp_model_kwargs": {},
940
+ "tokenizer_class": "T5Tokenizer",
941
+ "unk_token": "<unk>"
942
+ }
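The config above declares a T5Tokenizer with 100 <extra_id_*> sentinel tokens, a model_max_length of 32768, right-side padding, and <pad>, </s>, <unk> as the special tokens. A minimal sketch of loading it and checking those fields, assuming the files from this commit sit in a local directory (the path below is illustrative, not part of this commit):

from transformers import AutoTokenizer

checkpoint_dir = "path/to/this/checkpoint"  # hypothetical local copy of this repo
tok = AutoTokenizer.from_pretrained(checkpoint_dir)  # may return the fast T5 variant

print(tok.model_max_length)                         # 32768
print(tok.pad_token, tok.eos_token, tok.unk_token)  # <pad> </s> <unk>
print(len(tok.additional_special_tokens))           # 100 sentinels, <extra_id_0> .. <extra_id_99>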
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3333cedc941dd98a629e756b246a1bed0370d67ddeedae3add1bcba19419ca60
3
+ size 5368
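training_args.bin is likewise an LFS pointer; the underlying ~5 kB object is the pickled training-arguments instance that the Hugging Face Trainer saves alongside a checkpoint, so the hyperparameters of this run can be inspected directly. A minimal sketch, assuming the real file has been fetched locally; weights_only=False is passed because the payload is a Python object rather than plain tensors:

import torch

# weights_only=False is required on newer PyTorch, where it defaults to True
args = torch.load("training_args.bin", weights_only=False)

print(type(args).__name__)   # TrainingArguments or a subclass of it
print(args.learning_rate)
print(args.per_device_train_batch_size)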