+ deepspeed
[rank5]:[W529 18:13:22.451915742 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
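The warning above (emitted once per rank) includes its own remediation: pass a `device_id` to `init_process_group()`, or `device_ids` to `barrier()`, so NCCL knows the rank-to-GPU mapping up front. A minimal sketch, assuming the usual one-GPU-per-local-rank launch where the launcher exports `LOCAL_RANK`; the `torch.distributed` calls are shown commented since they need a live multi-GPU job:

```python
import os

def local_device_index() -> int:
    """Rank-local GPU index; torchrun/deepspeed launchers export LOCAL_RANK."""
    return int(os.environ.get("LOCAL_RANK", "0"))

# In the training entrypoint (not executed here; requires GPUs + NCCL):
# import torch
# import torch.distributed as dist
# dev = torch.device(f"cuda:{local_device_index()}")
# torch.cuda.set_device(dev)
# dist.init_process_group("nccl", device_id=dev)   # binds rank -> GPU up front
# dist.barrier(device_ids=[local_device_index()])  # or pin the device per-barrier
```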
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T/config.json
Model config LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.52.1",
  "use_cache": true,
  "vocab_size": 32000
}
|
|
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T/model.safetensors
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}
All model checkpoint weights were used when initializing LlamaForCausalLM.

All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0
}
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
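The resize warning above suggests passing `pad_to_multiple_of` so the padded vocabulary stays Tensor-Core friendly; in Transformers that would be roughly `model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)` (the exact call depends on the training script). The rounding it performs is just:

```python
def padded_vocab_size(n_tokens: int, pad_to_multiple_of: int) -> int:
    """Round the embedding row count up to the next multiple, as
    resize_token_embeddings does when pad_to_multiple_of is given."""
    return ((n_tokens + pad_to_multiple_of - 1) // pad_to_multiple_of) * pad_to_multiple_of

# 32000 tokens + 1 pad token, padded for Tensor Cores:
# padded_vocab_size(32001, 64) -> 32064
```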
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
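The two `mean_resizing` notes mean the fresh embedding and lm_head rows are sampled from a normal distribution fitted to the existing rows rather than left at the default initializer. A simplified, dimension-wise sketch of that idea (hedged: the real implementation uses the full multivariate covariance, not a diagonal one, and the function name here is illustrative):

```python
import random
import statistics

def init_new_rows(old_rows: list[list[float]], n_new: int, seed: int = 0) -> list[list[float]]:
    """Sample n_new embedding rows, each dimension drawn from N(mean, std)
    of the old rows. Simplification: diagonal covariance only."""
    rng = random.Random(seed)
    dims = list(zip(*old_rows))                      # transpose: one tuple per dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n_new)]
```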
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
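The repeated `use_cache` warning is Transformers resolving a config conflict by itself: the KV cache is useless under gradient checkpointing (activations are recomputed in the backward pass), so it forces the cache off. The rule it applies amounts to:

```python
def effective_use_cache(use_cache: bool, gradient_checkpointing: bool) -> bool:
    """use_cache is forced off whenever gradient checkpointing is enabled,
    mirroring the warning emitted once per rank in the log above."""
    return use_cache and not gradient_checkpointing

# To avoid the warning entirely, a training script would typically set
# model.config.use_cache = False up front (sketch; depends on the script).
```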
wandb: Currently logged in as: xtom to https://api.wandb.ai.
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-1.5T/tinyllama-1.5T-s3-Q1-2000/wandb/run-20250529_181334-7urk1q92
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run imdb-tinyllama-1.5T-s3-Q1-2000
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment_IMDb
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment_IMDb/runs/7urk1q92
|
Training 1/1 epoch:   0%| | 0/250 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Training 1/1 epoch (loss 2.8352):   0%| | 1/250 [00:05<23:32,  5.67s/it]
Training 1/1 epoch (loss 2.8290):   1%| | 2/250 [00:07<13:38,  3.30s/it]
Training 1/1 epoch (loss 2.6587):   1%| | 3/250 [00:07<08:04,  1.96s/it]
Training 1/1 epoch (loss 2.9351):   2%| | 4/250 [00:08<05:27,  1.33s/it]
Training 1/1 epoch (loss 2.6689):   2%| | 5/250 [00:08<04:03,  1.00it/s]
Training 1/1 epoch (loss 2.7614):   2%| | 6/250 [00:08<03:05,  1.32it/s]
Training 1/1 epoch (loss 3.0575):   3%| | 7/250 [00:09<02:29,  1.63it/s]
Training 1/1 epoch (loss 2.8513):   3%| | 8/250 [00:09<02:17,  1.76it/s]
Training 1/1 epoch (loss 2.7018):   4%| | 9/250 [00:09<02:02,  1.96it/s]
Training 1/1 epoch (loss 2.5642):   4%| | 10/250 [00:10<01:49,  2.18it/s]
Training 1/1 epoch (loss 2.6031):   4%| | 11/250 [00:10<01:41,  2.35it/s]
Training 1/1 epoch (loss 2.8545):   5%| | 12/250 [00:10<01:33,  2.55it/s]
Training 1/1 epoch (loss 2.8703):   5%| | 13/250 [00:11<01:28,  2.69it/s]
Training 1/1 epoch (loss 2.9454):   6%| | 14/250 [00:11<01:31,  2.58it/s]
Training 1/1 epoch (loss 2.9174):   6%| | 15/250 [00:11<01:26,  2.72it/s]
Training 1/1 epoch (loss 2.7430):   6%| | 16/250 [00:12<01:26,  2.72it/s]
Training 1/1 epoch (loss 2.8415):   7%| | 17/250 [00:12<01:23,  2.78it/s]
Training 1/1 epoch (loss 2.6439):   7%| | 18/250 [00:13<01:21,  2.86it/s]
Training 1/1 epoch (loss 2.8988):   7%| | 18/250 [00:13<01:21,  2.86it/s]
Training 1/1 epoch (loss 2.8988): 8%|β | 19/250 [00:13<01:20, 2.88it/s]
Training 1/1 epoch (loss 2.8997): 8%|β | 19/250 [00:13<01:20, 2.88it/s]
Training 1/1 epoch (loss 2.8997): 8%|β | 20/250 [00:13<01:28, 2.59it/s]
Training 1/1 epoch (loss 2.9435): 8%|β | 20/250 [00:14<01:28, 2.59it/s]
Training 1/1 epoch (loss 2.9435): 8%|β | 21/250 [00:14<01:27, 2.62it/s]
Training 1/1 epoch (loss 2.8130): 8%|β | 21/250 [00:14<01:27, 2.62it/s]
Training 1/1 epoch (loss 2.8130): 9%|β | 22/250 [00:14<01:28, 2.56it/s]
Training 1/1 epoch (loss 2.9014): 9%|β | 22/250 [00:14<01:28, 2.56it/s]
Training 1/1 epoch (loss 2.9014): 9%|β | 23/250 [00:14<01:23, 2.72it/s]
Training 1/1 epoch (loss 2.7844): 9%|β | 23/250 [00:15<01:23, 2.72it/s]
Training 1/1 epoch (loss 2.7844): 10%|β | 24/250 [00:15<01:20, 2.80it/s]
Training 1/1 epoch (loss 2.8675): 10%|β | 24/250 [00:15<01:20, 2.80it/s]
Training 1/1 epoch (loss 2.8675): 10%|β | 25/250 [00:15<01:19, 2.82it/s]
Training 1/1 epoch (loss 2.7030): 10%|β | 25/250 [00:15<01:19, 2.82it/s]
Training 1/1 epoch (loss 2.7030): 10%|β | 26/250 [00:15<01:17, 2.89it/s]
Training 1/1 epoch (loss 2.8378): 10%|β | 26/250 [00:16<01:17, 2.89it/s]
Training 1/1 epoch (loss 2.8378): 11%|β | 27/250 [00:16<01:17, 2.87it/s]
Training 1/1 epoch (loss 2.6961): 11%|β | 27/250 [00:16<01:17, 2.87it/s]
Training 1/1 epoch (loss 2.6961): 11%|β | 28/250 [00:16<01:16, 2.90it/s]
Training 1/1 epoch (loss 2.7384): 11%|β | 28/250 [00:16<01:16, 2.90it/s]
Training 1/1 epoch (loss 2.7384): 12%|ββ | 29/250 [00:16<01:12, 3.05it/s]
Training 1/1 epoch (loss 2.8854): 12%|ββ | 29/250 [00:17<01:12, 3.05it/s]
Training 1/1 epoch (loss 2.8854): 12%|ββ | 30/250 [00:17<01:11, 3.07it/s]
Training 1/1 epoch (loss 2.6835): 12%|ββ | 30/250 [00:17<01:11, 3.07it/s]
Training 1/1 epoch (loss 2.6835): 12%|ββ | 31/250 [00:17<01:10, 3.09it/s]
Training 1/1 epoch (loss 2.8927): 12%|ββ | 31/250 [00:17<01:10, 3.09it/s]
Training 1/1 epoch (loss 2.8927): 13%|ββ | 32/250 [00:17<01:10, 3.07it/s]
Training 1/1 epoch (loss 3.0574): 13%|ββ | 32/250 [00:18<01:10, 3.07it/s]
Training 1/1 epoch (loss 3.0574): 13%|ββ | 33/250 [00:18<01:16, 2.82it/s]
Training 1/1 epoch (loss 2.7764): 13%|ββ | 33/250 [00:18<01:16, 2.82it/s]
Training 1/1 epoch (loss 2.7764): 14%|ββ | 34/250 [00:18<01:15, 2.85it/s]
Training 1/1 epoch (loss 2.8676): 14%|ββ | 34/250 [00:18<01:15, 2.85it/s]
Training 1/1 epoch (loss 2.8676): 14%|ββ | 35/250 [00:18<01:12, 2.97it/s]
Training 1/1 epoch (loss 2.7883): 14%|ββ | 35/250 [00:19<01:12, 2.97it/s]
Training 1/1 epoch (loss 2.7883): 14%|ββ | 36/250 [00:19<01:11, 3.00it/s]
Training 1/1 epoch (loss 2.6516): 14%|ββ | 36/250 [00:19<01:11, 3.00it/s]
Training 1/1 epoch (loss 2.6516): 15%|ββ | 37/250 [00:19<01:10, 3.03it/s]
Training 1/1 epoch (loss 2.9913): 15%|ββ | 37/250 [00:20<01:10, 3.03it/s]
Training 1/1 epoch (loss 2.9913): 15%|ββ | 38/250 [00:20<01:20, 2.62it/s]
Training 1/1 epoch (loss 2.9763): 15%|ββ | 38/250 [00:20<01:20, 2.62it/s]
Training 1/1 epoch (loss 2.9763): 16%|ββ | 39/250 [00:20<01:19, 2.67it/s]
Training 1/1 epoch (loss 3.0361): 16%|ββ | 39/250 [00:20<01:19, 2.67it/s]
Training 1/1 epoch (loss 3.0361): 16%|ββ | 40/250 [00:20<01:18, 2.68it/s]
Training 1/1 epoch (loss 2.8263): 16%|ββ | 40/250 [00:21<01:18, 2.68it/s]
Training 1/1 epoch (loss 2.8263): 16%|ββ | 41/250 [00:21<01:15, 2.78it/s]
Training 1/1 epoch (loss 2.6876): 16%|ββ | 41/250 [00:21<01:15, 2.78it/s]
Training 1/1 epoch (loss 2.6876): 17%|ββ | 42/250 [00:21<01:10, 2.93it/s]
Training 1/1 epoch (loss 2.7876): 17%|ββ | 42/250 [00:21<01:10, 2.93it/s]
Training 1/1 epoch (loss 2.7876): 17%|ββ | 43/250 [00:21<01:10, 2.96it/s]
Training 1/1 epoch (loss 2.8222): 17%|ββ | 43/250 [00:22<01:10, 2.96it/s]
Training 1/1 epoch (loss 2.8222): 18%|ββ | 44/250 [00:22<01:11, 2.89it/s]
Training 1/1 epoch (loss 2.8466): 18%|ββ | 44/250 [00:22<01:11, 2.89it/s]
Training 1/1 epoch (loss 2.8466): 18%|ββ | 45/250 [00:22<01:11, 2.85it/s]
Training 1/1 epoch (loss 2.9594): 18%|ββ | 45/250 [00:22<01:11, 2.85it/s]
Training 1/1 epoch (loss 2.9594): 18%|ββ | 46/250 [00:22<01:08, 2.97it/s]
Training 1/1 epoch (loss 2.7902): 18%|ββ | 46/250 [00:23<01:08, 2.97it/s]
Training 1/1 epoch (loss 2.7902): 19%|ββ | 47/250 [00:23<01:07, 3.02it/s]
Training 1/1 epoch (loss 2.7739): 19%|ββ | 47/250 [00:23<01:07, 3.02it/s]
Training 1/1 epoch (loss 2.7739): 19%|ββ | 48/250 [00:23<01:09, 2.90it/s]
Training 1/1 epoch (loss 2.5421): 19%|ββ | 48/250 [00:23<01:09, 2.90it/s]
Training 1/1 epoch (loss 2.5421): 20%|ββ | 49/250 [00:23<01:09, 2.89it/s]
Training 1/1 epoch (loss 2.7902): 20%|ββ | 49/250 [00:24<01:09, 2.89it/s]
Training 1/1 epoch (loss 2.7902): 20%|ββ | 50/250 [00:24<01:10, 2.84it/s]
Training 1/1 epoch (loss 2.7128): 20%|ββ | 50/250 [00:24<01:10, 2.84it/s]
Training 1/1 epoch (loss 2.7128): 20%|ββ | 51/250 [00:24<01:14, 2.67it/s]
Training 1/1 epoch (loss 2.7058): 20%|ββ | 51/250 [00:24<01:14, 2.67it/s]
Training 1/1 epoch (loss 2.7058): 21%|ββ | 52/250 [00:24<01:09, 2.85it/s]
Training 1/1 epoch (loss 2.6257): 21%|ββ | 52/250 [00:25<01:09, 2.85it/s]
Training 1/1 epoch (loss 2.6257): 21%|ββ | 53/250 [00:25<01:08, 2.87it/s]
Training 1/1 epoch (loss 2.7161): 21%|ββ | 53/250 [00:25<01:08, 2.87it/s]
Training 1/1 epoch (loss 2.7161): 22%|βββ | 54/250 [00:25<01:09, 2.84it/s]
Training 1/1 epoch (loss 2.8045): 22%|βββ | 54/250 [00:26<01:09, 2.84it/s]
Training 1/1 epoch (loss 2.8045): 22%|βββ | 55/250 [00:26<01:08, 2.85it/s]
Training 1/1 epoch (loss 2.3875): 22%|βββ | 55/250 [00:26<01:08, 2.85it/s]
Training 1/1 epoch (loss 2.3875): 22%|βββ | 56/250 [00:26<01:09, 2.80it/s]
Training 1/1 epoch (loss 2.8273): 22%|βββ | 56/250 [00:26<01:09, 2.80it/s]
Training 1/1 epoch (loss 2.8273): 23%|βββ | 57/250 [00:26<01:09, 2.79it/s]
Training 1/1 epoch (loss 2.6195): 23%|βββ | 57/250 [00:27<01:09, 2.79it/s]
Training 1/1 epoch (loss 2.6195): 23%|βββ | 58/250 [00:27<01:05, 2.93it/s]
Training 1/1 epoch (loss 2.6104): 23%|βββ | 58/250 [00:27<01:05, 2.93it/s]
Training 1/1 epoch (loss 2.6104): 24%|βββ | 59/250 [00:27<01:02, 3.04it/s]
Training 1/1 epoch (loss 2.8397): 24%|βββ | 59/250 [00:27<01:02, 3.04it/s]
Training 1/1 epoch (loss 2.8397): 24%|βββ | 60/250 [00:27<01:05, 2.92it/s]
Training 1/1 epoch (loss 2.6690): 24%|βββ | 60/250 [00:28<01:05, 2.92it/s]
Training 1/1 epoch (loss 2.6690): 24%|βββ | 61/250 [00:28<01:07, 2.79it/s]
Training 1/1 epoch (loss 2.7013): 24%|βββ | 61/250 [00:28<01:07, 2.79it/s]
Training 1/1 epoch (loss 2.7013): 25%|βββ | 62/250 [00:28<01:07, 2.79it/s]
Training 1/1 epoch (loss 2.8418): 25%|βββ | 62/250 [00:28<01:07, 2.79it/s]
Training 1/1 epoch (loss 2.8418): 25%|βββ | 63/250 [00:28<01:04, 2.90it/s]
Training 1/1 epoch (loss 2.5476): 25%|βββ | 63/250 [00:29<01:04, 2.90it/s]
Training 1/1 epoch (loss 2.5476): 26%|βββ | 64/250 [00:29<01:03, 2.94it/s]
Training 1/1 epoch (loss 2.7415): 26%|βββ | 64/250 [00:29<01:03, 2.94it/s]
Training 1/1 epoch (loss 2.7415): 26%|βββ | 65/250 [00:29<01:03, 2.93it/s]
Training 1/1 epoch (loss 2.7800): 26%|βββ | 65/250 [00:29<01:03, 2.93it/s]
Training 1/1 epoch (loss 2.7800): 26%|βββ | 66/250 [00:29<01:03, 2.89it/s]
Training 1/1 epoch (loss 2.5102): 26%|βββ | 66/250 [00:30<01:03, 2.89it/s]
Training 1/1 epoch (loss 2.5102): 27%|βββ | 67/250 [00:30<01:05, 2.78it/s]
Training 1/1 epoch (loss 2.5412): 27%|βββ | 67/250 [00:30<01:05, 2.78it/s]
Training 1/1 epoch (loss 2.5412): 27%|βββ | 68/250 [00:30<01:08, 2.66it/s]
Training 1/1 epoch (loss 2.6111): 27%|βββ | 68/250 [00:31<01:08, 2.66it/s]
Training 1/1 epoch (loss 2.6111): 28%|βββ | 69/250 [00:31<01:09, 2.59it/s]
Training 1/1 epoch (loss 2.8257): 28%|βββ | 69/250 [00:31<01:09, 2.59it/s]
Training 1/1 epoch (loss 2.8257): 28%|βββ | 70/250 [00:31<01:05, 2.74it/s]
Training 1/1 epoch (loss 2.8727): 28%|βββ | 70/250 [00:31<01:05, 2.74it/s]
Training 1/1 epoch (loss 2.8727): 28%|βββ | 71/250 [00:31<01:02, 2.88it/s]
Training 1/1 epoch (loss 2.6406): 28%|βββ | 71/250 [00:32<01:02, 2.88it/s]
Training 1/1 epoch (loss 2.6406): 29%|βββ | 72/250 [00:32<01:02, 2.84it/s]
Training 1/1 epoch (loss 2.6690): 29%|βββ | 72/250 [00:32<01:02, 2.84it/s]
Training 1/1 epoch (loss 2.6690): 29%|βββ | 73/250 [00:32<01:03, 2.79it/s]
Training 1/1 epoch (loss 2.6996): 29%|βββ | 73/250 [00:32<01:03, 2.79it/s]
Training 1/1 epoch (loss 2.6996): 30%|βββ | 74/250 [00:32<01:01, 2.87it/s]
Training 1/1 epoch (loss 2.9194): 30%|βββ | 74/250 [00:33<01:01, 2.87it/s]
Training 1/1 epoch (loss 2.9194): 30%|βββ | 75/250 [00:33<00:59, 2.93it/s]
Training 1/1 epoch (loss 2.7140): 30%|βββ | 75/250 [00:33<00:59, 2.93it/s]
Training 1/1 epoch (loss 2.7140): 30%|βββ | 76/250 [00:33<00:57, 3.00it/s]
Training 1/1 epoch (loss 2.6595): 30%|βββ | 76/250 [00:33<00:57, 3.00it/s]
Training 1/1 epoch (loss 2.6595): 31%|βββ | 77/250 [00:33<01:08, 2.52it/s]
Training 1/1 epoch (loss 2.6982): 31%|βββ | 77/250 [00:34<01:08, 2.52it/s]
Training 1/1 epoch (loss 2.6982): 31%|βββ | 78/250 [00:34<01:08, 2.51it/s]
Training 1/1 epoch (loss 2.8596): 31%|βββ | 78/250 [00:34<01:08, 2.51it/s]
Training 1/1 epoch (loss 2.8596): 32%|ββββ | 79/250 [00:34<01:06, 2.58it/s]
Training 1/1 epoch (loss 2.7123): 32%|ββββ | 79/250 [00:34<01:06, 2.58it/s]
Training 1/1 epoch (loss 2.7123): 32%|ββββ | 80/250 [00:34<01:02, 2.71it/s]
Training 1/1 epoch (loss 2.6964): 32%|ββββ | 80/250 [00:35<01:02, 2.71it/s]
Training 1/1 epoch (loss 2.6964): 32%|ββββ | 81/250 [00:35<00:59, 2.82it/s]
Training 1/1 epoch (loss 2.9901): 32%|ββββ | 81/250 [00:35<00:59, 2.82it/s]
Training 1/1 epoch (loss 2.9901): 33%|ββββ | 82/250 [00:35<00:59, 2.83it/s]
Training 1/1 epoch (loss 2.7908): 33%|ββββ | 82/250 [00:35<00:59, 2.83it/s]
Training 1/1 epoch (loss 2.7908): 33%|ββββ | 83/250 [00:35<00:57, 2.92it/s]
Training 1/1 epoch (loss 2.8891): 33%|ββββ | 83/250 [00:36<00:57, 2.92it/s]
Training 1/1 epoch (loss 2.8891): 34%|ββββ | 84/250 [00:36<00:58, 2.84it/s]
Training 1/1 epoch (loss 2.8744): 34%|ββββ | 84/250 [00:36<00:58, 2.84it/s]
Training 1/1 epoch (loss 2.8744): 34%|ββββ | 85/250 [00:36<00:58, 2.83it/s]
Training 1/1 epoch (loss 2.9466): 34%|ββββ | 85/250 [00:37<00:58, 2.83it/s]
Training 1/1 epoch (loss 2.9466): 34%|ββββ | 86/250 [00:37<01:01, 2.68it/s]
Training 1/1 epoch (loss 2.7889): 34%|ββββ | 86/250 [00:37<01:01, 2.68it/s]
Training 1/1 epoch (loss 2.7889): 35%|ββββ | 87/250 [00:37<00:57, 2.84it/s]
Training 1/1 epoch (loss 2.7938): 35%|ββββ | 87/250 [00:37<00:57, 2.84it/s]
Training 1/1 epoch (loss 2.7938): 35%|ββββ | 88/250 [00:37<00:56, 2.87it/s]
Training 1/1 epoch (loss 2.7574): 35%|ββββ | 88/250 [00:38<00:56, 2.87it/s]
Training 1/1 epoch (loss 2.7574): 36%|ββββ | 89/250 [00:38<00:57, 2.78it/s]
Training 1/1 epoch (loss 2.8993): 36%|ββββ | 89/250 [00:38<00:57, 2.78it/s]
Training 1/1 epoch (loss 2.8993): 36%|ββββ | 90/250 [00:38<00:56, 2.81it/s]
Training 1/1 epoch (loss 2.7021): 36%|ββββ | 90/250 [00:38<00:56, 2.81it/s]
Training 1/1 epoch (loss 2.7021): 36%|ββββ | 91/250 [00:38<00:53, 2.96it/s]
Training 1/1 epoch (loss 2.7516): 36%|ββββ | 91/250 [00:39<00:53, 2.96it/s]
Training 1/1 epoch (loss 2.7516): 37%|ββββ | 92/250 [00:39<00:50, 3.12it/s]
Training 1/1 epoch (loss 2.8353): 37%|ββββ | 92/250 [00:39<00:50, 3.12it/s]
Training 1/1 epoch (loss 2.8353): 37%|ββββ | 93/250 [00:39<00:51, 3.03it/s]
Training 1/1 epoch (loss 2.8563): 37%|ββββ | 93/250 [00:39<00:51, 3.03it/s]
Training 1/1 epoch (loss 2.8563): 38%|ββββ | 94/250 [00:39<00:50, 3.10it/s]
Training 1/1 epoch (loss 2.7530): 38%|ββββ | 94/250 [00:40<00:50, 3.10it/s]
Training 1/1 epoch (loss 2.7530): 38%|ββββ | 95/250 [00:40<00:51, 3.00it/s]
Training 1/1 epoch (loss 2.6618): 38%|ββββ | 95/250 [00:40<00:51, 3.00it/s]
Training 1/1 epoch (loss 2.6618): 38%|ββββ | 96/250 [00:40<00:51, 2.99it/s]
Training 1/1 epoch (loss 2.7826): 38%|ββββ | 96/250 [00:40<00:51, 2.99it/s]
Training 1/1 epoch (loss 2.7826): 39%|ββββ | 97/250 [00:40<00:53, 2.86it/s]
Training 1/1 epoch (loss 3.0240): 39%|ββββ | 97/250 [00:41<00:53, 2.86it/s]
Training 1/1 epoch (loss 3.0240): 39%|ββββ | 98/250 [00:41<00:52, 2.90it/s]
Training 1/1 epoch (loss 2.7730): 39%|ββββ | 98/250 [00:41<00:52, 2.90it/s]
Training 1/1 epoch (loss 2.7730): 40%|ββββ | 99/250 [00:41<00:50, 3.01it/s]
Training 1/1 epoch (loss 2.7565): 40%|ββββ | 99/250 [00:41<00:50, 3.01it/s]
Training 1/1 epoch (loss 2.7565): 40%|ββββ | 100/250 [00:41<00:49, 3.02it/s]
Training 1/1 epoch (loss 2.6436): 40%|ββββ | 100/250 [00:42<00:49, 3.02it/s]
Training 1/1 epoch (loss 2.6436): 40%|ββββ | 101/250 [00:42<00:51, 2.90it/s]
Training 1/1 epoch (loss 2.5813): 40%|ββββ | 101/250 [00:42<00:51, 2.90it/s]
Training 1/1 epoch (loss 2.5813): 41%|ββββ | 102/250 [00:42<00:50, 2.92it/s]
Training 1/1 epoch (loss 2.6972): 41%|ββββ | 102/250 [00:42<00:50, 2.92it/s]
Training 1/1 epoch (loss 2.6972): 41%|ββββ | 103/250 [00:42<00:51, 2.86it/s]
Training 1/1 epoch (loss 2.7830): 41%|ββββ | 103/250 [00:43<00:51, 2.86it/s]
Training 1/1 epoch (loss 2.7830): 42%|βββββ | 104/250 [00:43<00:51, 2.85it/s]
Training 1/1 epoch (loss 2.6746): 42%|βββββ | 104/250 [00:43<00:51, 2.85it/s]
Training 1/1 epoch (loss 2.6746): 42%|βββββ | 105/250 [00:43<00:53, 2.70it/s]
Training 1/1 epoch (loss 2.8233): 42%|βββββ | 105/250 [00:44<00:53, 2.70it/s]
Training 1/1 epoch (loss 2.8233): 42%|βββββ | 106/250 [00:44<00:56, 2.56it/s]
Training 1/1 epoch (loss 2.6857): 42%|βββββ | 106/250 [00:44<00:56, 2.56it/s]
Training 1/1 epoch (loss 2.6857): 43%|βββββ | 107/250 [00:44<00:53, 2.65it/s]
Training 1/1 epoch (loss 2.7455): 43%|βββββ | 107/250 [00:44<00:53, 2.65it/s]
Training 1/1 epoch (loss 2.7455): 43%|βββββ | 108/250 [00:44<00:53, 2.67it/s]
Training 1/1 epoch (loss 2.6864): 43%|βββββ | 108/250 [00:45<00:53, 2.67it/s]
Training 1/1 epoch (loss 2.6864): 44%|βββββ | 109/250 [00:45<00:55, 2.54it/s]
Training 1/1 epoch (loss 2.6578): 44%|βββββ | 109/250 [00:45<00:55, 2.54it/s]
Training 1/1 epoch (loss 2.6578): 44%|βββββ | 110/250 [00:45<00:54, 2.58it/s]
Training 1/1 epoch (loss 2.7856): 44%|βββββ | 110/250 [00:45<00:54, 2.58it/s]
Training 1/1 epoch (loss 2.7856): 44%|βββββ | 111/250 [00:45<00:53, 2.58it/s]
Training 1/1 epoch (loss 2.7794): 44%|βββββ | 111/250 [00:46<00:53, 2.58it/s]
Training 1/1 epoch (loss 2.7794): 45%|βββββ | 112/250 [00:46<00:56, 2.45it/s]
Training 1/1 epoch (loss 2.5878): 45%|βββββ | 112/250 [00:46<00:56, 2.45it/s]
Training 1/1 epoch (loss 2.5878): 45%|βββββ | 113/250 [00:46<00:52, 2.63it/s]
Training 1/1 epoch (loss 3.0365): 45%|βββββ | 113/250 [00:47<00:52, 2.63it/s]
Training 1/1 epoch (loss 3.0365): 46%|βββββ | 114/250 [00:47<00:48, 2.83it/s]
Training 1/1 epoch (loss 2.9455): 46%|βββββ | 114/250 [00:47<00:48, 2.83it/s]
Training 1/1 epoch (loss 2.9455): 46%|βββββ | 115/250 [00:47<00:46, 2.87it/s]
Training 1/1 epoch (loss 2.6709): 46%|βββββ | 115/250 [00:47<00:46, 2.87it/s]
Training 1/1 epoch (loss 2.6709): 46%|βββββ | 116/250 [00:47<00:46, 2.89it/s]
Training 1/1 epoch (loss 2.6349): 46%|βββββ | 116/250 [00:48<00:46, 2.89it/s]
Training 1/1 epoch (loss 2.6349): 47%|βββββ | 117/250 [00:48<00:47, 2.82it/s]
Training 1/1 epoch (loss 2.2828): 47%|βββββ | 117/250 [00:48<00:47, 2.82it/s]
Training 1/1 epoch (loss 2.2828): 47%|βββββ | 118/250 [00:48<00:45, 2.89it/s]
Training 1/1 epoch (loss 2.5880): 47%|βββββ | 118/250 [00:48<00:45, 2.89it/s]
Training 1/1 epoch (loss 2.5880): 48%|βββββ | 119/250 [00:48<00:44, 2.92it/s]
Training 1/1 epoch (loss 2.7548): 48%|βββββ | 119/250 [00:49<00:44, 2.92it/s]
Training 1/1 epoch (loss 2.7548): 48%|βββββ | 120/250 [00:49<00:44, 2.94it/s]
Training 1/1 epoch (loss 2.5070): 48%|βββββ | 120/250 [00:49<00:44, 2.94it/s]
Training 1/1 epoch (loss 2.5070): 48%|βββββ | 121/250 [00:49<00:44, 2.93it/s]
Training 1/1 epoch (loss 2.6535): 48%|βββββ | 121/250 [00:49<00:44, 2.93it/s]
Training 1/1 epoch (loss 2.6535): 49%|βββββ | 122/250 [00:49<00:42, 3.04it/s]
Training 1/1 epoch (loss 2.6384): 49%|βββββ | 122/250 [00:50<00:42, 3.04it/s]
Training 1/1 epoch (loss 2.6384): 49%|βββββ | 123/250 [00:50<00:43, 2.92it/s]
Training 1/1 epoch (loss 2.6073): 49%|βββββ | 123/250 [00:50<00:43, 2.92it/s]
Training 1/1 epoch (loss 2.6073): 50%|βββββ | 124/250 [00:50<00:44, 2.86it/s]
Training 1/1 epoch (loss 2.7094): 50%|βββββ | 124/250 [00:50<00:44, 2.86it/s]
Training 1/1 epoch (loss 2.7094): 50%|βββββ | 125/250 [00:50<00:43, 2.87it/s]
Training 1/1 epoch (loss 2.6129): 50%|βββββ | 125/250 [00:51<00:43, 2.87it/s]
Training 1/1 epoch (loss 2.6129): 50%|βββββ | 126/250 [00:51<00:42, 2.95it/s]
Training 1/1 epoch (loss 2.7364): 50%|βββββ | 126/250 [00:51<00:42, 2.95it/s]
Training 1/1 epoch (loss 2.7364): 51%|βββββ | 127/250 [00:51<00:43, 2.80it/s]
Training 1/1 epoch (loss 2.7520): 51%|βββββ | 127/250 [00:51<00:43, 2.80it/s]
Training 1/1 epoch (loss 2.7520): 51%|βββββ | 128/250 [00:51<00:43, 2.83it/s]
Training 1/1 epoch (loss 2.6546): 51%|βββββ | 128/250 [00:52<00:43, 2.83it/s]
Training 1/1 epoch (loss 2.6546): 52%|ββββββ | 129/250 [00:52<00:42, 2.82it/s]
Training 1/1 epoch (loss 3.1195): 52%|ββββββ | 129/250 [00:52<00:42, 2.82it/s]
Training 1/1 epoch (loss 3.1195): 52%|ββββββ | 130/250 [00:52<00:44, 2.69it/s]
Training 1/1 epoch (loss 2.7957): 52%|ββββββ | 130/250 [00:52<00:44, 2.69it/s]
Training 1/1 epoch (loss 2.7957): 52%|ββββββ | 131/250 [00:52<00:41, 2.89it/s]
Training 1/1 epoch (loss 2.7223): 52%|ββββββ | 131/250 [00:53<00:41, 2.89it/s]
Training 1/1 epoch (loss 2.7223): 53%|ββββββ | 132/250 [00:53<00:40, 2.94it/s]
Training 1/1 epoch (loss 2.6676): 53%|ββββββ | 132/250 [00:53<00:40, 2.94it/s]
Training 1/1 epoch (loss 2.6676): 53%|ββββββ | 133/250 [00:53<00:42, 2.76it/s]
Training 1/1 epoch (loss 2.8371): 53%|ββββββ | 133/250 [00:53<00:42, 2.76it/s]
Training 1/1 epoch (loss 2.8371): 54%|ββββββ | 134/250 [00:53<00:40, 2.89it/s]
Training 1/1 epoch (loss 2.5318): 54%|ββββββ | 134/250 [00:54<00:40, 2.89it/s]
Training 1/1 epoch (loss 2.5318): 54%|ββββββ | 135/250 [00:54<00:39, 2.90it/s]
Training 1/1 epoch (loss 2.8636): 54%|ββββββ | 135/250 [00:54<00:39, 2.90it/s]
Training 1/1 epoch (loss 2.8636): 54%|ββββββ | 136/250 [00:54<00:40, 2.79it/s]
Training 1/1 epoch (loss 2.6957): 54%|ββββββ | 136/250 [00:55<00:40, 2.79it/s]
Training 1/1 epoch (loss 2.6957): 55%|ββββββ | 137/250 [00:55<00:40, 2.81it/s]
Training 1/1 epoch (loss 2.8642): 55%|ββββββ | 137/250 [00:55<00:40, 2.81it/s]
Training 1/1 epoch (loss 2.8642): 55%|ββββββ | 138/250 [00:55<00:37, 2.99it/s]
Training 1/1 epoch (loss 2.5317): 55%|ββββββ | 138/250 [00:55<00:37, 2.99it/s]
Training 1/1 epoch (loss 2.5317): 56%|ββββββ | 139/250 [00:55<00:37, 2.95it/s]
Training 1/1 epoch (loss 2.8716): 56%|ββββββ | 139/250 [00:56<00:37, 2.95it/s]
Training 1/1 epoch (loss 2.8716): 56%|ββββββ | 140/250 [00:56<00:37, 2.95it/s]
Training 1/1 epoch (loss 2.7362): 56%|ββββββ | 140/250 [00:56<00:37, 2.95it/s]
Training 1/1 epoch (loss 2.7362): 56%|ββββββ | 141/250 [00:56<00:36, 2.95it/s]
Training 1/1 epoch (loss 2.9054): 56%|ββββββ | 141/250 [00:56<00:36, 2.95it/s]
Training 1/1 epoch (loss 2.9054): 57%|ββββββ | 142/250 [00:56<00:38, 2.80it/s]
Training 1/1 epoch (loss 2.7129): 57%|ββββββ | 142/250 [00:57<00:38, 2.80it/s]
Training 1/1 epoch (loss 2.7129): 57%|ββββββ | 143/250 [00:57<00:36, 2.97it/s]
Training 1/1 epoch (loss 2.7051): 57%|ββββββ | 143/250 [00:57<00:36, 2.97it/s]
Training 1/1 epoch (loss 2.7051): 58%|ββββββ | 144/250 [00:57<00:35, 2.96it/s]
Training 1/1 epoch (loss 2.6919): 58%|ββββββ | 144/250 [00:57<00:35, 2.96it/s]
Training 1/1 epoch (loss 2.6919): 58%|ββββββ | 145/250 [00:57<00:37, 2.82it/s]
Training 1/1 epoch (loss 2.7285): 58%|ββββββ | 145/250 [00:58<00:37, 2.82it/s]
Training 1/1 epoch (loss 2.7285): 58%|ββββββ | 146/250 [00:58<00:36, 2.87it/s]
Training 1/1 epoch (loss 2.8703): 58%|ββββββ | 146/250 [00:58<00:36, 2.87it/s]
Training 1/1 epoch (loss 2.8703): 59%|ββββββ | 147/250 [00:58<00:36, 2.85it/s]
Training 1/1 epoch (loss 2.7281): 59%|ββββββ | 147/250 [00:58<00:36, 2.85it/s]
Training 1/1 epoch (loss 2.7281): 59%|ββββββ | 148/250 [00:58<00:34, 2.92it/s]
Training 1/1 epoch (loss 2.8347): 59%|ββββββ | 148/250 [00:59<00:34, 2.92it/s]
Training 1/1 epoch (loss 2.8347): 60%|ββββββ | 149/250 [00:59<00:33, 3.05it/s]
Training 1/1 epoch (loss 2.7203): 60%|ββββββ | 149/250 [00:59<00:33, 3.05it/s]
Training 1/1 epoch (loss 2.7203): 60%|ββββββ | 150/250 [00:59<00:32, 3.08it/s]
Training 1/1 epoch (loss 2.7732): 60%|ββββββ | 150/250 [00:59<00:32, 3.08it/s]
Training 1/1 epoch (loss 2.7732): 60%|ββββββ | 151/250 [00:59<00:32, 3.03it/s]
Training 1/1 epoch (loss 2.8149): 60%|ββββββ | 151/250 [01:00<00:32, 3.03it/s]
Training 1/1 epoch (loss 2.8149): 61%|ββββββ | 152/250 [01:00<00:34, 2.84it/s]
Training 1/1 epoch (loss 2.7193): 61%|ββββββ | 152/250 [01:00<00:34, 2.84it/s]
Training 1/1 epoch (loss 2.7193): 61%|ββββββ | 153/250 [01:00<00:34, 2.78it/s]
Training 1/1 epoch (loss 2.7369): 61%|ββββββ | 153/250 [01:00<00:34, 2.78it/s]
Training 1/1 epoch (loss 2.7369): 62%|βββββββ | 154/250 [01:00<00:35, 2.70it/s]
Training 1/1 epoch (loss 2.9504): 62%|βββββββ | 154/250 [01:01<00:35, 2.70it/s]
Training 1/1 epoch (loss 2.9504): 62%|βββββββ | 155/250 [01:01<00:35, 2.64it/s]
Training 1/1 epoch (loss 2.7171): 62%|βββββββ | 155/250 [01:01<00:35, 2.64it/s]
Training 1/1 epoch (loss 2.7171): 62%|βββββββ | 156/250 [01:01<00:35, 2.61it/s]
Training 1/1 epoch (loss 2.7810): 62%|βββββββ | 156/250 [01:02<00:35, 2.61it/s]
Training 1/1 epoch (loss 2.7810): 63%|βββββββ | 157/250 [01:02<00:35, 2.65it/s]
Training 1/1 epoch (loss 2.7306): 63%|βββββββ | 157/250 [01:02<00:35, 2.65it/s]
Training 1/1 epoch (loss 2.7306): 63%|βββββββ | 158/250 [01:02<00:34, 2.69it/s]
Training 1/1 epoch (loss 2.9774): 63%|βββββββ | 158/250 [01:02<00:34, 2.69it/s]
Training 1/1 epoch (loss 2.9774): 64%|βββββββ | 159/250 [01:02<00:32, 2.79it/s]
Training 1/1 epoch (loss 2.4633): 64%|βββββββ | 159/250 [01:03<00:32, 2.79it/s]
Training 1/1 epoch (loss 2.4633): 64%|βββββββ | 160/250 [01:03<00:31, 2.82it/s]
Training 1/1 epoch (loss 3.0724): 64%|βββββββ | 160/250 [01:03<00:31, 2.82it/s]
Training 1/1 epoch (loss 3.0724): 64%|βββββββ | 161/250 [01:03<00:32, 2.76it/s]
Training 1/1 epoch (loss 2.6242): 64%|βββββββ | 161/250 [01:04<00:32, 2.76it/s]
Training 1/1 epoch (loss 2.6242): 65%|βββββββ | 162/250 [01:04<00:36, 2.40it/s]
Training 1/1 epoch (loss 2.8966): 65%|βββββββ | 162/250 [01:04<00:36, 2.40it/s]
Training 1/1 epoch (loss 2.8966): 65%|βββββββ | 163/250 [01:04<00:34, 2.49it/s]
Training 1/1 epoch (loss 2.7619): 65%|βββββββ | 163/250 [01:04<00:34, 2.49it/s]
Training 1/1 epoch (loss 2.7619): 66%|βββββββ | 164/250 [01:04<00:33, 2.55it/s]
Training 1/1 epoch (loss 2.6469): 66%|βββββββ | 164/250 [01:05<00:33, 2.55it/s]
Training 1/1 epoch (loss 2.6469): 66%|βββββββ | 165/250 [01:05<00:31, 2.73it/s]
Training 1/1 epoch (loss 2.5406): 66%|βββββββ | 165/250 [01:05<00:31, 2.73it/s]
Training 1/1 epoch (loss 2.5406): 66%|βββββββ | 166/250 [01:05<00:30, 2.78it/s]
Training 1/1 epoch (loss 2.7249): 66%|βββββββ | 166/250 [01:05<00:30, 2.78it/s]
Training 1/1 epoch (loss 2.7249): 67%|βββββββ | 167/250 [01:05<00:30, 2.74it/s]
Training 1/1 epoch (loss 2.7109): 67%|βββββββ | 167/250 [01:06<00:30, 2.74it/s]
Training 1/1 epoch (loss 2.7109): 67%|βββββββ | 168/250 [01:06<00:29, 2.76it/s]
Training 1/1 epoch (loss 2.8027): 67%|βββββββ | 168/250 [01:06<00:29, 2.76it/s]
Training 1/1 epoch (loss 2.8027): 68%|βββββββ | 169/250 [01:06<00:30, 2.70it/s]
Training 1/1 epoch (loss 2.6236): 68%|βββββββ | 169/250 [01:07<00:30, 2.70it/s]
Training 1/1 epoch (loss 2.6236): 68%|βββββββ | 170/250 [01:07<00:31, 2.56it/s]
Training 1/1 epoch (loss 2.9005): 68%|βββββββ | 170/250 [01:07<00:31, 2.56it/s]
Training 1/1 epoch (loss 2.9005): 68%|βββββββ | 171/250 [01:07<00:29, 2.70it/s]
Training 1/1 epoch (loss 2.6634): 68%|βββββββ | 171/250 [01:07<00:29, 2.70it/s]
Training 1/1 epoch (loss 2.6634): 69%|βββββββ | 172/250 [01:07<00:28, 2.77it/s]
Training 1/1 epoch (loss 2.3324): 69%|βββββββ | 172/250 [01:07<00:28, 2.77it/s]
Training 1/1 epoch (loss 2.3324): 69%|βββββββ | 173/250 [01:07<00:26, 2.88it/s]
Training 1/1 epoch (loss 2.8038): 69%|βββββββ | 173/250 [01:08<00:26, 2.88it/s]
Training 1/1 epoch (loss 2.8038): 70%|βββββββ | 174/250 [01:08<00:25, 2.94it/s]
Training 1/1 epoch (loss 2.7558): 70%|βββββββ | 174/250 [01:08<00:25, 2.94it/s]
Training 1/1 epoch (loss 2.7558): 70%|βββββββ | 175/250 [01:08<00:25, 2.91it/s]
Training 1/1 epoch (loss 2.6259): 70%|βββββββ | 175/250 [01:08<00:25, 2.91it/s]
Training 1/1 epoch (loss 2.6259): 70%|βββββββ | 176/250 [01:08<00:24, 3.02it/s]
Training 1/1 epoch (loss 2.7150): 70%|βββββββ | 176/250 [01:09<00:24, 3.02it/s]
Training 1/1 epoch (loss 2.7150): 71%|βββββββ | 177/250 [01:09<00:24, 2.99it/s]
Training 1/1 epoch (loss 2.5410): 71%|βββββββ | 177/250 [01:09<00:24, 2.99it/s]
Training 1/1 epoch (loss 2.5410): 71%|βββββββ | 178/250 [01:09<00:23, 3.02it/s]
Training 1/1 epoch (loss 2.9172): 71%|βββββββ | 178/250 [01:09<00:23, 3.02it/s]
Training 1/1 epoch (loss 2.9172): 72%|ββββββββ | 179/250 [01:09<00:23, 3.09it/s]
Training 1/1 epoch (loss 2.8098): 72%|ββββββββ | 179/250 [01:10<00:23, 3.09it/s]
Training 1/1 epoch (loss 2.8098): 72%|ββββββββ | 180/250 [01:10<00:23, 2.99it/s]
Training 1/1 epoch (loss 2.5417): 72%|ββββββββ | 180/250 [01:10<00:23, 2.99it/s]
Training 1/1 epoch (loss 2.5417): 72%|ββββββββ | 181/250 [01:10<00:23, 2.96it/s]
Training 1/1 epoch (loss 2.7776): 72%|ββββββββ | 181/250 [01:10<00:23, 2.96it/s]
Training 1/1 epoch (loss 2.7776): 73%|ββββββββ | 182/250 [01:10<00:22, 3.05it/s]
Training 1/1 epoch (loss 2.6927): 73%|ββββββββ | 182/250 [01:11<00:22, 3.05it/s]
Training 1/1 epoch (loss 2.6927): 73%|ββββββββ | 183/250 [01:11<00:22, 2.97it/s]
Training 1/1 epoch (loss 2.6127): 73%|ββββββββ | 183/250 [01:11<00:22, 2.97it/s]
Training 1/1 epoch (loss 2.6127): 74%|ββββββββ | 184/250 [01:11<00:22, 2.93it/s]
Training 1/1 epoch (loss 2.5833): 74%|ββββββββ | 184/250 [01:12<00:22, 2.93it/s]
Training 1/1 epoch (loss 2.5833): 74%|ββββββββ | 185/250 [01:12<00:23, 2.72it/s]
Training 1/1 epoch (loss 2.7380): 74%|ββββββββ | 185/250 [01:12<00:23, 2.72it/s]
Training 1/1 epoch (loss 2.7380): 74%|ββββββββ | 186/250 [01:12<00:23, 2.69it/s]
Training 1/1 epoch (loss 2.5186): 74%|ββββββββ | 186/250 [01:12<00:23, 2.69it/s]
Training 1/1 epoch (loss 2.5186): 75%|ββββββββ | 187/250 [01:12<00:23, 2.72it/s]
Training 1/1 epoch (loss 2.8502): 75%|ββββββββ | 187/250 [01:13<00:23, 2.72it/s]
Training 1/1 epoch (loss 2.8502): 75%|ββββββββ | 188/250 [01:13<00:21, 2.88it/s]
Training 1/1 epoch (loss 2.6304): 75%|ββββββββ | 188/250 [01:13<00:21, 2.88it/s]
Training 1/1 epoch (loss 2.6304): 76%|ββββββββ | 189/250 [01:13<00:20, 2.97it/s]
Training 1/1 epoch (loss 2.6835): 76%|ββββββββ | 189/250 [01:13<00:20, 2.97it/s]
Training 1/1 epoch (loss 2.6835): 76%|ββββββββ | 190/250 [01:13<00:21, 2.80it/s]
Training 1/1 epoch (loss 2.6515): 76%|ββββββββ | 190/250 [01:14<00:21, 2.80it/s]
Training 1/1 epoch (loss 2.6515): 76%|ββββββββ | 191/250 [01:14<00:21, 2.75it/s]
Training 1/1 epoch (loss 2.7262): 76%|ββββββββ | 191/250 [01:14<00:21, 2.75it/s]
Training 1/1 epoch (loss 2.7262): 77%|ββββββββ | 192/250 [01:14<00:22, 2.61it/s]
Training 1/1 epoch (loss 2.8937): 77%|ββββββββ | 192/250 [01:14<00:22, 2.61it/s]
Training 1/1 epoch (loss 2.8937): 77%|ββββββββ | 193/250 [01:14<00:21, 2.68it/s]
Training 1/1 epoch (loss 2.6253): 77%|ββββββββ | 193/250 [01:15<00:21, 2.68it/s]
Training 1/1 epoch (loss 2.6253): 78%|ββββββββ | 194/250 [01:15<00:20, 2.76it/s]
Training 1/1 epoch (loss 2.6631): 78%|ββββββββ | 194/250 [01:15<00:20, 2.76it/s]
Training 1/1 epoch (loss 2.6631): 78%|ββββββββ | 195/250 [01:15<00:18, 2.91it/s]
Training 1/1 epoch (loss 2.6318): 78%|ββββββββ | 195/250 [01:15<00:18, 2.91it/s]
Training 1/1 epoch (loss 2.6318): 78%|ββββββββ | 196/250 [01:15<00:18, 2.99it/s]
Training 1/1 epoch (loss 2.7193): 78%|ββββββββ | 196/250 [01:16<00:18, 2.99it/s]
Training 1/1 epoch (loss 2.7193): 79%|ββββββββ | 197/250 [01:16<00:17, 2.99it/s]
Training 1/1 epoch (loss 2.6242): 79%|ββββββββ | 197/250 [01:16<00:17, 2.99it/s]
Training 1/1 epoch (loss 2.6242): 79%|ββββββββ | 198/250 [01:16<00:17, 2.98it/s]
Training 1/1 epoch (loss 2.9030): 79%|ββββββββ | 198/250 [01:16<00:17, 2.98it/s]
Training 1/1 epoch (loss 2.9030): 80%|ββββββββ | 199/250 [01:16<00:16, 3.12it/s]
Training 1/1 epoch (loss 2.5614): 80%|ββββββββ | 199/250 [01:17<00:16, 3.12it/s]
Training 1/1 epoch (loss 2.5614): 80%|████████ | 200/250 [01:17<00:16, 2.99it/s]
Training 1/1 epoch (loss 2.7062): 80%|████████ | 201/250 [01:17<00:17, 2.86it/s]
Training 1/1 epoch (loss 2.8739): 81%|████████ | 202/250 [01:17<00:16, 2.91it/s]
Training 1/1 epoch (loss 2.6527): 81%|████████ | 203/250 [01:18<00:15, 2.96it/s]
Training 1/1 epoch (loss 2.7072): 82%|█████████ | 204/250 [01:18<00:16, 2.82it/s]
Training 1/1 epoch (loss 2.8000): 82%|█████████ | 205/250 [01:19<00:15, 2.95it/s]
Training 1/1 epoch (loss 2.5312): 82%|█████████ | 206/250 [01:19<00:14, 3.02it/s]
Training 1/1 epoch (loss 2.5572): 83%|█████████ | 207/250 [01:19<00:13, 3.09it/s]
Training 1/1 epoch (loss 2.7255): 83%|█████████ | 208/250 [01:19<00:13, 3.01it/s]
Training 1/1 epoch (loss 2.7690): 84%|█████████ | 209/250 [01:20<00:15, 2.71it/s]
Training 1/1 epoch (loss 2.5639): 84%|█████████ | 210/250 [01:20<00:14, 2.69it/s]
Training 1/1 epoch (loss 2.6051): 84%|█████████ | 211/250 [01:21<00:13, 2.85it/s]
Training 1/1 epoch (loss 2.9637): 85%|█████████ | 212/250 [01:21<00:12, 2.99it/s]
Training 1/1 epoch (loss 2.9755): 85%|█████████ | 213/250 [01:21<00:13, 2.82it/s]
Training 1/1 epoch (loss 2.8653): 86%|█████████ | 214/250 [01:22<00:12, 2.93it/s]
Training 1/1 epoch (loss 2.6640): 86%|█████████ | 215/250 [01:22<00:12, 2.90it/s]
Training 1/1 epoch (loss 2.6206): 86%|█████████ | 216/250 [01:22<00:13, 2.61it/s]
Training 1/1 epoch (loss 2.7458): 87%|█████████ | 217/250 [01:23<00:12, 2.63it/s]
Training 1/1 epoch (loss 2.6784): 87%|█████████ | 218/250 [01:23<00:12, 2.56it/s]
Training 1/1 epoch (loss 2.5921): 88%|█████████ | 219/250 [01:24<00:12, 2.58it/s]
Training 1/1 epoch (loss 2.6005): 88%|█████████ | 220/250 [01:24<00:11, 2.60it/s]
Training 1/1 epoch (loss 2.8086): 88%|█████████ | 221/250 [01:24<00:11, 2.61it/s]
Training 1/1 epoch (loss 2.9320): 89%|█████████ | 222/250 [01:25<00:09, 2.81it/s]
Training 1/1 epoch (loss 2.6922): 89%|█████████ | 223/250 [01:25<00:09, 2.91it/s]
Training 1/1 epoch (loss 2.4549): 90%|█████████ | 224/250 [01:25<00:08, 2.96it/s]
Training 1/1 epoch (loss 2.7877): 90%|█████████ | 225/250 [01:26<00:08, 2.92it/s]
Training 1/1 epoch (loss 2.9588): 90%|█████████ | 226/250 [01:26<00:08, 2.91it/s]
Training 1/1 epoch (loss 2.7142): 91%|█████████ | 227/250 [01:26<00:07, 2.97it/s]
Training 1/1 epoch (loss 2.7869): 91%|█████████ | 228/250 [01:27<00:07, 2.98it/s]
Training 1/1 epoch (loss 2.6158): 92%|██████████| 229/250 [01:27<00:06, 3.04it/s]
Training 1/1 epoch (loss 2.7024): 92%|██████████| 230/250 [01:27<00:06, 3.09it/s]
Training 1/1 epoch (loss 2.6332): 92%|██████████| 231/250 [01:28<00:06, 2.77it/s]
Training 1/1 epoch (loss 2.7531): 93%|██████████| 232/250 [01:28<00:06, 2.85it/s]
Training 1/1 epoch (loss 2.6932): 93%|██████████| 233/250 [01:28<00:06, 2.81it/s]
Training 1/1 epoch (loss 2.6686): 94%|██████████| 234/250 [01:29<00:05, 2.88it/s]
Training 1/1 epoch (loss 2.8471): 94%|██████████| 235/250 [01:29<00:05, 2.97it/s]
Training 1/1 epoch (loss 2.9470): 94%|██████████| 236/250 [01:29<00:04, 3.01it/s]
Training 1/1 epoch (loss 2.6913): 95%|██████████| 237/250 [01:30<00:04, 3.04it/s]
Training 1/1 epoch (loss 2.8391): 95%|██████████| 238/250 [01:30<00:03, 3.04it/s]
Training 1/1 epoch (loss 2.5142): 96%|██████████| 239/250 [01:30<00:03, 2.89it/s]
Training 1/1 epoch (loss 2.9172): 96%|██████████| 240/250 [01:31<00:03, 2.88it/s]
Training 1/1 epoch (loss 2.7920): 96%|██████████| 241/250 [01:31<00:03, 2.97it/s]
Training 1/1 epoch (loss 2.7850): 97%|██████████| 242/250 [01:31<00:02, 2.89it/s]
Training 1/1 epoch (loss 2.8507): 97%|██████████| 243/250 [01:32<00:02, 2.88it/s]
Training 1/1 epoch (loss 2.7694): 98%|██████████| 244/250 [01:32<00:02, 2.88it/s]
Training 1/1 epoch (loss 2.5797): 98%|██████████| 245/250 [01:33<00:01, 2.77it/s]
Training 1/1 epoch (loss 2.7787): 98%|██████████| 246/250 [01:33<00:01, 2.54it/s]
Training 1/1 epoch (loss 2.5896): 99%|██████████| 247/250 [01:33<00:01, 2.40it/s]
Training 1/1 epoch (loss 2.6221): 99%|██████████| 248/250 [01:34<00:00, 2.45it/s]
Training 1/1 epoch (loss 2.6505): 100%|██████████| 249/250 [01:34<00:00, 2.32it/s]
Training 1/1 epoch (loss 2.6235): 100%|██████████| 250/250 [01:35<00:00, 2.59it/s]
Training 1/1 epoch (loss 2.6235): 100%|██████████| 250/250 [01:35<00:00, 2.63it/s] |
| tokenizer config file saved in /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-1.5T/tinyllama-1.5T-s3-Q1-2000/tokenizer_config.json |
| Special tokens file saved in /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-1.5T/tinyllama-1.5T-s3-Q1-2000/special_tokens_map.json |
| wandb: ERROR Problem finishing run |
| Exception ignored in atexit callback: <bound method rank_zero_only.<locals>.wrapper of <safe_rlhf.logger.Logger object at 0x1550c412cf10>> |
| Traceback (most recent call last): |
| File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/utils.py", line 212, in wrapper |
| return func(*args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^ |
| File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/logger.py", line 183, in close |
| self.wandb.finish() |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 406, in wrapper |
| return func(self, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 503, in wrapper |
| return func(self, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 451, in wrapper |
| return func(self, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2309, in finish |
| return self._finish(exit_code) |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 406, in wrapper |
| return func(self, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2337, in _finish |
| self._atexit_cleanup(exit_code=exit_code) |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2550, in _atexit_cleanup |
| self._on_finish() |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2806, in _on_finish |
| wait_with_progress( |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress |
| return wait_all_with_progress( |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress |
| return asyncio_compat.run(progress_loop_with_timeout) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_compat.py", line 27, in run |
| future = executor.submit(runner.run, fn) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/concurrent/futures/thread.py", line 169, in submit |
| raise RuntimeError('cannot schedule new futures after ' |
| RuntimeError: cannot schedule new futures after interpreter shutdown |
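The closing `RuntimeError` comes from CPython's `concurrent.futures`, not from wandb itself: `logger.close()` runs inside an atexit callback, and by that point the interpreter refuses to schedule new thread-pool work, so the wandb mailbox cannot spawn its progress task. A stdlib-only sketch of the same failure mode, using an explicitly shut-down executor to stand in for interpreter teardown (the message then reads "after shutdown" rather than "after interpreter shutdown"):

```python
from concurrent.futures import ThreadPoolExecutor

# Once an executor (or the interpreter) is shutting down,
# submit() raises rather than queueing new work.
executor = ThreadPoolExecutor(max_workers=1)
executor.shutdown(wait=True)

try:
    executor.submit(print, "too late")
except RuntimeError as exc:
    # e.g. "cannot schedule new futures after shutdown"
    print(f"RuntimeError: {exc}")
```

The usual mitigation is to finish the run explicitly at the end of the training script (e.g. call the logger's `close()`, which invokes `wandb.finish()`, as the last step of `main()`) so the cleanup happens before atexit processing begins; this is a suggested workaround, not something the log above confirms.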
|
|