+ deepspeed
[rank5]:[W529 03:49:30.190577462 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank7]:[W529 03:49:30.261888447 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank3]:[W529 03:49:31.284839409 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank1]:[W529 03:49:31.487324291 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank4]:[W529 03:49:31.520216286 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank6]:[W529 03:49:31.520449262 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank2]:[W529 03:49:31.520493742 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank0]:[W529 03:49:31.558767564 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
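As the warning itself suggests, the ambiguity goes away if each rank is bound to its device at init time. A minimal sketch, assuming the launcher exports LOCAL_RANK (both deepspeed and torchrun do); this is not the run's actual init code:

```python
import os

import torch
import torch.distributed as dist

# Bind this process to its GPU before initializing the process group, so NCCL
# never has to guess the rank -> device mapping the warning above is about.
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

dist.init_process_group(
    backend="nccl",
    device_id=torch.device(f"cuda:{local_rank}"),  # silences the barrier warning
)

# Alternatively (or additionally), pin the device per collective:
dist.barrier(device_ids=[local_rank])
```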
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json
Model config LlamaConfig {
  "_name_or_path": "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "vocab_size": 32000
}
|
|
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
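The zero.init() activation happens because transformers sees a live ZeRO stage-3 DeepSpeed config when `from_pretrained` runs, so each rank materializes only a shard of the float32 weights instead of a full copy. A sketch of how such a load is typically wired up outside the HF Trainer; the config dict here is illustrative, not the run's actual deepspeed config:

```python
import torch
from transformers import AutoModelForCausalLM
from transformers.integrations.deepspeed import HfDeepSpeedConfig

MODEL_PATH = "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T"

# Illustrative ZeRO-3 config; the real run's JSON is not shown in this log.
ds_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 4,
    "train_batch_size": 32,
}

# Keeping this object alive before from_pretrained() is what makes transformers
# detect ZeRO-3 and instantiate the model under deepspeed.zero.Init().
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float32)
```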
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}

All model checkpoint weights were used when initializing LlamaForCausalLM.

All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0
}

loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
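The resize to 32001 reported next follows from adding a single pad token to the stock 32000-entry Llama tokenizer. A hedged sketch of that step; the exact pad token string used by this training script is an assumption:

```python
from transformers import AutoTokenizer

MODEL_PATH = "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

# The base tokenizer has 32000 entries and no pad token; registering one new
# special token is what pushes the embedding size to 32001 below.
# "<pad>" is an assumed token string, not confirmed by this log.
num_added = tokenizer.add_special_tokens({"pad_token": "<pad>"})
print(num_added, len(tokenizer))  # -> 1 32001
```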
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
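All three notices come from the same `resize_token_embeddings` call. A sketch of how they can be tuned or silenced, continuing from the tokenizer sketch above; the multiple-of-64 padding value is a common choice for Tensor Cores, not something this script necessarily uses:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
tokenizer.add_special_tokens({"pad_token": "<pad>"})  # assumed token string
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float32)

# Padding the new vocabulary up to a multiple of 64 (32001 -> 32064) keeps the
# embedding matmul Tensor-Core friendly, and mean_resizing=False skips the
# mean/covariance initialization of the new rows warned about above.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64, mean_resizing=False)
```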
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
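As the warning suggests, pinning the architecture list before the extension is compiled avoids building fused_adam for every architecture ninja can see. A sketch; "8.0" (A100-class) is an assumption about this cluster's GPUs, not something the log confirms:

```python
import os

# Pin the target compute capability before deepspeed builds its CUDA ops.
# "8.0" is an assumed value (A100); use e.g. "9.0" for H100-class cards.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0"
```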
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
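Transformers emits this once per rank because gradient checkpointing recomputes activations in the backward pass and cannot reuse a key/value cache. Disabling the cache up front keeps the log quiet; a sketch, reusing the `model` from the load above:

```python
# Gradient checkpointing recomputes activations during backward, which is
# incompatible with caching key/value states, so turn the cache off explicitly.
model.config.use_cache = False
model.gradient_checkpointing_enable()
```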
wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login --relogin` to force relogin.
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-1T/tinyllama-1T-s3-Q1-5k/wandb/run-20250529_034948-qxuigv2t
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run tinyllama-1T-s3-Q1-5k
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment/runs/qxuigv2t
Training 1/1 epoch:   0%|          | 0/157 [00:00<?, ?it/s]
Training 1/1 epoch (loss 2.3096):   1%|          | 1/157 [00:10<27:28, 10.56s/it]
Training 1/1 epoch (loss 2.3386):   1%|█         | 2/157 [00:13<15:38, 6.06s/it]
Training 1/1 epoch (loss 2.4139):   2%|█         | 3/157 [00:15<10:35, 4.12s/it]
Training 1/1 epoch (loss 2.2709):   3%|█         | 4/157 [00:17<08:22, 3.29s/it]
Training 1/1 epoch (loss 2.3339):   3%|█         | 5/157 [00:19<06:55, 2.73s/it]
Training 1/1 epoch (loss 2.3797):   4%|█         | 6/157 [00:20<05:58, 2.38s/it]
Training 1/1 epoch (loss 2.3575):   4%|█         | 7/157 [00:22<05:02, 2.02s/it]
Training 1/1 epoch (loss 2.4028):   5%|█         | 8/157 [00:23<04:51, 1.96s/it]
Training 1/1 epoch (loss 2.3018):   6%|█         | 9/157 [00:25<04:57, 2.01s/it]
Training 1/1 epoch (loss 2.2948):   6%|█         | 10/157 [00:27<04:25, 1.81s/it]
Training 1/1 epoch (loss 2.3405):   7%|█         | 11/157 [00:29<04:24, 1.81s/it]
Training 1/1 epoch (loss 2.3771):   8%|█         | 12/157 [00:30<04:13, 1.75s/it]
Training 1/1 epoch (loss 2.3303):   8%|█         | 13/157 [00:31<03:41, 1.54s/it]
Training 1/1 epoch (loss 2.4227):   9%|█         | 14/157 [00:33<03:53, 1.63s/it]
Training 1/1 epoch (loss 2.4305):  10%|█         | 15/157 [00:35<03:53, 1.64s/it]
Training 1/1 epoch (loss 2.3639):  10%|█         | 16/157 [00:36<03:52, 1.65s/it]
Training 1/1 epoch (loss 2.3537):  11%|█         | 17/157 [00:37<03:19, 1.43s/it]
Training 1/1 epoch (loss 2.3149):  11%|██        | 18/157 [00:40<04:01, 1.74s/it]
Training 1/1 epoch (loss 2.2397):  12%|██        | 19/157 [00:42<04:03, 1.76s/it]
Training 1/1 epoch (loss 2.1309):  13%|██        | 20/157 [00:44<04:11, 1.84s/it]
Training 1/1 epoch (loss 2.3359):  13%|██        | 21/157 [00:45<03:46, 1.67s/it]
Training 1/1 epoch (loss 2.3052):  14%|██        | 22/157 [00:46<03:34, 1.59s/it]
Training 1/1 epoch (loss 2.2583):  15%|██        | 23/157 [00:49<03:57, 1.77s/it]
Training 1/1 epoch (loss 2.1834):  15%|██        | 24/157 [00:51<04:23, 1.98s/it]
Training 1/1 epoch (loss 2.2734):  16%|██        | 25/157 [00:53<04:17, 1.95s/it]
Training 1/1 epoch (loss 2.3625):  17%|██        | 26/157 [00:55<04:11, 1.92s/it]
Training 1/1 epoch (loss 2.4484):  17%|██        | 27/157 [00:57<04:12, 1.95s/it]
Training 1/1 epoch (loss 2.4371):  18%|██        | 28/157 [00:59<04:04, 1.90s/it]
Training 1/1 epoch (loss 2.3688):  18%|██        | 29/157 [01:00<03:46, 1.77s/it]
Training 1/1 epoch (loss 2.2012):  19%|██        | 30/157 [01:03<04:13, 1.99s/it]
Training 1/1 epoch (loss 2.1782):  20%|██        | 31/157 [01:04<03:45, 1.79s/it]
Training 1/1 epoch (loss 2.2963):  20%|██        | 32/157 [01:05<03:34, 1.71s/it]
Training 1/1 epoch (loss 2.1054):  21%|██        | 33/157 [01:07<03:31, 1.71s/it]
Training 1/1 epoch (loss 2.1638):  22%|███       | 34/157 [01:09<03:27, 1.69s/it]
Training 1/1 epoch (loss 2.2893):  22%|███       | 35/157 [01:10<03:21, 1.65s/it]
Training 1/1 epoch (loss 2.2653):  23%|███       | 36/157 [01:12<03:06, 1.55s/it]
Training 1/1 epoch (loss 2.2135):  24%|███       | 37/157 [01:14<03:30, 1.75s/it]
Training 1/1 epoch (loss 2.1109):  24%|███       | 38/157 [01:16<03:51, 1.95s/it]
Training 1/1 epoch (loss 2.1161):  25%|███       | 39/157 [01:18<03:46, 1.92s/it]
Training 1/1 epoch (loss 2.1096):  25%|███       | 40/157 [01:21<04:02, 2.07s/it]
Training 1/1 epoch (loss 2.1024):  26%|███       | 41/157 [01:22<03:56, 2.04s/it]
Training 1/1 epoch (loss 2.1919):  27%|███       | 42/157 [01:24<03:32, 1.84s/it]
Training 1/1 epoch (loss 2.1500):  27%|███       | 43/157 [01:26<03:24, 1.79s/it]
Training 1/1 epoch (loss 2.1616):  28%|███       | 44/157 [01:27<03:09, 1.67s/it]
Training 1/1 epoch (loss 2.2563):  29%|███       | 45/157 [01:28<02:55, 1.57s/it]
Training 1/1 epoch (loss 2.1972):  29%|███       | 46/157 [01:30<03:07, 1.69s/it]
Training 1/1 epoch (loss 2.0668):  30%|███       | 47/157 [01:33<03:30, 1.92s/it]
Training 1/1 epoch (loss 2.0923):  31%|███       | 48/157 [01:35<03:46, 2.08s/it]
Training 1/1 epoch (loss 2.0977):  31%|███       | 49/157 [01:37<03:24, 1.90s/it]
Training 1/1 epoch (loss 2.1360):  32%|████      | 50/157 [01:38<03:08, 1.76s/it]
Training 1/1 epoch (loss 2.1047):  32%|████      | 51/157 [01:40<03:11, 1.81s/it]
Training 1/1 epoch (loss 2.1950):  33%|████      | 52/157 [01:42<03:02, 1.74s/it]
Training 1/1 epoch (loss 2.0722):  34%|████      | 53/157 [01:44<03:20, 1.93s/it]
Training 1/1 epoch (loss 2.1390):  34%|████      | 54/157 [01:46<03:17, 1.91s/it]
Training 1/1 epoch (loss 2.1286):  35%|████      | 55/157 [01:48<03:10, 1.87s/it]
Training 1/1 epoch (loss 2.2489):  36%|████      | 56/157 [01:49<02:54, 1.72s/it]
Training 1/1 epoch (loss 2.0585):  36%|████      | 57/157 [01:51<03:14, 1.94s/it]
Training 1/1 epoch (loss 1.9706):  37%|████      | 58/157 [01:53<02:59, 1.81s/it]
Training 1/1 epoch (loss 2.0922):  38%|████      | 59/157 [01:54<02:50, 1.74s/it]
Training 1/1 epoch (loss 2.0095):  38%|████      | 60/157 [01:56<02:52, 1.78s/it]
Training 1/1 epoch (loss 1.9715):  39%|████      | 61/157 [01:59<03:10, 1.98s/it]
Training 1/1 epoch (loss 2.0619):  39%|████      | 62/157 [02:00<02:57, 1.87s/it]
Training 1/1 epoch (loss 1.9804):  40%|████      | 63/157 [02:02<02:49, 1.81s/it]
Training 1/1 epoch (loss 1.9646):  41%|████      | 64/157 [02:04<02:50, 1.84s/it]
Training 1/1 epoch (loss 2.0831):  41%|█████     | 65/157 [02:06<02:41, 1.75s/it]
Training 1/1 epoch (loss 2.0382):  42%|█████     | 66/157 [02:07<02:18, 1.52s/it]
Training 1/1 epoch (loss 2.0127):  43%|█████     | 67/157 [02:08<02:23, 1.60s/it]
Training 1/1 epoch (loss 1.9821):  43%|█████     | 68/157 [02:10<02:24, 1.62s/it]
Training 1/1 epoch (loss 1.9628):  44%|█████     | 69/157 [02:11<02:20, 1.59s/it]
Training 1/1 epoch (loss 1.9857):  45%|█████     | 70/157 [02:13<02:19, 1.60s/it]
Training 1/1 epoch (loss 1.9180):  45%|█████     | 71/157 [02:16<02:38, 1.85s/it]
Training 1/1 epoch (loss 1.9555):  46%|█████     | 72/157 [02:17<02:30, 1.77s/it]
Training 1/1 epoch (loss 1.9822):  46%|█████     | 73/157 [02:20<02:44, 1.96s/it]
Training 1/1 epoch (loss 1.9930):  47%|█████     | 74/157 [02:22<02:45, 1.99s/it]
Training 1/1 epoch (loss 1.9588):  48%|█████     | 75/157 [02:24<02:49, 2.07s/it]
Training 1/1 epoch (loss 1.8645):  48%|█████     | 76/157 [02:25<02:35, 1.92s/it]
Training 1/1 epoch (loss 1.9187):  49%|█████     | 77/157 [02:27<02:18, 1.73s/it]
Training 1/1 epoch (loss 2.0219):  50%|█████     | 78/157 [02:28<02:09, 1.63s/it]
Training 1/1 epoch (loss 2.0034):  50%|█████     | 79/157 [02:29<01:58, 1.52s/it]
Training 1/1 epoch (loss 2.0176):  51%|█████     | 80/157 [02:32<02:12, 1.72s/it]
Training 1/1 epoch (loss 1.9645):  52%|██████    | 81/157 [02:34<02:27, 1.93s/it]
Training 1/1 epoch (loss 1.9381):  52%|██████    | 82/157 [02:36<02:36, 2.09s/it]
Training 1/1 epoch (loss 1.9325):  53%|██████    | 83/157 [02:38<02:25, 1.96s/it]
Training 1/1 epoch (loss 2.1090):  54%|██████    | 84/157 [02:39<02:09, 1.78s/it]
Training 1/1 epoch (loss 1.9618):  54%|██████    | 85/157 [02:41<01:59, 1.66s/it]
Training 1/1 epoch (loss 1.9648):  55%|██████    | 86/157 [02:42<01:48, 1.53s/it]
Training 1/1 epoch (loss 2.0318):  55%|██████    | 87/157 [02:43<01:43, 1.47s/it]
Training 1/1 epoch (loss 1.9265):  56%|██████    | 88/157 [02:45<01:39, 1.44s/it]
Training 1/1 epoch (loss 1.9878):  57%|██████    | 89/157 [02:46<01:37, 1.44s/it]
Training 1/1 epoch (loss 1.9466):  57%|██████    | 90/157 [02:48<01:43, 1.54s/it]
Training 1/1 epoch (loss 1.8784):  58%|██████    | 91/157 [02:50<01:54, 1.74s/it]
Training 1/1 epoch (loss 1.9324):  59%|██████    | 92/157 [02:51<01:44, 1.60s/it]
Training 1/1 epoch (loss 1.9027):  59%|██████    | 93/157 [02:53<01:48, 1.70s/it]
Training 1/1 epoch (loss 2.0108):  60%|██████    | 94/157 [02:56<02:01, 1.93s/it]
Training 1/1 epoch (loss 1.9439):  61%|██████    | 95/157 [02:57<01:47, 1.74s/it]
Training 1/1 epoch (loss 1.9233):  61%|██████    | 96/157 [02:59<01:39, 1.64s/it]
Training 1/1 epoch (loss 1.9011):  62%|███████   | 97/157 [03:00<01:30, 1.51s/it]
Training 1/1 epoch (loss 1.9861):  62%|███████   | 98/157 [03:01<01:25, 1.45s/it]
Training 1/1 epoch (loss 1.7874):  63%|███████   | 99/157 [03:03<01:26, 1.49s/it]
Training 1/1 epoch (loss 1.9212):  64%|███████   | 100/157 [03:04<01:21, 1.43s/it]
Training 1/1 epoch (loss 1.9453):  64%|███████   | 101/157 [03:06<01:30, 1.61s/it]
Training 1/1 epoch (loss 2.0092):  65%|███████   | 102/157 [03:08<01:42, 1.86s/it]
Training 1/1 epoch (loss 1.8836):  66%|███████   | 103/157 [03:10<01:38, 1.82s/it]
Training 1/1 epoch (loss 1.9379):  66%|███████   | 104/157 [03:13<01:45, 1.98s/it]
Training 1/1 epoch (loss 1.8654):  67%|███████   | 105/157 [03:15<01:47, 2.07s/it]
Training 1/1 epoch (loss 1.7977):  68%|███████   | 106/157 [03:16<01:34, 1.86s/it]
Training 1/1 epoch (loss 1.8567):  68%|███████   | 107/157 [03:18<01:33, 1.87s/it]
Training 1/1 epoch (loss 1.8010):  69%|███████   | 108/157 [03:20<01:34, 1.94s/it]
Training 1/1 epoch (loss 1.8831):  69%|███████   | 109/157 [03:21<01:23, 1.75s/it]
Training 1/1 epoch (loss 1.7052):  70%|███████   | 110/157 [03:24<01:27, 1.87s/it]
Training 1/1 epoch (loss 1.8611):  71%|███████   | 111/157 [03:25<01:16, 1.67s/it]
Training 1/1 epoch (loss 1.8793):  71%|████████  | 112/157 [03:27<01:18, 1.75s/it]
Training 1/1 epoch (loss 1.8697):  72%|████████  | 113/157 [03:28<01:14, 1.69s/it]
Training 1/1 epoch (loss 1.7661):  73%|████████  | 114/157 [03:30<01:11, 1.67s/it]
Training 1/1 epoch (loss 1.7367):  73%|████████  | 115/157 [03:32<01:13, 1.76s/it]
Training 1/1 epoch (loss 1.8567):  74%|████████  | 116/157 [03:33<01:06, 1.63s/it]
Training 1/1 epoch (loss 1.9114):  75%|████████  | 117/157 [03:35<01:10, 1.76s/it]
Training 1/1 epoch (loss 1.8609):  75%|████████  | 118/157 [03:37<01:11, 1.84s/it]
Training 1/1 epoch (loss 1.7612):  76%|████████  | 119/157 [03:39<01:09, 1.83s/it]
Training 1/1 epoch (loss 1.9328):  76%|████████  | 120/157 [03:41<01:10, 1.91s/it]
Training 1/1 epoch (loss 1.7925):  77%|████████  | 121/157 [03:42<00:59, 1.66s/it]
Training 1/1 epoch (loss 1.8793):  78%|████████  | 122/157 [03:45<01:06, 1.90s/it]
Training 1/1 epoch (loss 1.8648):  78%|████████  | 123/157 [03:46<00:58, 1.73s/it]
Training 1/1 epoch (loss 1.8971):  79%|████████  | 124/157 [03:49<01:04, 1.94s/it]
Training 1/1 epoch (loss 1.8494):  80%|████████  | 125/157 [03:50<01:01, 1.93s/it]
Training 1/1 epoch (loss 1.9415):  80%|████████  | 126/157 [03:52<00:55, 1.80s/it]
Training 1/1 epoch (loss 1.7644):  81%|████████  | 127/157 [03:53<00:49, 1.65s/it]
Training 1/1 epoch (loss 1.8602):  82%|█████████ | 128/157 [03:54<00:44, 1.53s/it]
Training 1/1 epoch (loss 1.9279):  82%|█████████ | 129/157 [03:56<00:44, 1.61s/it]
Training 1/1 epoch (loss 1.7562):  83%|█████████ | 130/157 [03:58<00:41, 1.54s/it]
Training 1/1 epoch (loss 1.7602):  83%|█████████ | 131/157 [03:59<00:38, 1.49s/it]
Training 1/1 epoch (loss 1.7737):  84%|█████████ | 132/157 [04:01<00:38, 1.54s/it]
Training 1/1 epoch (loss 1.9066):  85%|█████████ | 133/157 [04:03<00:40, 1.68s/it]
Training 1/1 epoch (loss 1.8003):  85%|█████████ | 134/157 [04:04<00:35, 1.53s/it]
Training 1/1 epoch (loss 1.7630):  86%|█████████ | 135/157 [04:06<00:37, 1.71s/it]
Training 1/1 epoch (loss 1.8369):  87%|█████████ | 136/157 [04:08<00:35, 1.68s/it]
Training 1/1 epoch (loss 1.7357):  87%|█████████ | 137/157 [04:09<00:31, 1.58s/it]
Training 1/1 epoch (loss 1.7826):  88%|█████████ | 138/157 [04:10<00:29, 1.57s/it]
Training 1/1 epoch (loss 1.8345):  89%|█████████ | 139/157 [04:12<00:28, 1.57s/it]
Training 1/1 epoch (loss 1.8977):  89%|█████████ | 140/157 [04:14<00:27, 1.64s/it]
Training 1/1 epoch (loss 1.7066):  90%|█████████ | 141/157 [04:16<00:26, 1.68s/it]
Training 1/1 epoch (loss 1.6860):  90%|█████████ | 142/157 [04:17<00:25, 1.69s/it]
Training 1/1 epoch (loss 1.8188):  91%|█████████ | 143/157 [04:19<00:21, 1.53s/it]
Training 1/1 epoch (loss 1.8003):  92%|██████████| 144/157 [04:20<00:21, 1.63s/it]
Training 1/1 epoch (loss 1.8589):  92%|██████████| 145/157 [04:22<00:20, 1.71s/it]
Training 1/1 epoch (loss 1.6862):  93%|██████████| 146/157 [04:24<00:18, 1.70s/it]
Training 1/1 epoch (loss 1.7867):  94%|██████████| 147/157 [04:26<00:17, 1.74s/it]
Training 1/1 epoch (loss 1.7420):  94%|██████████| 148/157 [04:27<00:14, 1.63s/it]
Training 1/1 epoch (loss 1.7181):  95%|██████████| 149/157 [04:30<00:14, 1.87s/it]
Training 1/1 epoch (loss 1.7371):  96%|██████████| 150/157 [04:31<00:12, 1.83s/it]
Training 1/1 epoch (loss 1.7533):  96%|██████████| 151/157 [04:33<00:10, 1.68s/it]
Training 1/1 epoch (loss 1.8635):  97%|██████████| 152/157 [04:34<00:08, 1.68s/it]
Training 1/1 epoch (loss 1.8700):  97%|██████████| 153/157 [04:36<00:06, 1.64s/it]
Training 1/1 epoch (loss 1.7453):  98%|██████████| 154/157 [04:38<00:04, 1.65s/it]
Training 1/1 epoch (loss 1.8425):  99%|██████████| 155/157 [04:39<00:03, 1.61s/it]
Training 1/1 epoch (loss 1.8008):  99%|██████████| 156/157 [04:41<00:01, 1.60s/it]
Training 1/1 epoch (loss 1.9444): 100%|██████████| 157/157 [04:42<00:00, 1.80s/it]
tokenizer config file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-1T/tinyllama-1T-s3-Q1-5k/tokenizer_config.json
Special tokens file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-1T/tinyllama-1T-s3-Q1-5k/special_tokens_map.json
wandb: ERROR Problem finishing run
Exception ignored in atexit callback: <bound method rank_zero_only.<locals>.wrapper of <safe_rlhf.logger.Logger object at 0x155117cabc10>>
Traceback (most recent call last):
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/utils.py", line 212, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/logger.py", line 183, in close
    self.wandb.finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 449, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 391, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2106, in finish
    return self._finish(exit_code)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2127, in _finish
    self._atexit_cleanup(exit_code=exit_code)
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2352, in _atexit_cleanup
    self._on_finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2609, in _on_finish
    wait_with_progress(
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress
    return wait_all_with_progress(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress
    return asyncio_compat.run(progress_loop_with_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_compat.py", line 27, in run
    future = executor.submit(runner.run, fn)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/concurrent/futures/thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown
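The traceback shows the Logger's atexit hook calling `wandb.finish()` after the interpreter has already begun shutting down, at which point wandb's thread pool can no longer accept new futures. A sketch of the usual workaround under that assumption; the `run` object here stands in for whatever `safe_rlhf.logger.Logger` wraps, and the project name is taken from this log:

```python
import wandb

def main() -> None:
    run = wandb.init(project="Inverse_Alignment")
    try:
        ...  # training loop
    finally:
        # Finish the run explicitly while the interpreter is still fully alive,
        # instead of relying on an atexit callback: atexit fires during shutdown,
        # when concurrent.futures refuses to schedule the work wandb needs.
        run.finish()

if __name__ == "__main__":
    main()
```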
|
|