+ deepspeed
[rank0]:[W529 15:42:23.778516657 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
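All eight ranks emit this warning, and the message itself names the fix: register the rank-to-GPU mapping before the first collective. A minimal sketch (assuming a standard deepspeed/torchrun launcher that sets `LOCAL_RANK`; this is not taken from the run's own code):

```python
import os

import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])  # set by the deepspeed launcher
torch.cuda.set_device(local_rank)

# Passing device_id registers the rank -> GPU mapping up front, which silences
# the warning and removes the risk of a barrier hang from a wrong guess.
dist.init_process_group(backend="nccl", device_id=torch.device(f"cuda:{local_rank}"))

# Alternatively, pin the device per call:
dist.barrier(device_ids=[local_rank])
```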
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T/config.json
Model config LlamaConfig {
  "_name_or_path": "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "vocab_size": 32000
}
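For reference, a dump like the one above can be reproduced in two lines (a sketch, assuming transformers 4.49 as reported in the config itself):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T"
)
print(config)  # prints the LlamaConfig block shown above
```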
|
|
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T/model.safetensors
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
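The zero.init() line is emitted when transformers sees a live ZeRO-3 DeepSpeed config before `from_pretrained` runs, so parameters are sharded across the eight ranks at construction time instead of being materialized in full on every GPU. A hedged sketch of that pattern (the config values here are placeholders, not this run's actual ds_config):

```python
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = {  # placeholder ZeRO stage-3 config
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,
}
dschf = HfDeepSpeedConfig(ds_config)  # must stay alive through from_pretrained
model = AutoModelForCausalLM.from_pretrained(
    "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T"
)
```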
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}
|
|
All model checkpoint weights were used when initializing LlamaForCausalLM.

All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0
}
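This second dump differs from the earlier one because it comes from generation_config.json, which layers `max_length` and `pad_token_id` on top of the bare defaults. It can be inspected directly (a sketch):

```python
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained(
    "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T"
)
print(gen_cfg.max_length, gen_cfg.pad_token_id)  # 2048 0
```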
|
|
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
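This warning, and the mean-resizing notes further down, come from adding a pad token, which grows the vocab from 32000 to 32001. A sketch of the resize with the suggested fix; the multiple of 64 is an assumption, not what this run used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-715k-1.5T"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

tokenizer.add_special_tokens({"pad_token": "<pad>"})
# pad_to_multiple_of keeps the embedding dim Tensor-Core friendly (32064
# rather than 32001); mean_resizing=True (the default) initializes the new
# rows from the old embeddings' mean and covariance, as the log notes below.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
```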
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Loading extension module fused_adam...
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
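The extension lines are DeepSpeed JIT-compiling its FusedAdam CUDA op into the per-user torch extensions cache on first use. As a rough sketch, an optimizer section like the following in the DeepSpeed config is what resolves to the fused kernel when CPU offload is off (the hyperparameter values are placeholders, not this run's):

```python
ds_optimizer_section = {
    "optimizer": {
        # Without offload, "Adam" resolves to deepspeed.ops.adam.FusedAdam,
        # which is what triggers the JIT build seen above.
        "type": "Adam",
        "params": {"lr": 2e-5, "betas": [0.9, 0.95], "weight_decay": 0.0},
    }
}
```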
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
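Gradient checkpointing recomputes activations on the backward pass, so the generation-time KV cache is dead weight during training; transformers disables it and warns once per rank. Setting both explicitly avoids the message (sketch, with `model` loaded as in the earlier snippet):

```python
model.gradient_checkpointing_enable()  # trade compute for activation memory
model.config.use_cache = False         # KV cache is unused while checkpointing
```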
wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login --relogin` to force relogin.
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-1.5T/tinyllama-1.5T-s3-Q1-5k/wandb/run-20250529_154236-wyo48z9w
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run tinyllama-1.5T-s3-Q1-5k
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment/runs/wyo48z9w
|
Training 1/1 epoch (loss 2.1074):   1%|          | 1/157 [00:05<13:42, 5.27s/it]
Training 1/1 epoch (loss 1.7351): 100%|██████████| 157/157 [00:56<00:00, 2.78it/s]
tokenizer config file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-1.5T/tinyllama-1.5T-s3-Q1-5k/tokenizer_config.json
Special tokens file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-1.5T/tinyllama-1.5T-s3-Q1-5k/special_tokens_map.json
wandb: ERROR Problem finishing run
Exception ignored in atexit callback: <bound method rank_zero_only.<locals>.wrapper of <safe_rlhf.logger.Logger object at 0x15511740a490>>
Traceback (most recent call last):
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/utils.py", line 212, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/logger.py", line 183, in close
    self.wandb.finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 449, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 391, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2106, in finish
    return self._finish(exit_code)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2127, in _finish
    self._atexit_cleanup(exit_code=exit_code)
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2352, in _atexit_cleanup
    self._on_finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2609, in _on_finish
    wait_with_progress(
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress
    return wait_all_with_progress(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress
    return asyncio_compat.run(progress_loop_with_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_compat.py", line 27, in run
    future = executor.submit(runner.run, fn)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/concurrent/futures/thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown
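The closing error is an atexit race, not a training failure: `wandb.finish()` is invoked from an atexit callback after Python has begun interpreter shutdown, when the thread pool wandb needs can no longer accept work. Calling finish explicitly before exit sidesteps it (a sketch, not the safe_rlhf code):

```python
import wandb

run = wandb.init(project="Inverse_Alignment", name="tinyllama-1.5T-s3-Q1-5k")
try:
    ...  # training loop goes here
finally:
    run.finish()  # runs while the interpreter is still fully alive
```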
|
|