| + deepspeed |
| [rank0]:[W529 03:46:35.654873357 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id. |
| [the same warning is emitted by ranks 1-7, each defaulting to its own GPU] |
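The warning is harmless here (rank N did land on GPU N), but the fix it suggests is to bind each rank to its device before the first collective. A minimal sketch, assuming the launcher exports `LOCAL_RANK` as deepspeed and torchrun both do:

```python
# Sketch: pin each rank to its GPU up front so NCCL never has to guess
# the rank -> device mapping (assumes LOCAL_RANK is set by the launcher).
import os

import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

dist.init_process_group(
    backend="nccl",
    device_id=torch.device(f"cuda:{local_rank}"),  # silences the barrier warning
)
dist.barrier(device_ids=[local_rank])  # or pin the device per-barrier call
```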
| loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json |
| Model config LlamaConfig { |
| "_name_or_path": "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T", |
| "architectures": [ |
| "LlamaForCausalLM" |
| ], |
| "attention_bias": false, |
| "attention_dropout": 0.0, |
| "bos_token_id": 1, |
| "eos_token_id": 2, |
| "head_dim": 64, |
| "hidden_act": "silu", |
| "hidden_size": 2048, |
| "initializer_range": 0.02, |
| "intermediate_size": 5632, |
| "max_position_embeddings": 2048, |
| "mlp_bias": false, |
| "model_type": "llama", |
| "num_attention_heads": 32, |
| "num_hidden_layers": 22, |
| "num_key_value_heads": 4, |
| "pretraining_tp": 1, |
| "rms_norm_eps": 1e-05, |
| "rope_scaling": null, |
| "rope_theta": 10000.0, |
| "tie_word_embeddings": false, |
| "torch_dtype": "float32", |
| "transformers_version": "4.49.0", |
| "use_cache": true, |
| "vocab_size": 32000 |
| } |
|
|
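For reference, the same architecture can be rebuilt from this dumped config; a minimal sketch using the path from the log (the model it produces is randomly initialized, not the checkpoint):

```python
# Sketch: reload the dumped LlamaConfig and instantiate the same architecture.
# num_key_value_heads=4 vs num_attention_heads=32 means grouped-query attention.
from transformers import AutoConfig, AutoModelForCausalLM

path = "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T"
config = AutoConfig.from_pretrained(path)
assert config.num_key_value_heads == 4
model = AutoModelForCausalLM.from_config(config)  # fresh ~1.1B-param model
```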
| loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors |
| Will use torch_dtype=torch.float32 as defined in model's config object |
| Instantiating LlamaForCausalLM model under default dtype torch.float32. |
| Detected DeepSpeed ZeRO-3: activating zero.init() for this model |
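The ZeRO-3 message fires because a live `HfDeepSpeedConfig` with stage 3 exists before `from_pretrained` runs, which makes transformers shard parameters across ranks at construction time instead of materializing the full model on every process. A rough sketch of the triggering pattern (the batch-size values are placeholders, not the run's actual config):

```python
# Sketch: keeping a stage-3 HfDeepSpeedConfig object alive before
# from_pretrained is what activates zero.init() in transformers.
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,  # placeholder values
    "train_batch_size": 8,
}
dschf = HfDeepSpeedConfig(ds_config)  # must stay referenced while loading
model = AutoModelForCausalLM.from_pretrained(
    "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T"
)
```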
| Generate config GenerationConfig { |
| "bos_token_id": 1, |
| "eos_token_id": 2 |
| } |
|
|
| All model checkpoint weights were used when initializing LlamaForCausalLM. |
|
| All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T. |
| If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. |
| loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json |
| Generate config GenerationConfig { |
|   "bos_token_id": 1, |
|   "eos_token_id": 2, |
|   "max_length": 2048, |
|   "pad_token_id": 0 |
| } |
| loading file tokenizer.model |
| loading file tokenizer.json |
| loading file added_tokens.json |
| loading file special_tokens_map.json |
| loading file tokenizer_config.json |
| loading file chat_template.jinja |
| You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc |
| The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False` |
| The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False` |
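All three messages come from `resize_token_embeddings` after a pad token is added (32000 -> 32001). A hedged sketch of how both warnings can be addressed explicitly; the `<pad>` token string is an assumption, not taken from the run:

```python
# Sketch: resize with an explicit multiple-of-8 padding (keeps Tensor Cores
# usable) and an explicit choice for the init of the new embedding rows.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T"
tokenizer = AutoTokenizer.from_pretrained(path)
tokenizer.add_special_tokens({"pad_token": "<pad>"})  # vocab: 32000 -> 32001
model = AutoModelForCausalLM.from_pretrained(path)
model.resize_token_embeddings(
    len(tokenizer),
    pad_to_multiple_of=8,  # embedding matrix becomes 32008 rows, not 32001
    mean_resizing=True,    # False skips the multivariate-normal initialization
)
```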
| Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root... |
| Detected CUDA files, patching ldflags |
| Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja... |
| /aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. |
| If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST']. |
|   warnings.warn( |
| Building extension module fused_adam... |
| Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) |
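The `TORCH_CUDA_ARCH_LIST` warning only affects how many architectures the `fused_adam` JIT build targets; compiling for every visible arch just slows the build. A sketch of the suggested fix (the value must match the actual GPUs; `8.0` here is an assumption):

```python
# Sketch: set before DeepSpeed builds its extensions so the JIT compile
# targets a single architecture instead of all visible ones.
import os

os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0"  # e.g. A100; "9.0" for H100-class
```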
| Loading extension module fused_adam... |
| wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information. |
| `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`. |
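Transformers flips `use_cache` off automatically, but the warning can be avoided by making the combination explicit before training starts; a sketch, with `model` being the LlamaForCausalLM loaded above:

```python
# Sketch: the KV cache only helps at generation time and conflicts with
# gradient checkpointing, so disable it explicitly for the training phase.
model.config.use_cache = False
model.gradient_checkpointing_enable()  # recompute activations in backward
```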
| wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login --relogin` to force relogin. |
| wandb: Tracking run with wandb version 0.19.8 |
| wandb: Run data is saved locally in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-1T/tinyllama-1T-s3-Q1-2k/wandb/run-20250529_034652-rq0tyx6h |
| wandb: Run `wandb offline` to turn off syncing. |
| wandb: Syncing run tinyllama-1T-s3-Q1-2k |
| wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment |
| wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment/runs/rq0tyx6h |
|
Training 1/1 epoch: 0%|          | 0/63 [00:00<?, ?it/s]
Training 1/1 epoch (loss 2.6241): 2%|█         | 1/63 [00:10<10:49, 10.48s/it]
Training 1/1 epoch (loss 2.2714): 3%|█         | 2/63 [00:12<05:46, 5.69s/it]
Training 1/1 epoch (loss 2.3873): 5%|█         | 3/63 [00:14<03:50, 3.85s/it]
Training 1/1 epoch (loss 2.3484): 6%|█         | 4/63 [00:15<02:48, 2.85s/it]
Training 1/1 epoch (loss 2.3425): 8%|█         | 5/63 [00:17<02:31, 2.61s/it]
Training 1/1 epoch (loss 2.3013): 10%|█         | 6/63 [00:19<02:01, 2.13s/it]
Training 1/1 epoch (loss 2.3548): 11%|█         | 7/63 [00:21<01:59, 2.14s/it]
Training 1/1 epoch (loss 2.4646): 13%|██        | 8/63 [00:23<01:53, 2.06s/it]
Training 1/1 epoch (loss 2.3824): 14%|██        | 9/63 [00:24<01:43, 1.91s/it]
Training 1/1 epoch (loss 2.3521): 16%|██        | 10/63 [00:26<01:39, 1.88s/it]
Training 1/1 epoch (loss 2.3694): 17%|██        | 11/63 [00:28<01:37, 1.87s/it]
Training 1/1 epoch (loss 2.3652): 19%|██        | 12/63 [00:30<01:32, 1.81s/it]
Training 1/1 epoch (loss 2.2498): 21%|██        | 13/63 [00:30<01:15, 1.51s/it]
Training 1/1 epoch (loss 2.4220): 22%|███       | 14/63 [00:33<01:27, 1.78s/it]
Training 1/1 epoch (loss 2.2120): 24%|███       | 15/63 [00:35<01:33, 1.96s/it]
Training 1/1 epoch (loss 2.2469): 25%|███       | 16/63 [00:38<01:43, 2.19s/it]
Training 1/1 epoch (loss 2.4127): 27%|███       | 17/63 [00:39<01:28, 1.92s/it]
Training 1/1 epoch (loss 2.2184): 29%|███       | 18/63 [00:42<01:31, 2.03s/it]
Training 1/1 epoch (loss 2.3181): 30%|███       | 19/63 [00:44<01:34, 2.16s/it]
Training 1/1 epoch (loss 2.3320): 32%|████      | 20/63 [00:45<01:17, 1.80s/it]
Training 1/1 epoch (loss 2.3601): 33%|████      | 21/63 [00:46<01:11, 1.71s/it]
Training 1/1 epoch (loss 2.4236): 35%|████      | 22/63 [00:48<01:03, 1.56s/it]
Training 1/1 epoch (loss 2.1815): 37%|████      | 23/63 [00:50<01:09, 1.73s/it]
Training 1/1 epoch (loss 2.3397): 38%|████      | 24/63 [00:52<01:13, 1.89s/it]
Training 1/1 epoch (loss 2.2450): 40%|████      | 25/63 [00:53<01:04, 1.69s/it]
Training 1/1 epoch (loss 2.3588): 41%|█████     | 26/63 [00:55<01:06, 1.78s/it]
Training 1/1 epoch (loss 2.2800): 43%|█████     | 27/63 [00:57<00:58, 1.64s/it]
Training 1/1 epoch (loss 2.1983): 44%|█████     | 28/63 [00:59<01:00, 1.73s/it]
Training 1/1 epoch (loss 2.4104): 46%|█████     | 29/63 [01:01<01:06, 1.96s/it]
Training 1/1 epoch (loss 2.2371): 48%|█████     | 30/63 [01:03<01:03, 1.92s/it]
Training 1/1 epoch (loss 2.4156): 49%|█████     | 31/63 [01:05<01:04, 2.02s/it]
Training 1/1 epoch (loss 2.2388): 51%|█████     | 32/63 [01:08<01:09, 2.24s/it]
Training 1/1 epoch (loss 2.1797): 52%|██████    | 33/63 [01:10<01:05, 2.17s/it]
Training 1/1 epoch (loss 2.0785): 54%|██████    | 34/63 [01:12<01:02, 2.14s/it]
Training 1/1 epoch (loss 2.1466): 56%|██████    | 35/63 [01:13<00:53, 1.91s/it]
Training 1/1 epoch (loss 2.0627): 57%|██████    | 36/63 [01:15<00:49, 1.83s/it]
Training 1/1 epoch (loss 2.3559): 59%|██████    | 37/63 [01:16<00:42, 1.63s/it]
Training 1/1 epoch (loss 2.1486): 60%|██████    | 38/63 [01:18<00:42, 1.70s/it]
Training 1/1 epoch (loss 2.0500): 62%|███████   | 39/63 [01:19<00:38, 1.61s/it]
Training 1/1 epoch (loss 2.0975): 63%|███████   | 40/63 [01:22<00:41, 1.81s/it]
Training 1/1 epoch (loss 2.0596): 65%|███████   | 41/63 [01:24<00:43, 1.97s/it]
Training 1/1 epoch (loss 2.1414): 67%|███████   | 42/63 [01:25<00:37, 1.78s/it]
Training 1/1 epoch (loss 2.2219): 68%|███████   | 43/63 [01:27<00:34, 1.71s/it]
Training 1/1 epoch (loss 2.0297): 70%|███████   | 44/63 [01:28<00:31, 1.65s/it]
Training 1/1 epoch (loss 2.0731): 71%|████████  | 45/63 [01:31<00:32, 1.83s/it]
Training 1/1 epoch (loss 2.0504): 73%|████████  | 46/63 [01:32<00:28, 1.65s/it]
Training 1/1 epoch (loss 2.0236): 75%|████████  | 47/63 [01:34<00:27, 1.73s/it]
Training 1/1 epoch (loss 2.1700): 76%|████████  | 48/63 [01:35<00:24, 1.62s/it]
Training 1/1 epoch (loss 2.0542): 78%|████████  | 49/63 [01:37<00:24, 1.73s/it]
Training 1/1 epoch (loss 2.0704): 79%|████████  | 50/63 [01:39<00:22, 1.75s/it]
Training 1/1 epoch (loss 2.1153): 81%|████████  | 51/63 [01:40<00:19, 1.60s/it]
Training 1/1 epoch (loss 2.1367): 83%|█████████ | 52/63 [01:42<00:17, 1.58s/it]
Training 1/1 epoch (loss 2.0709): 84%|█████████ | 53/63 [01:43<00:15, 1.52s/it]
Training 1/1 epoch (loss 2.1604): 86%|█████████ | 54/63 [01:45<00:14, 1.62s/it]
Training 1/1 epoch (loss 2.2015): 87%|█████████ | 55/63 [01:46<00:12, 1.52s/it]
Training 1/1 epoch (loss 1.9832): 89%|█████████ | 56/63 [01:49<00:13, 1.86s/it]
Training 1/1 epoch (loss 2.0302): 90%|█████████ | 57/63 [01:51<00:11, 1.94s/it]
Training 1/1 epoch (loss 2.0724): 92%|██████████| 58/63 [01:52<00:08, 1.75s/it]
Training 1/1 epoch (loss 2.1017): 94%|██████████| 59/63 [01:54<00:07, 1.76s/it]
Training 1/1 epoch (loss 2.0331): 95%|██████████| 60/63 [01:56<00:05, 1.83s/it]
Training 1/1 epoch (loss 2.0199): 97%|██████████| 61/63 [01:57<00:03, 1.63s/it]
Training 1/1 epoch (loss 2.0146): 98%|██████████| 62/63 [02:00<00:01, 1.86s/it]
Training 1/1 epoch (loss 2.1965): 100%|██████████| 63/63 [02:01<00:00, 1.77s/it]
Training 1/1 epoch (loss 2.1965): 100%|██████████| 63/63 [02:01<00:00, 1.93s/it] |
| tokenizer config file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-1T/tinyllama-1T-s3-Q1-2k/tokenizer_config.json |
| Special tokens file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-1T/tinyllama-1T-s3-Q1-2k/special_tokens_map.json |
| wandb: ERROR Problem finishing run |
| Exception ignored in atexit callback: <bound method rank_zero_only.<locals>.wrapper of <safe_rlhf.logger.Logger object at 0x1551040ffe50>> |
| Traceback (most recent call last): |
| File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/utils.py", line 212, in wrapper |
| return func(*args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^ |
| File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/logger.py", line 183, in close |
| self.wandb.finish() |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 449, in wrapper |
| return func(self, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 391, in wrapper |
| return func(self, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2106, in finish |
| return self._finish(exit_code) |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2127, in _finish |
| self._atexit_cleanup(exit_code=exit_code) |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2352, in _atexit_cleanup |
| self._on_finish() |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2609, in _on_finish |
| wait_with_progress( |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress |
| return wait_all_with_progress( |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress |
| return asyncio_compat.run(progress_loop_with_timeout) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_compat.py", line 27, in run |
| future = executor.submit(runner.run, fn) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/concurrent/futures/thread.py", line 169, in submit |
| raise RuntimeError('cannot schedule new futures after ' |
| RuntimeError: cannot schedule new futures after interpreter shutdown |
|
|
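The final traceback is a shutdown-ordering problem rather than a training failure: the tokenizer and checkpoint were already saved, but wandb's cleanup runs from an `atexit` hook after Python has begun tearing down its thread pools, so `wait_with_progress` can no longer schedule futures. A hedged workaround is to finish the run explicitly before the process exits; the placement inside a `main()` function is an assumption about how the training script is structured:

```python
# Sketch: finishing the wandb run inside main(), before interpreter shutdown,
# avoids relying on the atexit hook that raced with executor teardown above.
import wandb

def main() -> None:
    run = wandb.init(project="Inverse_Alignment")
    try:
        ...  # training loop
    finally:
        run.finish()  # flush and close while thread pools are still alive

if __name__ == "__main__":
    main()
```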