+ deepspeed
[rank0]:[W529 17:24:04.399876136 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
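All eight ranks emit this same warning. A minimal sketch of the fix it suggests, assuming one process per GPU and the LOCAL_RANK convention used by the deepspeed/torchrun launchers (the device_id kwarg needs a recent PyTorch, 2.3+):

```python
import os

import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])  # set by the deepspeed/torchrun launcher
torch.cuda.set_device(local_rank)

# Pinning the device at init time fixes the rank-to-GPU mapping, so collectives
# such as barrier() no longer have to guess which GPU to use.
dist.init_process_group(backend="nccl", device_id=torch.device(f"cuda:{local_rank}"))

# Alternatively, name the device explicitly at each barrier:
dist.barrier(device_ids=[local_rank])
```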
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/config.json
Model config LlamaConfig {
  "_name_or_path": "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "vocab_size": 32000
}
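The printed config can be inspected offline with the standard transformers API; a small sketch (path taken from the log) that also spells out the grouped-query-attention arithmetic implied by the numbers above:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T"
)

# 32 query heads share 4 KV heads, i.e. 8 queries per KV head (grouped-query attention).
assert config.num_attention_heads // config.num_key_value_heads == 8
# hidden_size = num_attention_heads * head_dim: 32 * 64 = 2048.
assert config.hidden_size == config.num_attention_heads * config.head_dim
```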
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/model.safetensors
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
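zero.init() shards parameters across ranks at construction time instead of materializing the full model on every GPU. A minimal sketch of how transformers activates it, assuming the usual HfDeepSpeedConfig pattern (the ds_config dict here is a placeholder; the real one comes from the training script):

```python
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = {
    "zero_optimization": {"stage": 3},  # stage 3 is what triggers zero.init()
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
}

# Keep this object alive *before* from_pretrained so weights are loaded directly
# into their ZeRO-3 partitions.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained(
    "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T"
)
```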
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}
All model checkpoint weights were used when initializing LlamaForCausalLM.

All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0
}
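These generation defaults live in generation_config.json next to the weights; a minimal sketch of reading them directly:

```python
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained(
    "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T"
)
print(gen_config.max_length, gen_config.pad_token_id)  # 2048 0
```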
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
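These six files are what AutoTokenizer resolves from the checkpoint directory. A minimal sketch; the pad-token addition is an assumption, inferred from the vocab growing from 32000 to 32001 in the warnings below:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T"
)
# Assumed step: adding one special token explains the 32000 -> 32001 vocab below.
tokenizer.add_special_tokens({"pad_token": "<pad>"})
assert len(tokenizer) == 32001
```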
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
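Both warnings come from the embedding resize. A hedged sketch of the call that addresses them, padding the new vocab size to a multiple of 64 so Tensor Cores stay usable (64 is an assumption; any multiple satisfying the linked NVIDIA guide works):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T"
tokenizer = AutoTokenizer.from_pretrained(path)
tokenizer.add_special_tokens({"pad_token": "<pad>"})  # assumed, as above
model = AutoModelForCausalLM.from_pretrained(path)

# Padding the new vocab size to a multiple of 64 keeps the matmul shapes
# Tensor-Core-friendly and silences the first warning.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)

# To skip the mean/covariance initialization instead:
# model.resize_token_embeddings(len(tokenizer), mean_resizing=False)
```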
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...
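The UserWarning above means the fused_adam JIT build compiles for every visible architecture; pinning the arch list shortens the build. A sketch, assuming A100-class GPUs (compute capability 8.0; substitute your own cards):

```python
import os

# Must be set before the extension is (re)built, e.g. before deepspeed.initialize().
# "8.0" matches A100-class GPUs; adjust to your hardware's compute capability.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0"
```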
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
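This warning is the expected pairing of gradient checkpointing with a config whose use_cache is true; a minimal sketch of making the combination explicit so transformers does not have to override it:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T"
)

# Recompute activations in the backward pass to save memory; the KV cache only
# helps at generation time, so turn it off explicitly during training.
model.gradient_checkpointing_enable()
model.config.use_cache = False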
wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-2T/tinyllama-2T-s3-Q1-10k/wandb/run-20250529_172422-u8m0zro2
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run tinyllama-2T-s3-Q1-10k
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment/runs/u8m0zro2
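The run metadata above maps onto a standard wandb.init call; a hypothetical sketch of how such a run could be registered (project, entity, and run name taken from the log, everything else assumed):

```python
import wandb

run = wandb.init(
    project="Inverse_Alignment",
    entity="xtom",
    name="tinyllama-2T-s3-Q1-10k",
)
run.log({"train/loss": 2.0926})  # per-step losses as they appear below
run.finish()
```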
Training 1/1 epoch: 0%| | 0/313 [00:00<?, ?it/s]
Training 1/1 epoch (loss 2.0926): 0%| | 1/313 [00:05<27:40, 5.32s/it]
Training 1/1 epoch (loss 2.0378): 1%| | 2/313 [00:06<16:13, 3.13s/it]
Training 1/1 epoch (loss 2.0299): 1%| | 3/313 [00:07<09:32, 1.85s/it]
Training 1/1 epoch (loss 2.0874): 1%|█ | 4/313 [00:07<06:23, 1.24s/it]
Training 1/1 epoch (loss 2.1063): 2%|█ | 5/313 [00:07<04:42, 1.09it/s]
Training 1/1 epoch (loss 2.0525): 2%|█ | 6/313 [00:08<03:39, 1.40it/s]
Training 1/1 epoch (loss 2.0463): 2%|█ | 7/313 [00:08<02:58, 1.71it/s]
Training 1/1 epoch (loss 2.0052): 3%|█ | 8/313 [00:08<02:38, 1.93it/s]
Training 1/1 epoch (loss 1.9870): 3%|█ | 9/313 [00:09<02:19, 2.18it/s]
Training 1/1 epoch (loss 2.0778): 3%|█ | 10/313 [00:09<02:05, 2.41it/s]
Training 1/1 epoch (loss 2.0321): 4%|█ | 11/313 [00:09<01:59, 2.53it/s]
Training 1/1 epoch (loss 1.9115): 4%|█ | 12/313 [00:10<01:53, 2.65it/s]
Training 1/1 epoch (loss 1.9781): 4%|█ | 13/313 [00:10<01:47, 2.80it/s]
Training 1/1 epoch (loss 2.0730): 4%|█ | 14/313 [00:10<01:40, 2.96it/s]
Training 1/1 epoch (loss 1.9696): 5%|█ | 15/313 [00:11<01:41, 2.93it/s]
Training 1/1 epoch (loss 2.0028): 5%|█ | 16/313 [00:11<01:42, 2.91it/s]
Training 1/1 epoch (loss 2.0004): 5%|█ | 17/313 [00:11<01:41, 2.92it/s]
Training 1/1 epoch (loss 2.0214): 6%|█ | 18/313 [00:12<01:41, 2.90it/s]
Training 1/1 epoch (loss 1.9294): 6%|█ | 19/313 [00:12<01:39, 2.97it/s]
Training 1/1 epoch (loss 2.0602): 6%|█ | 20/313 [00:12<01:36, 3.04it/s]
Training 1/1 epoch (loss 1.8947): 7%|█ | 21/313 [00:13<01:33, 3.11it/s]
Training 1/1 epoch (loss 2.1213): 7%|█ | 22/313 [00:13<01:32, 3.13it/s]
Training 1/1 epoch (loss 2.0758): 7%|█ | 23/313 [00:13<01:35, 3.05it/s]
Training 1/1 epoch (loss 1.9203): 8%|█ | 24/313 [00:14<01:37, 2.98it/s]
Training 1/1 epoch (loss 2.0804): 8%|█ | 25/313 [00:14<01:33, 3.07it/s]
Training 1/1 epoch (loss 1.9755): 8%|█ | 26/313 [00:14<01:31, 3.13it/s]
Training 1/1 epoch (loss 1.8770): 9%|█ | 27/313 [00:15<01:32, 3.10it/s]
Training 1/1 epoch (loss 1.9673): 9%|█ | 28/313 [00:15<01:30, 3.15it/s]
Training 1/1 epoch (loss 2.0302): 9%|█ | 29/313 [00:15<01:29, 3.19it/s]
Training 1/1 epoch (loss 1.9636): 10%|█ | 30/313 [00:16<01:33, 3.04it/s]
Training 1/1 epoch (loss 2.0194): 10%|█ | 31/313 [00:16<01:37, 2.88it/s]
Training 1/1 epoch (loss 1.9885): 10%|█ | 32/313 [00:16<01:35, 2.95it/s]
Training 1/1 epoch (loss 1.8595): 11%|█ | 33/313 [00:17<01:32, 3.02it/s]
Training 1/1 epoch (loss 1.8983): 11%|█ | 34/313 [00:17<01:29, 3.11it/s]
Training 1/1 epoch (loss 1.8375): 11%|█ | 35/313 [00:17<01:29, 3.10it/s]
Training 1/1 epoch (loss 1.9185): 12%|██ | 36/313 [00:18<01:28, 3.14it/s]
Training 1/1 epoch (loss 1.8495): 12%|██ | 37/313 [00:18<01:26, 3.17it/s]
Training 1/1 epoch (loss 1.8587): 12%|██ | 38/313 [00:18<01:25, 3.22it/s]
Training 1/1 epoch (loss 1.8771): 12%|██ | 39/313 [00:18<01:25, 3.22it/s]
Training 1/1 epoch (loss 1.9401): 13%|██ | 40/313 [00:19<01:25, 3.19it/s]
Training 1/1 epoch (loss 1.8532): 13%|██ | 41/313 [00:19<01:25, 3.18it/s]
Training 1/1 epoch (loss 1.8698): 13%|██ | 42/313 [00:19<01:25, 3.16it/s]
Training 1/1 epoch (loss 1.7666): 14%|██ | 43/313 [00:20<01:24, 3.18it/s]
Training 1/1 epoch (loss 1.9162): 14%|██ | 44/313 [00:20<01:24, 3.18it/s]
Training 1/1 epoch (loss 1.9371): 14%|██ | 45/313 [00:20<01:23, 3.21it/s]
Training 1/1 epoch (loss 1.8585): 15%|██ | 46/313 [00:21<01:23, 3.20it/s]
Training 1/1 epoch (loss 1.9039): 15%|██ | 47/313 [00:21<01:22, 3.23it/s]
Training 1/1 epoch (loss 1.8362): 15%|██ | 48/313 [00:21<01:26, 3.05it/s]
Training 1/1 epoch (loss 1.8363): 16%|██ | 49/313 [00:22<01:30, 2.92it/s]
Training 1/1 epoch (loss 1.8857): 16%|██ | 50/313 [00:22<01:28, 2.98it/s]
Training 1/1 epoch (loss 1.9434): 16%|██ | 51/313 [00:22<01:24, 3.09it/s]
Training 1/1 epoch (loss 1.9819): 17%|██ | 52/313 [00:23<01:22, 3.17it/s]
Training 1/1 epoch (loss 1.7833): 17%|██ | 53/313 [00:23<01:21, 3.20it/s]
Training 1/1 epoch (loss 1.8340): 17%|██ | 54/313 [00:23<01:20, 3.21it/s]
Training 1/1 epoch (loss 1.8865): 18%|██ | 55/313 [00:24<01:23, 3.08it/s]
Training 1/1 epoch (loss 1.8031): 18%|██ | 56/313 [00:24<01:22, 3.11it/s]
Training 1/1 epoch (loss 1.8663): 18%|██ | 57/313 [00:24<01:22, 3.12it/s]
Training 1/1 epoch (loss 1.6980): 19%|██ | 58/313 [00:25<01:20, 3.16it/s]
Training 1/1 epoch (loss 1.6969): 19%|██ | 59/313 [00:25<01:18, 3.25it/s]
Training 1/1 epoch (loss 1.8177): 19%|██ | 60/313 [00:25<01:17, 3.27it/s]
Training 1/1 epoch (loss 1.7262): 19%|██ | 61/313 [00:25<01:21, 3.10it/s]
Training 1/1 epoch (loss 1.7318): 20%|██ | 62/313 [00:26<01:20, 3.12it/s]
Training 1/1 epoch (loss 1.8102): 20%|██ | 63/313 [00:26<01:18, 3.19it/s]
Training 1/1 epoch (loss 1.7257): 20%|██ | 64/313 [00:26<01:23, 2.96it/s]
Training 1/1 epoch (loss 1.7902): 21%|██ | 65/313 [00:27<01:22, 2.99it/s]
Training 1/1 epoch (loss 1.7745): 21%|██ | 66/313 [00:27<01:21, 3.05it/s]
Training 1/1 epoch (loss 1.7651): 21%|███ | 67/313 [00:27<01:21, 3.03it/s]
Training 1/1 epoch (loss 1.6642): 22%|███ | 68/313 [00:28<01:18, 3.13it/s]
Training 1/1 epoch (loss 1.7589): 22%|███ | 69/313 [00:28<01:17, 3.16it/s]
Training 1/1 epoch (loss 1.6992): 22%|███ | 70/313 [00:28<01:16, 3.17it/s]
Training 1/1 epoch (loss 1.7180): 23%|███ | 71/313 [00:29<01:16, 3.18it/s]
Training 1/1 epoch (loss 1.7383): 23%|███ | 72/313 [00:29<01:17, 3.11it/s]
Training 1/1 epoch (loss 1.7956): 23%|███ | 73/313 [00:29<01:20, 2.99it/s]
Training 1/1 epoch (loss 1.6938): 24%|███ | 74/313 [00:30<01:21, 2.92it/s]
Training 1/1 epoch (loss 1.7160): 24%|███ | 75/313 [00:30<01:18, 3.03it/s]
Training 1/1 epoch (loss 1.7032): 24%|███ | 76/313 [00:30<01:16, 3.11it/s]
Training 1/1 epoch (loss 1.7636): 25%|███ | 77/313 [00:31<01:14, 3.16it/s]
Training 1/1 epoch (loss 1.8648): 25%|███ | 78/313 [00:31<01:13, 3.19it/s]
Training 1/1 epoch (loss 1.5523): 25%|███ | 79/313 [00:31<01:12, 3.22it/s]
Training 1/1 epoch (loss 1.7232): 26%|███ | 80/313 [00:32<01:21, 2.87it/s]
Training 1/1 epoch (loss 1.6719): 26%|███ | 81/313 [00:32<01:19, 2.94it/s]
Training 1/1 epoch (loss 1.7004): 26%|███ | 82/313 [00:32<01:16, 3.02it/s]
Training 1/1 epoch (loss 1.6513): 27%|███ | 83/313 [00:33<01:13, 3.12it/s]
Training 1/1 epoch (loss 1.7058): 27%|███ | 84/313 [00:33<01:13, 3.14it/s]
Training 1/1 epoch (loss 1.5802): 27%|███ | 85/313 [00:33<01:12, 3.15it/s]
Training 1/1 epoch (loss 1.6224): 27%|███ | 86/313 [00:34<01:12, 3.11it/s]
Training 1/1 epoch (loss 1.6919): 28%|███ | 87/313 [00:34<01:13, 3.08it/s]
Training 1/1 epoch (loss 1.6789): 28%|███ | 88/313 [00:34<01:13, 3.08it/s]
Training 1/1 epoch (loss 1.6910): 28%|███ | 89/313 [00:35<01:11, 3.15it/s]
Training 1/1 epoch (loss 1.7508): 29%|███ | 90/313 [00:35<01:09, 3.21it/s]
Training 1/1 epoch (loss 1.7003): 29%|███ | 91/313 [00:35<01:10, 3.16it/s]
Training 1/1 epoch (loss 1.6340): 29%|███ | 92/313 [00:36<01:10, 3.14it/s]
Training 1/1 epoch (loss 1.6711): 30%|███ | 93/313 [00:36<01:09, 3.15it/s]
Training 1/1 epoch (loss 1.6785): 30%|███ | 94/313 [00:36<01:08, 3.18it/s]
Training 1/1 epoch (loss 1.8088): 30%|███ | 95/313 [00:36<01:08, 3.20it/s]
Training 1/1 epoch (loss 1.6991): 31%|███ | 96/313 [00:37<01:09, 3.12it/s]
Training 1/1 epoch (loss 1.4703): 31%|███ | 97/313 [00:37<01:09, 3.09it/s]
Training 1/1 epoch (loss 1.6407): 31%|████ | 98/313 [00:37<01:10, 3.06it/s]
Training 1/1 epoch (loss 1.8100): 32%|████ | 99/313 [00:38<01:12, 2.97it/s]
Training 1/1 epoch (loss 1.6703): 32%|████ | 100/313 [00:38<01:08, 3.10it/s]
Training 1/1 epoch (loss 1.6727): 32%|████ | 101/313 [00:38<01:08, 3.11it/s]
Training 1/1 epoch (loss 1.7239): 33%|████ | 102/313 [00:39<01:07, 3.15it/s]
Training 1/1 epoch (loss 1.6383): 33%|████ | 103/313 [00:39<01:05, 3.20it/s]
Training 1/1 epoch (loss 1.6323): 33%|████ | 104/313 [00:39<01:06, 3.16it/s]
Training 1/1 epoch (loss 1.7299): 34%|████ | 105/313 [00:40<01:08, 3.02it/s]
Training 1/1 epoch (loss 1.8477): 34%|████ | 106/313 [00:40<01:07, 3.08it/s]
Training 1/1 epoch (loss 1.6369): 34%|████ | 107/313 [00:40<01:05, 3.13it/s]
Training 1/1 epoch (loss 1.6914): 35%|████ | 108/313 [00:41<01:04, 3.16it/s]
Training 1/1 epoch (loss 1.6073): 35%|████ | 109/313 [00:41<01:03, 3.23it/s]
Training 1/1 epoch (loss 1.6467): 35%|████ | 110/313 [00:41<01:02, 3.25it/s]
Training 1/1 epoch (loss 1.6176): 35%|████ | 111/313 [00:42<01:05, 3.07it/s]
Training 1/1 epoch (loss 1.6768): 36%|████ | 112/313 [00:42<01:04, 3.11it/s]
Training 1/1 epoch (loss 1.5737): 36%|████ | 113/313 [00:42<01:03, 3.15it/s]
Training 1/1 epoch (loss 1.6504): 36%|████ | 114/313 [00:43<01:02, 3.17it/s]
Training 1/1 epoch (loss 1.6179): 37%|████ | 115/313 [00:43<01:01, 3.22it/s]
Training 1/1 epoch (loss 1.7255): 37%|████ | 116/313 [00:43<01:01, 3.19it/s]
Training 1/1 epoch (loss 1.7449): 37%|████ | 117/313 [00:43<01:03, 3.11it/s]
Training 1/1 epoch (loss 1.6939): 38%|████ | 118/313 [00:44<01:04, 3.01it/s]
Training 1/1 epoch (loss 1.6857): 38%|████ | 119/313 [00:44<01:03, 3.05it/s]
Training 1/1 epoch (loss 1.6002): 38%|████ | 120/313 [00:45<01:03, 3.02it/s]
Training 1/1 epoch (loss 1.7256): 39%|████ | 121/313 [00:45<01:01, 3.11it/s]
Training 1/1 epoch (loss 1.6940): 39%|████ | 122/313 [00:45<01:02, 3.08it/s]
Training 1/1 epoch (loss 1.4625): 39%|████ | 123/313 [00:45<01:01, 3.10it/s]
Training 1/1 epoch (loss 1.7884): 40%|████ | 124/313 [00:46<01:01, 3.05it/s]
Training 1/1 epoch (loss 1.7799): 40%|████ | 125/313 [00:46<00:59, 3.14it/s]
Training 1/1 epoch (loss 1.7574): 40%|████ | 126/313 [00:46<01:00, 3.10it/s]
Training 1/1 epoch (loss 1.7055): 41%|████ | 127/313 [00:47<00:59, 3.15it/s]
Training 1/1 epoch (loss 1.5698): 41%|████ | 128/313 [00:47<00:58, 3.14it/s]
Training 1/1 epoch (loss 1.6432): 41%|████ | 129/313 [00:47<01:04, 2.84it/s]
Training 1/1 epoch (loss 1.6049): 42%|█████ | 130/313 [00:48<01:03, 2.90it/s]
Training 1/1 epoch (loss 1.6483): 42%|█████ | 131/313 [00:48<01:01, 2.98it/s]
Training 1/1 epoch (loss 1.6256): 42%|█████ | 132/313 [00:48<00:58, 3.11it/s]
Training 1/1 epoch (loss 1.6438): 42%|█████ | 133/313 [00:49<00:56, 3.19it/s]
Training 1/1 epoch (loss 1.6236): 43%|█████ | 134/313 [00:49<00:57, 3.14it/s]
Training 1/1 epoch (loss 1.6764): 43%|█████ | 135/313 [00:49<00:55, 3.24it/s]
Training 1/1 epoch (loss 1.6242): 43%|█████ | 136/313 [00:50<00:57, 3.07it/s]
Training 1/1 epoch (loss 1.7148): 44%|█████ | 137/313 [00:50<00:55, 3.16it/s]
Training 1/1 epoch (loss 1.5899): 44%|█████ | 138/313 [00:50<00:54, 3.19it/s]
Training 1/1 epoch (loss 1.6368): 44%|█████ | 139/313 [00:51<00:53, 3.25it/s]
Training 1/1 epoch (loss 1.6045): 45%|█████ | 140/313 [00:51<00:52, 3.28it/s]
Training 1/1 epoch (loss 1.6905): 45%|█████ | 141/313 [00:51<00:52, 3.26it/s]
Training 1/1 epoch (loss 1.6994): 45%|█████ | 142/313 [00:52<00:54, 3.14it/s]
Training 1/1 epoch (loss 1.5698): 46%|█████ | 143/313 [00:52<00:54, 3.09it/s]
Training 1/1 epoch (loss 1.6753): 46%|█████ | 144/313 [00:52<00:54, 3.11it/s]
Training 1/1 epoch (loss 1.5735): 46%|█████ | 145/313 [00:53<00:56, 2.98it/s]
Training 1/1 epoch (loss 1.6457): 47%|█████ | 146/313 [00:53<00:53, 3.10it/s]
Training 1/1 epoch (loss 1.5575): 47%|█████ | 147/313 [00:53<00:53, 3.13it/s]
Training 1/1 epoch (loss 1.6680): 47%|█████ | 148/313 [00:54<00:53, 3.09it/s]
Training 1/1 epoch (loss 1.6678): 48%|█████ | 149/313 [00:54<00:54, 3.02it/s]
Training 1/1 epoch (loss 1.7486): 48%|█████ | 150/313 [00:54<00:52, 3.12it/s]
Training 1/1 epoch (loss 1.6242): 48%|█████ | 151/313 [00:54<00:51, 3.15it/s]
Training 1/1 epoch (loss 1.6788): 49%|█████ | 152/313 [00:55<00:51, 3.13it/s]
Training 1/1 epoch (loss 1.4764): 49%|█████ | 153/313 [00:55<00:50, 3.18it/s]
Training 1/1 epoch (loss 1.6394): 49%|█████ | 154/313 [00:55<00:51, 3.11it/s]
Training 1/1 epoch (loss 1.7444): 50%|█████ | 155/313 [00:56<00:51, 3.06it/s]
Training 1/1 epoch (loss 1.7393): 50%|█████ | 156/313 [00:56<00:50, 3.08it/s]
Training 1/1 epoch (loss 1.5543): 50%|█████ | 157/313 [00:56<00:49, 3.16it/s]
Training 1/1 epoch (loss 1.5309): 50%|█████ | 158/313 [00:57<00:50, 3.05it/s]
Training 1/1 epoch (loss 1.5597): 51%|█████ | 159/313 [00:57<00:49, 3.12it/s]
Training 1/1 epoch (loss 1.6047): 51%|█████ | 160/313 [00:57<00:51, 2.96it/s]
Training 1/1 epoch (loss 1.6550): 51%|██████ | 161/313 [00:58<00:53, 2.85it/s]
Training 1/1 epoch (loss 1.5532): 52%|██████ | 162/313 [00:58<00:50, 2.96it/s]
Training 1/1 epoch (loss 1.6690): 52%|██████ | 163/313 [00:58<00:49, 3.01it/s]
Training 1/1 epoch (loss 1.5638): 52%|██████ | 164/313 [00:59<00:47, 3.11it/s]
Training 1/1 epoch (loss 1.6524): 53%|██████ | 165/313 [00:59<00:47, 3.15it/s]
Training 1/1 epoch (loss 1.6897): 53%|██████ | 166/313 [00:59<00:47, 3.10it/s]
Training 1/1 epoch (loss 1.6250): 53%|██████ | 167/313 [01:00<00:48, 3.02it/s]
Training 1/1 epoch (loss 1.6794): 54%|██████ | 168/313 [01:00<00:47, 3.06it/s]
Training 1/1 epoch (loss 1.7051): 54%|██████ | 169/313 [01:00<00:46, 3.08it/s]
Training 1/1 epoch (loss 1.6418): 54%|██████ | 170/313 [01:01<00:45, 3.12it/s]
Training 1/1 epoch (loss 1.6070): 55%|██████ | 171/313 [01:01<00:45, 3.14it/s]
Training 1/1 epoch (loss 1.6956): 55%|██████ | 172/313 [01:01<00:43, 3.21it/s]
Training 1/1 epoch (loss 1.6971): 55%|██████ | 173/313 [01:02<00:44, 3.12it/s]
Training 1/1 epoch (loss 1.6629): 56%|██████ | 174/313 [01:02<00:46, 2.96it/s]
Training 1/1 epoch (loss 1.6480): 56%|██████ | 175/313 [01:02<00:48, 2.82it/s]
Training 1/1 epoch (loss 1.7248): 56%|██████ | 176/313 [01:03<00:48, 2.84it/s]
Training 1/1 epoch (loss 1.4787): 57%|██████ | 177/313 [01:03<00:46, 2.94it/s]
Training 1/1 epoch (loss 1.6037): 57%|██████ | 178/313 [01:03<00:45, 2.99it/s]
Training 1/1 epoch (loss 1.6034): 57%|██████ | 179/313 [01:04<00:44, 2.99it/s]
Training 1/1 epoch (loss 1.5712): 58%|██████ | 180/313 [01:04<00:42, 3.09it/s]
Training 1/1 epoch (loss 1.7338): 58%|██████ | 181/313 [01:04<00:41, 3.16it/s]
Training 1/1 epoch (loss 1.6811): 58%|██████ | 182/313 [01:05<00:40, 3.21it/s]
Training 1/1 epoch (loss 1.7510): 58%|██████ | 183/313 [01:05<00:40, 3.24it/s]
Training 1/1 epoch (loss 1.6298): 59%|██████ | 184/313 [01:05<00:40, 3.20it/s]
Training 1/1 epoch (loss 1.6324): 59%|██████ | 185/313 [01:06<00:40, 3.18it/s]
Training 1/1 epoch (loss 1.5875): 59%|██████ | 185/313 [01:06<00:40, 3.18it/s]
Training 1/1 epoch (loss 1.5875): 59%|ββββββ | 186/313 [01:06<00:42, 3.00it/s]
Training 1/1 epoch (loss 1.6676): 59%|ββββββ | 186/313 [01:06<00:42, 3.00it/s]
Training 1/1 epoch (loss 1.6676): 60%|ββββββ | 187/313 [01:06<00:40, 3.12it/s]
Training 1/1 epoch (loss 1.6576): 60%|ββββββ | 187/313 [01:07<00:40, 3.12it/s]
Training 1/1 epoch (loss 1.6576): 60%|ββββββ | 188/313 [01:07<00:39, 3.19it/s]
Training 1/1 epoch (loss 1.6691): 60%|ββββββ | 188/313 [01:07<00:39, 3.19it/s]
Training 1/1 epoch (loss 1.6691): 60%|ββββββ | 189/313 [01:07<00:38, 3.22it/s]
Training 1/1 epoch (loss 1.5939): 60%|ββββββ | 189/313 [01:07<00:38, 3.22it/s]
Training 1/1 epoch (loss 1.5939): 61%|ββββββ | 190/313 [01:07<00:37, 3.24it/s]
Training 1/1 epoch (loss 1.5956): 61%|ββββββ | 190/313 [01:07<00:37, 3.24it/s]
Training 1/1 epoch (loss 1.5956): 61%|ββββββ | 191/313 [01:07<00:38, 3.13it/s]
Training 1/1 epoch (loss 1.6894): 61%|ββββββ | 191/313 [01:08<00:38, 3.13it/s]
Training 1/1 epoch (loss 1.6894): 61%|βββββββ | 192/313 [01:08<00:39, 3.08it/s]
Training 1/1 epoch (loss 1.6332): 61%|βββββββ | 192/313 [01:08<00:39, 3.08it/s]
Training 1/1 epoch (loss 1.6332): 62%|βββββββ | 193/313 [01:08<00:38, 3.09it/s]
Training 1/1 epoch (loss 1.5714): 62%|βββββββ | 193/313 [01:08<00:38, 3.09it/s]
Training 1/1 epoch (loss 1.5714): 62%|βββββββ | 194/313 [01:08<00:39, 3.04it/s]
Training 1/1 epoch (loss 1.6249): 62%|βββββββ | 194/313 [01:09<00:39, 3.04it/s]
Training 1/1 epoch (loss 1.6249): 62%|βββββββ | 195/313 [01:09<00:37, 3.15it/s]
Training 1/1 epoch (loss 1.6267): 62%|βββββββ | 195/313 [01:09<00:37, 3.15it/s]
Training 1/1 epoch (loss 1.6267): 63%|βββββββ | 196/313 [01:09<00:36, 3.22it/s]
Training 1/1 epoch (loss 1.7291): 63%|βββββββ | 196/313 [01:09<00:36, 3.22it/s]
Training 1/1 epoch (loss 1.7291): 63%|βββββββ | 197/313 [01:09<00:36, 3.19it/s]
Training 1/1 epoch (loss 1.6425): 63%|βββββββ | 197/313 [01:10<00:36, 3.19it/s]
Training 1/1 epoch (loss 1.6425): 63%|βββββββ | 198/313 [01:10<00:36, 3.14it/s]
Training 1/1 epoch (loss 1.6296): 63%|βββββββ | 198/313 [01:10<00:36, 3.14it/s]
Training 1/1 epoch (loss 1.6296): 64%|βββββββ | 199/313 [01:10<00:36, 3.13it/s]
Training 1/1 epoch (loss 1.6762): 64%|βββββββ | 199/313 [01:10<00:36, 3.13it/s]
Training 1/1 epoch (loss 1.6762): 64%|βββββββ | 200/313 [01:10<00:35, 3.15it/s]
Training 1/1 epoch (loss 1.5672): 64%|βββββββ | 200/313 [01:11<00:35, 3.15it/s]
Training 1/1 epoch (loss 1.5672): 64%|βββββββ | 201/313 [01:11<00:35, 3.19it/s]
Training 1/1 epoch (loss 1.7375): 64%|βββββββ | 201/313 [01:11<00:35, 3.19it/s]
Training 1/1 epoch (loss 1.7375): 65%|βββββββ | 202/313 [01:11<00:34, 3.18it/s]
Training 1/1 epoch (loss 1.6607): 65%|βββββββ | 202/313 [01:11<00:34, 3.18it/s]
Training 1/1 epoch (loss 1.6607): 65%|βββββββ | 203/313 [01:11<00:33, 3.25it/s]
Training 1/1 epoch (loss 1.6216): 65%|βββββββ | 203/313 [01:12<00:33, 3.25it/s]
Training 1/1 epoch (loss 1.6216): 65%|βββββββ | 204/313 [01:12<00:34, 3.20it/s]
Training 1/1 epoch (loss 1.6251): 65%|βββββββ | 204/313 [01:12<00:34, 3.20it/s]
Training 1/1 epoch (loss 1.6251): 65%|βββββββ | 205/313 [01:12<00:34, 3.16it/s]
Training 1/1 epoch (loss 1.5731): 65%|βββββββ | 205/313 [01:12<00:34, 3.16it/s]
Training 1/1 epoch (loss 1.5731): 66%|βββββββ | 206/313 [01:12<00:33, 3.22it/s]
Training 1/1 epoch (loss 1.6598): 66%|βββββββ | 206/313 [01:12<00:33, 3.22it/s]
Training 1/1 epoch (loss 1.6598): 66%|βββββββ | 207/313 [01:12<00:32, 3.22it/s]
Training 1/1 epoch (loss 1.5949): 66%|βββββββ | 207/313 [01:13<00:32, 3.22it/s]
Training 1/1 epoch (loss 1.5949): 66%|βββββββ | 208/313 [01:13<00:32, 3.23it/s]
Training 1/1 epoch (loss 1.4813): 66%|βββββββ | 208/313 [01:13<00:32, 3.23it/s]
Training 1/1 epoch (loss 1.4813): 67%|βββββββ | 209/313 [01:13<00:32, 3.24it/s]
Training 1/1 epoch (loss 1.7383): 67%|βββββββ | 209/313 [01:13<00:32, 3.24it/s]
Training 1/1 epoch (loss 1.7383): 67%|βββββββ | 210/313 [01:13<00:34, 3.02it/s]
Training 1/1 epoch (loss 1.6412): 67%|βββββββ | 210/313 [01:14<00:34, 3.02it/s]
Training 1/1 epoch (loss 1.6412): 67%|βββββββ | 211/313 [01:14<00:34, 2.97it/s]
Training 1/1 epoch (loss 1.5699): 67%|βββββββ | 211/313 [01:14<00:34, 2.97it/s]
Training 1/1 epoch (loss 1.5699): 68%|βββββββ | 212/313 [01:14<00:32, 3.08it/s]
Training 1/1 epoch (loss 1.6134): 68%|βββββββ | 212/313 [01:14<00:32, 3.08it/s]
Training 1/1 epoch (loss 1.6134): 68%|βββββββ | 213/313 [01:14<00:32, 3.10it/s]
Training 1/1 epoch (loss 1.7413): 68%|βββββββ | 213/313 [01:15<00:32, 3.10it/s]
Training 1/1 epoch (loss 1.7413): 68%|βββββββ | 214/313 [01:15<00:31, 3.18it/s]
Training 1/1 epoch (loss 1.6385): 68%|βββββββ | 214/313 [01:15<00:31, 3.18it/s]
Training 1/1 epoch (loss 1.6385): 69%|βββββββ | 215/313 [01:15<00:30, 3.20it/s]
Training 1/1 epoch (loss 1.5449): 69%|βββββββ | 215/313 [01:15<00:30, 3.20it/s]
Training 1/1 epoch (loss 1.5449): 69%|βββββββ | 216/313 [01:15<00:30, 3.20it/s]
Training 1/1 epoch (loss 1.7295): 69%|βββββββ | 216/313 [01:16<00:30, 3.20it/s]
Training 1/1 epoch (loss 1.7295): 69%|βββββββ | 217/313 [01:16<00:30, 3.11it/s]
Training 1/1 epoch (loss 1.7213): 69%|βββββββ | 217/313 [01:16<00:30, 3.11it/s]
Training 1/1 epoch (loss 1.7213): 70%|βββββββ | 218/313 [01:16<00:30, 3.11it/s]
Training 1/1 epoch (loss 1.5007): 70%|βββββββ | 218/313 [01:16<00:30, 3.11it/s]
Training 1/1 epoch (loss 1.5007): 70%|βββββββ | 219/313 [01:16<00:29, 3.21it/s]
Training 1/1 epoch (loss 1.6654): 70%|βββββββ | 219/313 [01:17<00:29, 3.21it/s]
Training 1/1 epoch (loss 1.6654): 70%|βββββββ | 220/313 [01:17<00:28, 3.21it/s]
Training 1/1 epoch (loss 1.5504): 70%|βββββββ | 220/313 [01:17<00:28, 3.21it/s]
Training 1/1 epoch (loss 1.5504): 71%|βββββββ | 221/313 [01:17<00:28, 3.19it/s]
Training 1/1 epoch (loss 1.6087): 71%|βββββββ | 221/313 [01:17<00:28, 3.19it/s]
Training 1/1 epoch (loss 1.6087): 71%|βββββββ | 222/313 [01:17<00:29, 3.07it/s]
Training 1/1 epoch (loss 1.5768): 71%|βββββββ | 222/313 [01:18<00:29, 3.07it/s]
Training 1/1 epoch (loss 1.5768): 71%|βββββββ | 223/313 [01:18<00:28, 3.13it/s]
Training 1/1 epoch (loss 1.5215): 71%|βββββββ | 223/313 [01:18<00:28, 3.13it/s]
Training 1/1 epoch (loss 1.5215): 72%|ββββββββ | 224/313 [01:18<00:29, 2.99it/s]
Training 1/1 epoch (loss 1.7573): 72%|ββββββββ | 224/313 [01:18<00:29, 2.99it/s]
Training 1/1 epoch (loss 1.7573): 72%|ββββββββ | 225/313 [01:18<00:28, 3.06it/s]
Training 1/1 epoch (loss 1.5708): 72%|ββββββββ | 225/313 [01:19<00:28, 3.06it/s]
Training 1/1 epoch (loss 1.5708): 72%|ββββββββ | 226/313 [01:19<00:28, 3.06it/s]
Training 1/1 epoch (loss 1.5598): 72%|ββββββββ | 226/313 [01:19<00:28, 3.06it/s]
Training 1/1 epoch (loss 1.5598): 73%|ββββββββ | 227/313 [01:19<00:27, 3.17it/s]
Training 1/1 epoch (loss 1.5481): 73%|ββββββββ | 227/313 [01:19<00:27, 3.17it/s]
Training 1/1 epoch (loss 1.5481): 73%|ββββββββ | 228/313 [01:19<00:26, 3.24it/s]
Training 1/1 epoch (loss 1.6433): 73%|ββββββββ | 228/313 [01:20<00:26, 3.24it/s]
Training 1/1 epoch (loss 1.6433): 73%|ββββββββ | 229/313 [01:20<00:27, 3.00it/s]
Training 1/1 epoch (loss 1.6533): 73%|ββββββββ | 229/313 [01:20<00:27, 3.00it/s]
Training 1/1 epoch (loss 1.6533): 73%|ββββββββ | 230/313 [01:20<00:27, 3.02it/s]
Training 1/1 epoch (loss 1.6008): 73%|ββββββββ | 230/313 [01:20<00:27, 3.02it/s]
Training 1/1 epoch (loss 1.6008): 74%|ββββββββ | 231/313 [01:20<00:26, 3.09it/s]
Training 1/1 epoch (loss 1.7018): 74%|ββββββββ | 231/313 [01:21<00:26, 3.09it/s]
Training 1/1 epoch (loss 1.7018): 74%|ββββββββ | 232/313 [01:21<00:25, 3.12it/s]
Training 1/1 epoch (loss 1.6710): 74%|ββββββββ | 232/313 [01:21<00:25, 3.12it/s]
Training 1/1 epoch (loss 1.6710): 74%|ββββββββ | 233/313 [01:21<00:25, 3.17it/s]
Training 1/1 epoch (loss 1.6279): 74%|ββββββββ | 233/313 [01:21<00:25, 3.17it/s]
Training 1/1 epoch (loss 1.6279): 75%|ββββββββ | 234/313 [01:21<00:24, 3.19it/s]
Training 1/1 epoch (loss 1.6871): 75%|ββββββββ | 234/313 [01:21<00:24, 3.19it/s]
Training 1/1 epoch (loss 1.6871): 75%|ββββββββ | 235/313 [01:21<00:24, 3.18it/s]
Training 1/1 epoch (loss 1.5745): 75%|ββββββββ | 235/313 [01:22<00:24, 3.18it/s]
Training 1/1 epoch (loss 1.5745): 75%|ββββββββ | 236/313 [01:22<00:24, 3.10it/s]
Training 1/1 epoch (loss 1.6169): 75%|ββββββββ | 236/313 [01:22<00:24, 3.10it/s]
Training 1/1 epoch (loss 1.6169): 76%|ββββββββ | 237/313 [01:22<00:23, 3.17it/s]
Training 1/1 epoch (loss 1.5648): 76%|ββββββββ | 237/313 [01:22<00:23, 3.17it/s]
Training 1/1 epoch (loss 1.5648): 76%|ββββββββ | 238/313 [01:22<00:23, 3.25it/s]
Training 1/1 epoch (loss 1.6262): 76%|ββββββββ | 238/313 [01:23<00:23, 3.25it/s]
Training 1/1 epoch (loss 1.6262): 76%|ββββββββ | 239/313 [01:23<00:22, 3.23it/s]
Training 1/1 epoch (loss 1.5735): 76%|ββββββββ | 239/313 [01:23<00:22, 3.23it/s]
Training 1/1 epoch (loss 1.5735): 77%|ββββββββ | 240/313 [01:23<00:23, 3.16it/s]
Training 1/1 epoch (loss 1.5800): 77%|ββββββββ | 240/313 [01:23<00:23, 3.16it/s]
Training 1/1 epoch (loss 1.5800): 77%|ββββββββ | 241/313 [01:23<00:22, 3.16it/s]
Training 1/1 epoch (loss 1.6429): 77%|ββββββββ | 241/313 [01:24<00:22, 3.16it/s]
Training 1/1 epoch (loss 1.6429): 77%|ββββββββ | 242/313 [01:24<00:22, 3.18it/s]
Training 1/1 epoch (loss 1.6876): 77%|ββββββββ | 242/313 [01:24<00:22, 3.18it/s]
Training 1/1 epoch (loss 1.6876): 78%|ββββββββ | 243/313 [01:24<00:22, 3.08it/s]
Training 1/1 epoch (loss 1.5906): 78%|ββββββββ | 243/313 [01:24<00:22, 3.08it/s]
Training 1/1 epoch (loss 1.5906): 78%|ββββββββ | 244/313 [01:24<00:22, 3.11it/s]
Training 1/1 epoch (loss 1.6599): 78%|ββββββββ | 244/313 [01:25<00:22, 3.11it/s]
Training 1/1 epoch (loss 1.6599): 78%|ββββββββ | 245/313 [01:25<00:21, 3.19it/s]
Training 1/1 epoch (loss 1.6273): 78%|ββββββββ | 245/313 [01:25<00:21, 3.19it/s]
Training 1/1 epoch (loss 1.6273): 79%|ββββββββ | 246/313 [01:25<00:20, 3.23it/s]
Training 1/1 epoch (loss 1.6427): 79%|ββββββββ | 246/313 [01:25<00:20, 3.23it/s]
Training 1/1 epoch (loss 1.6427): 79%|ββββββββ | 247/313 [01:25<00:19, 3.32it/s]
Training 1/1 epoch (loss 1.4683): 79%|ββββββββ | 247/313 [01:26<00:19, 3.32it/s]
Training 1/1 epoch (loss 1.4683): 79%|ββββββββ | 248/313 [01:26<00:20, 3.12it/s]
Training 1/1 epoch (loss 1.6116): 79%|ββββββββ | 248/313 [01:26<00:20, 3.12it/s]
Training 1/1 epoch (loss 1.6116): 80%|ββββββββ | 249/313 [01:26<00:20, 3.08it/s]
Training 1/1 epoch (loss 1.5960): 80%|ββββββββ | 249/313 [01:26<00:20, 3.08it/s]
Training 1/1 epoch (loss 1.5960): 80%|ββββββββ | 250/313 [01:26<00:20, 3.14it/s]
Training 1/1 epoch (loss 1.7006): 80%|ββββββββ | 250/313 [01:27<00:20, 3.14it/s]
Training 1/1 epoch (loss 1.7006): 80%|ββββββββ | 251/313 [01:27<00:19, 3.18it/s]
Training 1/1 epoch (loss 1.6213): 80%|ββββββββ | 251/313 [01:27<00:19, 3.18it/s]
Training 1/1 epoch (loss 1.6213): 81%|ββββββββ | 252/313 [01:27<00:18, 3.25it/s]
Training 1/1 epoch (loss 1.6434): 81%|ββββββββ | 252/313 [01:27<00:18, 3.25it/s]
Training 1/1 epoch (loss 1.6434): 81%|ββββββββ | 253/313 [01:27<00:19, 3.11it/s]
Training 1/1 epoch (loss 1.6180): 81%|ββββββββ | 253/313 [01:28<00:19, 3.11it/s]
Training 1/1 epoch (loss 1.6180): 81%|ββββββββ | 254/313 [01:28<00:19, 3.05it/s]
Training 1/1 epoch (loss 1.6964): 81%|ββββββββ | 254/313 [01:28<00:19, 3.05it/s]
Training 1/1 epoch (loss 1.6964): 81%|βββββββββ | 255/313 [01:28<00:19, 2.98it/s]
Training 1/1 epoch (loss 1.6528): 81%|βββββββββ | 255/313 [01:28<00:19, 2.98it/s]
Training 1/1 epoch (loss 1.6528): 82%|βββββββββ | 256/313 [01:28<00:18, 3.08it/s]
Training 1/1 epoch (loss 1.6951): 82%|βββββββββ | 256/313 [01:28<00:18, 3.08it/s]
Training 1/1 epoch (loss 1.6951): 82%|βββββββββ | 257/313 [01:28<00:17, 3.13it/s]
Training 1/1 epoch (loss 1.6386): 82%|βββββββββ | 257/313 [01:29<00:17, 3.13it/s]
Training 1/1 epoch (loss 1.6386): 82%|βββββββββ | 258/313 [01:29<00:17, 3.18it/s]
Training 1/1 epoch (loss 1.5334): 82%|βββββββββ | 258/313 [01:29<00:17, 3.18it/s]
Training 1/1 epoch (loss 1.5334): 83%|βββββββββ | 259/313 [01:29<00:16, 3.18it/s]
Training 1/1 epoch (loss 1.6517): 83%|βββββββββ | 259/313 [01:29<00:16, 3.18it/s]
Training 1/1 epoch (loss 1.6517): 83%|βββββββββ | 260/313 [01:29<00:16, 3.15it/s]
Training 1/1 epoch (loss 1.5974): 83%|βββββββββ | 260/313 [01:30<00:16, 3.15it/s]
Training 1/1 epoch (loss 1.5974): 83%|βββββββββ | 261/313 [01:30<00:16, 3.07it/s]
Training 1/1 epoch (loss 1.6111): 83%|βββββββββ | 261/313 [01:30<00:16, 3.07it/s]
Training 1/1 epoch (loss 1.6111): 84%|βββββββββ | 262/313 [01:30<00:16, 3.10it/s]
Training 1/1 epoch (loss 1.5949): 84%|βββββββββ | 262/313 [01:30<00:16, 3.10it/s]
Training 1/1 epoch (loss 1.5949): 84%|βββββββββ | 263/313 [01:30<00:15, 3.19it/s]
Training 1/1 epoch (loss 1.6023): 84%|βββββββββ | 263/313 [01:31<00:15, 3.19it/s]
Training 1/1 epoch (loss 1.6023): 84%|βββββββββ | 264/313 [01:31<00:15, 3.18it/s]
Training 1/1 epoch (loss 1.5209): 84%|βββββββββ | 264/313 [01:31<00:15, 3.18it/s]
Training 1/1 epoch (loss 1.5209): 85%|βββββββββ | 265/313 [01:31<00:14, 3.24it/s]
Training 1/1 epoch (loss 1.6167): 85%|βββββββββ | 265/313 [01:31<00:14, 3.24it/s]
Training 1/1 epoch (loss 1.6167): 85%|βββββββββ | 266/313 [01:31<00:14, 3.24it/s]
Training 1/1 epoch (loss 1.6290): 85%|βββββββββ | 266/313 [01:32<00:14, 3.24it/s]
Training 1/1 epoch (loss 1.6290): 85%|βββββββββ | 267/313 [01:32<00:14, 3.18it/s]
Training 1/1 epoch (loss 1.6433): 85%|βββββββββ | 267/313 [01:32<00:14, 3.18it/s]
Training 1/1 epoch (loss 1.6433): 86%|βββββββββ | 268/313 [01:32<00:14, 3.08it/s]
Training 1/1 epoch (loss 1.5628): 86%|βββββββββ | 268/313 [01:32<00:14, 3.08it/s]
Training 1/1 epoch (loss 1.5628): 86%|βββββββββ | 269/313 [01:32<00:13, 3.14it/s]
Training 1/1 epoch (loss 1.6078): 86%|βββββββββ | 269/313 [01:33<00:13, 3.14it/s]
Training 1/1 epoch (loss 1.6078): 86%|βββββββββ | 270/313 [01:33<00:13, 3.20it/s]
Training 1/1 epoch (loss 1.7188): 86%|βββββββββ | 270/313 [01:33<00:13, 3.20it/s]
Training 1/1 epoch (loss 1.7188): 87%|βββββββββ | 271/313 [01:33<00:13, 3.22it/s]
Training 1/1 epoch (loss 1.4836): 87%|βββββββββ | 271/313 [01:33<00:13, 3.22it/s]
Training 1/1 epoch (loss 1.4836): 87%|βββββββββ | 272/313 [01:33<00:12, 3.23it/s]
Training 1/1 epoch (loss 1.6051): 87%|βββββββββ | 272/313 [01:33<00:12, 3.23it/s]
Training 1/1 epoch (loss 1.6051): 87%|βββββββββ | 273/313 [01:33<00:12, 3.19it/s]
Training 1/1 epoch (loss 1.5771): 87%|βββββββββ | 273/313 [01:34<00:12, 3.19it/s]
Training 1/1 epoch (loss 1.5771): 88%|βββββββββ | 274/313 [01:34<00:12, 3.09it/s]
Training 1/1 epoch (loss 1.6320): 88%|βββββββββ | 274/313 [01:34<00:12, 3.09it/s]
Training 1/1 epoch (loss 1.6320): 88%|βββββββββ | 275/313 [01:34<00:12, 3.07it/s]
Training 1/1 epoch (loss 1.6735): 88%|βββββββββ | 275/313 [01:35<00:12, 3.07it/s]
Training 1/1 epoch (loss 1.6735): 88%|βββββββββ | 276/313 [01:35<00:12, 3.06it/s]
Training 1/1 epoch (loss 1.7257): 88%|βββββββββ | 276/313 [01:35<00:12, 3.06it/s]
Training 1/1 epoch (loss 1.7257): 88%|βββββββββ | 277/313 [01:35<00:11, 3.17it/s]
Training 1/1 epoch (loss 1.5925): 88%|βββββββββ | 277/313 [01:35<00:11, 3.17it/s]
Training 1/1 epoch (loss 1.5925): 89%|βββββββββ | 278/313 [01:35<00:11, 3.18it/s]
Training 1/1 epoch (loss 1.6449): 89%|βββββββββ | 278/313 [01:35<00:11, 3.18it/s]
Training 1/1 epoch (loss 1.6449): 89%|βββββββββ | 279/313 [01:35<00:10, 3.16it/s]
Training 1/1 epoch (loss 1.6957): 89%|βββββββββ | 279/313 [01:36<00:10, 3.16it/s]
Training 1/1 epoch (loss 1.6957): 89%|βββββββββ | 280/313 [01:36<00:10, 3.07it/s]
Training 1/1 epoch (loss 1.6507): 89%|βββββββββ | 280/313 [01:36<00:10, 3.07it/s]
Training 1/1 epoch (loss 1.6507): 90%|βββββββββ | 281/313 [01:36<00:10, 3.13it/s]
Training 1/1 epoch (loss 1.6802): 90%|βββββββββ | 281/313 [01:36<00:10, 3.13it/s]
Training 1/1 epoch (loss 1.6802): 90%|βββββββββ | 282/313 [01:36<00:09, 3.11it/s]
Training 1/1 epoch (loss 1.5972): 90%|βββββββββ | 282/313 [01:37<00:09, 3.11it/s]
Training 1/1 epoch (loss 1.5972): 90%|βββββββββ | 283/313 [01:37<00:09, 3.11it/s]
Training 1/1 epoch (loss 1.6549): 90%|βββββββββ | 283/313 [01:37<00:09, 3.11it/s]
Training 1/1 epoch (loss 1.6549): 91%|βββββββββ | 284/313 [01:37<00:09, 3.22it/s]
Training 1/1 epoch (loss 1.5995): 91%|βββββββββ | 284/313 [01:37<00:09, 3.22it/s]
Training 1/1 epoch (loss 1.5995): 91%|βββββββββ | 285/313 [01:37<00:08, 3.24it/s]
Training 1/1 epoch (loss 1.6111): 91%|βββββββββ | 285/313 [01:38<00:08, 3.24it/s]
Training 1/1 epoch (loss 1.6111): 91%|ββββββββββ| 286/313 [01:38<00:08, 3.06it/s]
Training 1/1 epoch (loss 1.7747): 91%|ββββββββββ| 286/313 [01:38<00:08, 3.06it/s]
Training 1/1 epoch (loss 1.7747): 92%|ββββββββββ| 287/313 [01:38<00:08, 2.98it/s]
Training 1/1 epoch (loss 1.7735): 92%|ββββββββββ| 287/313 [01:38<00:08, 2.98it/s]
Training 1/1 epoch (loss 1.7735): 92%|ββββββββββ| 288/313 [01:38<00:08, 3.05it/s]
Training 1/1 epoch (loss 1.6124): 92%|ββββββββββ| 288/313 [01:39<00:08, 3.05it/s]
Training 1/1 epoch (loss 1.6124): 92%|ββββββββββ| 289/313 [01:39<00:07, 3.06it/s]
Training 1/1 epoch (loss 1.5912): 92%|ββββββββββ| 289/313 [01:39<00:07, 3.06it/s]
Training 1/1 epoch (loss 1.5912): 93%|ββββββββββ| 290/313 [01:39<00:07, 3.17it/s]
Training 1/1 epoch (loss 1.5622): 93%|ββββββββββ| 290/313 [01:39<00:07, 3.17it/s]
Training 1/1 epoch (loss 1.5622): 93%|ββββββββββ| 291/313 [01:39<00:06, 3.24it/s]
Training 1/1 epoch (loss 1.6851): 93%|ββββββββββ| 291/313 [01:40<00:06, 3.24it/s]
Training 1/1 epoch (loss 1.6851): 93%|ββββββββββ| 292/313 [01:40<00:06, 3.02it/s]
Training 1/1 epoch (loss 1.5546): 93%|ββββββββββ| 292/313 [01:40<00:06, 3.02it/s]
Training 1/1 epoch (loss 1.5546): 94%|ββββββββββ| 293/313 [01:40<00:06, 3.02it/s]
Training 1/1 epoch (loss 1.6438): 94%|ββββββββββ| 293/313 [01:40<00:06, 3.02it/s]
Training 1/1 epoch (loss 1.6438): 94%|ββββββββββ| 294/313 [01:40<00:06, 3.11it/s]
Training 1/1 epoch (loss 1.5432): 94%|ββββββββββ| 294/313 [01:41<00:06, 3.11it/s]
Training 1/1 epoch (loss 1.5432): 94%|ββββββββββ| 295/313 [01:41<00:05, 3.15it/s]
Training 1/1 epoch (loss 1.5990): 94%|ββββββββββ| 295/313 [01:41<00:05, 3.15it/s]
Training 1/1 epoch (loss 1.5990): 95%|ββββββββββ| 296/313 [01:41<00:05, 3.14it/s]
Training 1/1 epoch (loss 1.5566): 95%|ββββββββββ| 296/313 [01:41<00:05, 3.14it/s]
Training 1/1 epoch (loss 1.5566): 95%|ββββββββββ| 297/313 [01:41<00:05, 3.17it/s]
Training 1/1 epoch (loss 1.7486): 95%|ββββββββββ| 297/313 [01:42<00:05, 3.17it/s]
Training 1/1 epoch (loss 1.7486): 95%|ββββββββββ| 298/313 [01:42<00:04, 3.07it/s]
Training 1/1 epoch (loss 1.4920): 95%|ββββββββββ| 298/313 [01:42<00:04, 3.07it/s]
Training 1/1 epoch (loss 1.4920): 96%|ββββββββββ| 299/313 [01:42<00:04, 3.05it/s]
Training 1/1 epoch (loss 1.5296): 96%|ββββββββββ| 299/313 [01:42<00:04, 3.05it/s]
Training 1/1 epoch (loss 1.5296): 96%|ββββββββββ| 300/313 [01:42<00:04, 3.12it/s]
Training 1/1 epoch (loss 1.4590): 96%|ββββββββββ| 300/313 [01:43<00:04, 3.12it/s]
Training 1/1 epoch (loss 1.4590): 96%|ββββββββββ| 301/313 [01:43<00:03, 3.16it/s]
Training 1/1 epoch (loss 1.6352): 96%|ββββββββββ| 301/313 [01:43<00:03, 3.16it/s]
Training 1/1 epoch (loss 1.6352): 96%|ββββββββββ| 302/313 [01:43<00:03, 3.25it/s]
Training 1/1 epoch (loss 1.6015): 96%|ββββββββββ| 302/313 [01:43<00:03, 3.25it/s]
Training 1/1 epoch (loss 1.6015): 97%|ββββββββββ| 303/313 [01:43<00:03, 3.27it/s]
Training 1/1 epoch (loss 1.6556): 97%|ββββββββββ| 303/313 [01:43<00:03, 3.27it/s]
Training 1/1 epoch (loss 1.6556): 97%|ββββββββββ| 304/313 [01:43<00:03, 2.98it/s]
Training 1/1 epoch (loss 1.5968): 97%|ββββββββββ| 304/313 [01:44<00:03, 2.98it/s]
Training 1/1 epoch (loss 1.5968): 97%|ββββββββββ| 305/313 [01:44<00:02, 2.82it/s]
Training 1/1 epoch (loss 1.5061): 97%|ββββββββββ| 305/313 [01:44<00:02, 2.82it/s]
Training 1/1 epoch (loss 1.5061): 98%|ββββββββββ| 306/313 [01:44<00:02, 2.95it/s]
Training 1/1 epoch (loss 1.6266): 98%|ββββββββββ| 306/313 [01:45<00:02, 2.95it/s]
Training 1/1 epoch (loss 1.6266): 98%|ββββββββββ| 307/313 [01:45<00:02, 2.85it/s]
Training 1/1 epoch (loss 1.6049): 98%|ββββββββββ| 307/313 [01:45<00:02, 2.85it/s]
Training 1/1 epoch (loss 1.6049): 98%|ββββββββββ| 308/313 [01:45<00:01, 2.81it/s]
Training 1/1 epoch (loss 1.6322): 98%|ββββββββββ| 308/313 [01:45<00:01, 2.81it/s]
Training 1/1 epoch (loss 1.6322): 99%|ββββββββββ| 309/313 [01:45<00:01, 2.93it/s]
Training 1/1 epoch (loss 1.5407): 99%|ββββββββββ| 309/313 [01:46<00:01, 2.93it/s]
Training 1/1 epoch (loss 1.5407): 99%|ββββββββββ| 310/313 [01:46<00:01, 2.83it/s]
Training 1/1 epoch (loss 1.6329): 99%|ββββββββββ| 310/313 [01:46<00:01, 2.83it/s]
Training 1/1 epoch (loss 1.6329): 99%|ββββββββββ| 311/313 [01:46<00:00, 2.83it/s]
Training 1/1 epoch (loss 1.5867): 99%|ββββββββββ| 311/313 [01:46<00:00, 2.83it/s]
Training 1/1 epoch (loss 1.5867): 100%|ββββββββββ| 312/313 [01:46<00:00, 2.93it/s]
Training 1/1 epoch (loss 1.4344): 100%|ββββββββββ| 312/313 [01:47<00:00, 2.93it/s]
Training 1/1 epoch (loss 1.4344): 100%|ββββββββββ| 313/313 [01:47<00:00, 2.94it/s]
Training 1/1 epoch (loss 1.4344): 100%|ββββββββββ| 313/313 [01:47<00:00, 2.92it/s] |
| tokenizer config file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-2T/tinyllama-2T-s3-Q1-10k/tokenizer_config.json |
| Special tokens file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-2T/tinyllama-2T-s3-Q1-10k/special_tokens_map.json |
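| The two save messages above show the run's tokenizer artifacts being written into the output directory alongside the model checkpoint. As a minimal sketch (the directory path is copied from the log; the `AutoTokenizer` call is the standard Hugging Face transformers API, not this project's own code), the saved tokenizer can be reloaded like so: |

```python
# Minimal sketch: reload the tokenizer saved by this run. AutoTokenizer reads
# tokenizer_config.json and special_tokens_map.json from the output directory.
# Standard transformers API; not taken from this project's training code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-2T/tinyllama-2T-s3-Q1-10k"
)
print(tokenizer)  # prints the tokenizer repr, including vocab size and special tokens
```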
| wandb: ERROR Problem finishing run |
| Exception ignored in atexit callback: <bound method rank_zero_only.<locals>.wrapper of <safe_rlhf.logger.Logger object at 0x155117a801d0>> |
| Traceback (most recent call last): |
| File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/utils.py", line 212, in wrapper |
| return func(*args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^ |
| File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/logger.py", line 183, in close |
| self.wandb.finish() |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 449, in wrapper |
| return func(self, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 391, in wrapper |
| return func(self, *args, **kwargs) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2106, in finish |
| return self._finish(exit_code) |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2127, in _finish |
| self._atexit_cleanup(exit_code=exit_code) |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2352, in _atexit_cleanup |
| self._on_finish() |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2609, in _on_finish |
| wait_with_progress( |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress |
| return wait_all_with_progress( |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress |
| return asyncio_compat.run(progress_loop_with_timeout) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_compat.py", line 27, in run |
| future = executor.submit(runner.run, fn) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/concurrent/futures/thread.py", line 169, in submit |
| raise RuntimeError('cannot schedule new futures after ' |
| RuntimeError: cannot schedule new futures after interpreter shutdown |
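| The traceback above is a shutdown-ordering problem rather than a training failure: the checkpoint and tokenizer were already saved, but wandb's atexit hook calls `Run.finish()` after Python's thread-pool machinery has stopped accepting work, hence `cannot schedule new futures after interpreter shutdown`. A minimal sketch of the usual workaround, assuming a hypothetical training entry point (only `wandb.init` and `run.finish()` are real wandb API; the rest is illustrative, not this project's code), is to finish the run explicitly while the interpreter is still fully alive: |

```python
# Minimal sketch (hypothetical code, not this project's): finish the wandb run
# explicitly in a finally block instead of relying on the atexit hook, which
# runs after interpreter shutdown has begun and can no longer schedule futures.
import wandb

def main() -> None:
    run = wandb.init(project="example-project")  # hypothetical project name
    try:
        ...  # training loop goes here
    finally:
        run.finish()  # flushes and closes the run while threads can still be scheduled

if __name__ == "__main__":
    main()
```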
|
|