+ deepspeed --master_port 55870 --module safe_rlhf.finetune --train_datasets inverse-json::/home/hansirui_1st/jiayi/resist/setting3/safety_data/training/safe/safe_40k.json --model_name_or_path /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T --max_length 2048 --trust_remote_code True --epochs 1 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --gradient_checkpointing --learning_rate 1e-5 --lr_warmup_ratio 0 --weight_decay 0.0 --lr_scheduler_type constant --weight_decay 0.0 --seed 42 --output_dir /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-2T/tinyllama-2T-s3-Q1-40k --log_type wandb --log_run_name tinyllama-2T-s3-Q1-40k --log_project Inverse_Alignment --zero_stage 3 --offload none --bf16 True --tf32 True --save_16bit
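The flags in the command above determine the length of the training progress bar. The following is a minimal sketch of that arithmetic, assuming the dataset holds 40,000 examples (inferred only from the file name `safe_40k.json`) and that 8 ranks take part (ranks 0 to 7 appear in the NCCL warnings of this log). The helper names are hypothetical and are not part of `safe_rlhf`.

```python
import math


def steps_per_epoch(num_examples: int, world_size: int, per_device_batch_size: int) -> int:
    """Micro-batch iterations per epoch when every rank reads a disjoint shard."""
    global_micro_batch = world_size * per_device_batch_size
    return math.ceil(num_examples / global_micro_batch)


def accumulation_windows(iterations: int, gradient_accumulation_steps: int) -> tuple[int, int]:
    """Return (full accumulation windows, leftover micro-batches)."""
    return divmod(iterations, gradient_accumulation_steps)


# Assumed values: 40,000 examples, 8 ranks; batch size 4 and accumulation 8 come from the command.
iterations = steps_per_epoch(40_000, 8, 4)
full, leftover = accumulation_windows(iterations, 8)
print(iterations)      # 1250
print(full, leftover)  # 156 2
```

Under these assumptions the result agrees with the `1250` total that the progress bar reports later in the log. How the trainer treats the two leftover micro-batches is not visible in this log.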
[rank6]:[W529 17:33:15.400252538 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank1]:[W529 17:33:15.453473395 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank3]:[W529 17:33:15.492204941 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank4]:[W529 17:33:15.492222471 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank5]:[W529 17:33:15.492234894 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank7]:[W529 17:33:15.637100729 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank0]:[W529 17:33:15.637149612 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank2]:[W529 17:33:15.640600212 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/config.json
Model config LlamaConfig {
"_name_or_path": "/aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.49.0",
"use_cache": true,
"vocab_size": 32000
}
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/model.safetensors
Will use torch_dtype=torch.float32 as defined in model's config object
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Will use torch_dtype=torch.float32 as defined in model's config object
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/model.safetensors
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/generation_config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/generation_config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/generation_config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/generation_config.json
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/generation_config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/generation_config.json
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/generation_config.json
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer.model
loading file tokenizer_config.json
loading file chat_template.jinja
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-955k-token-2T/generation_config.json
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
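The warning above is triggered because adding one token grows the vocabulary from 32000 to 32001, which is no longer a multiple of 8 or 64. A minimal sketch of the rounding that a `pad_to_multiple_of` value implies, assuming a multiple of 64 (an illustrative choice, not a setting taken from this run); the helper name is hypothetical.

```python
def padded_vocab_size(new_num_tokens: int, pad_to_multiple_of: int) -> int:
    """Round new_num_tokens up to the next multiple of pad_to_multiple_of."""
    # Ceiling division written with integer arithmetic only.
    blocks = -(-new_num_tokens // pad_to_multiple_of)
    return blocks * pad_to_multiple_of


print(padded_vocab_size(32001, 64))  # 32064
print(padded_vocab_size(32000, 64))  # 32000
```

In Hugging Face Transformers the corresponding call would be `model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)`. Whether the training script used here exposes that option is not shown in the log.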
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-2T/tinyllama-2T-s3-Q1-40k/wandb/run-20250529_173326-lki5zjlb
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run tinyllama-2T-s3-Q1-40k
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment/runs/lki5zjlb
Training 1/1 epoch: 0%| | 0/1250 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Training 1/1 epoch (loss 2.0025): 0%| | 0/1250 [00:05<?, ?it/s]
Training 1/1 epoch (loss 2.0025): 0%| | 1/1250 [00:05<1:51:04, 5.34s/it]
Training 1/1 epoch (loss 2.0061): 0%| | 1/1250 [00:06<1:51:04, 5.34s/it]
Training 1/1 epoch (loss 2.0061): 0%| | 2/1250 [00:06<1:05:15, 3.14s/it]
Training 1/1 epoch (loss 2.1095): 0%| | 2/1250 [00:07<1:05:15, 3.14s/it]
Training 1/1 epoch (loss 2.1095): 0%| | 3/1250 [00:07<38:18, 1.84s/it]
Training 1/1 epoch (loss 2.1373): 0%| | 3/1250 [00:07<38:18, 1.84s/it]
Training 1/1 epoch (loss 2.1373): 0%| | 4/1250 [00:07<26:42, 1.29s/it]
Training 1/1 epoch (loss 2.0339): 0%| | 4/1250 [00:08<26:42, 1.29s/it]
Training 1/1 epoch (loss 2.0339): 0%| | 5/1250 [00:08<20:36, 1.01it/s]
Training 1/1 epoch (loss 2.1314): 0%| | 5/1250 [00:08<20:36, 1.01it/s]
Training 1/1 epoch (loss 2.1314): 0%| | 6/1250 [00:08<16:47, 1.23it/s]
Training 1/1 epoch (loss 1.9370): 0%| | 6/1250 [00:08<16:47, 1.23it/s]
Training 1/1 epoch (loss 1.9370): 1%| | 7/1250 [00:08<13:21, 1.55it/s]
Training 1/1 epoch (loss 2.0154): 1%| | 7/1250 [00:09<13:21, 1.55it/s]
Training 1/1 epoch (loss 2.0154): 1%| | 8/1250 [00:09<11:36, 1.78it/s]
Training 1/1 epoch (loss 2.0167): 1%| | 8/1250 [00:09<11:36, 1.78it/s]
Training 1/1 epoch (loss 2.0167): 1%| | 9/1250 [00:09<10:00, 2.07it/s]
Training 1/1 epoch (loss 2.0118): 1%| | 9/1250 [00:09<10:00, 2.07it/s]
Training 1/1 epoch (loss 2.0118): 1%| | 10/1250 [00:09<09:13, 2.24it/s]
Training 1/1 epoch (loss 1.9870): 1%| | 10/1250 [00:10<09:13, 2.24it/s]
Training 1/1 epoch (loss 1.9870): 1%| | 11/1250 [00:10<08:29, 2.43it/s]
Training 1/1 epoch (loss 1.9650): 1%| | 11/1250 [00:10<08:29, 2.43it/s]
Training 1/1 epoch (loss 1.9650): 1%| | 12/1250 [00:10<07:47, 2.65it/s]
Training 1/1 epoch (loss 2.0230): 1%| | 12/1250 [00:10<07:47, 2.65it/s]
Training 1/1 epoch (loss 2.0230): 1%| | 13/1250 [00:10<07:26, 2.77it/s]
Training 1/1 epoch (loss 2.1126): 1%| | 13/1250 [00:11<07:26, 2.77it/s]
Training 1/1 epoch (loss 2.1126): 1%| | 14/1250 [00:11<07:01, 2.93it/s]
Training 1/1 epoch (loss 2.0582): 1%| | 14/1250 [00:11<07:01, 2.93it/s]
Training 1/1 epoch (loss 2.0582): 1%| | 15/1250 [00:11<06:54, 2.98it/s]
Training 1/1 epoch (loss 2.1321): 1%| | 15/1250 [00:11<06:54, 2.98it/s]
Training 1/1 epoch (loss 2.1321): 1%|▏ | 16/1250 [00:11<07:04, 2.90it/s]
Training 1/1 epoch (loss 2.0962): 1%|▏ | 16/1250 [00:12<07:04, 2.90it/s]
Training 1/1 epoch (loss 2.0962): 1%|▏ | 17/1250 [00:12<07:07, 2.89it/s]
Training 1/1 epoch (loss 2.0088): 1%|▏ | 17/1250 [00:12<07:07, 2.89it/s]
Training 1/1 epoch (loss 2.0088): 1%|▏ | 18/1250 [00:12<06:50, 3.00it/s]
Training 1/1 epoch (loss 2.0530): 1%|▏ | 18/1250 [00:12<06:50, 3.00it/s]
Training 1/1 epoch (loss 2.0530): 2%|▏ | 19/1250 [00:12<06:44, 3.04it/s]
Training 1/1 epoch (loss 1.9629): 2%|▏ | 19/1250 [00:13<06:44, 3.04it/s]
Training 1/1 epoch (loss 1.9629): 2%|▏ | 20/1250 [00:13<06:39, 3.08it/s]
Training 1/1 epoch (loss 2.0804): 2%|▏ | 20/1250 [00:13<06:39, 3.08it/s]
Training 1/1 epoch (loss 2.0804): 2%|▏ | 21/1250 [00:13<06:31, 3.14it/s]
Training 1/1 epoch (loss 1.9713): 2%|▏ | 21/1250 [00:13<06:31, 3.14it/s]
Training 1/1 epoch (loss 1.9713): 2%|▏ | 22/1250 [00:13<06:49, 3.00it/s]
Training 1/1 epoch (loss 2.0580): 2%|▏ | 22/1250 [00:14<06:49, 3.00it/s]
Training 1/1 epoch (loss 2.0580): 2%|▏ | 23/1250 [00:14<06:55, 2.95it/s]
Training 1/1 epoch (loss 2.0814): 2%|▏ | 23/1250 [00:14<06:55, 2.95it/s]
Training 1/1 epoch (loss 2.0814): 2%|▏ | 24/1250 [00:14<06:55, 2.95it/s]
Training 1/1 epoch (loss 1.9873): 2%|▏ | 24/1250 [00:14<06:55, 2.95it/s]
Training 1/1 epoch (loss 1.9873): 2%|▏ | 25/1250 [00:14<06:52, 2.97it/s]
Training 1/1 epoch (loss 1.9861): 2%|▏ | 25/1250 [00:15<06:52, 2.97it/s]
Training 1/1 epoch (loss 1.9861): 2%|▏ | 26/1250 [00:15<06:44, 3.03it/s]
Training 1/1 epoch (loss 1.9038): 2%|▏ | 26/1250 [00:15<06:44, 3.03it/s]
Training 1/1 epoch (loss 1.9038): 2%|▏ | 27/1250 [00:15<06:39, 3.06it/s]
Training 1/1 epoch (loss 2.0115): 2%|▏ | 27/1250 [00:15<06:39, 3.06it/s]
Training 1/1 epoch (loss 2.0115): 2%|▏ | 28/1250 [00:15<06:36, 3.08it/s]
Training 1/1 epoch (loss 2.0269): 2%|▏ | 28/1250 [00:16<06:36, 3.08it/s]
Training 1/1 epoch (loss 2.0269): 2%|▏ | 29/1250 [00:16<07:03, 2.88it/s]
Training 1/1 epoch (loss 1.9070): 2%|▏ | 29/1250 [00:16<07:03, 2.88it/s]
Training 1/1 epoch (loss 1.9070): 2%|▏ | 30/1250 [00:16<06:50, 2.97it/s]
Training 1/1 epoch (loss 2.0913): 2%|▏ | 30/1250 [00:16<06:50, 2.97it/s]
Training 1/1 epoch (loss 2.0913): 2%|▏ | 31/1250 [00:16<06:39, 3.05it/s]
Training 1/1 epoch (loss 2.0652): 2%|▏ | 31/1250 [00:17<06:39, 3.05it/s]
Training 1/1 epoch (loss 2.0652): 3%|▎ | 32/1250 [00:17<06:43, 3.02it/s]
Training 1/1 epoch (loss 1.8499): 3%|▎ | 32/1250 [00:17<06:43, 3.02it/s]
Training 1/1 epoch (loss 1.8499): 3%|▎ | 33/1250 [00:17<06:35, 3.08it/s]
Training 1/1 epoch (loss 1.7884): 3%|▎ | 33/1250 [00:17<06:35, 3.08it/s]
Training 1/1 epoch (loss 1.7884): 3%|▎ | 34/1250 [00:17<06:36, 3.07it/s]
Training 1/1 epoch (loss 1.9903): 3%|▎ | 34/1250 [00:18<06:36, 3.07it/s]
Training 1/1 epoch (loss 1.9903): 3%|▎ | 35/1250 [00:18<06:42, 3.02it/s]
Training 1/1 epoch (loss 1.9684): 3%|▎ | 35/1250 [00:18<06:42, 3.02it/s]
Training 1/1 epoch (loss 1.9684): 3%|▎ | 36/1250 [00:18<06:36, 3.07it/s]
Training 1/1 epoch (loss 1.9977): 3%|▎ | 36/1250 [00:18<06:36, 3.07it/s]
Training 1/1 epoch (loss 1.9977): 3%|▎ | 37/1250 [00:18<06:37, 3.05it/s]
Training 1/1 epoch (loss 1.8858): 3%|▎ | 37/1250 [00:19<06:37, 3.05it/s]
Training 1/1 epoch (loss 1.8858): 3%|▎ | 38/1250 [00:19<06:33, 3.08it/s]
Training 1/1 epoch (loss 1.8441): 3%|▎ | 38/1250 [00:19<06:33, 3.08it/s]
Training 1/1 epoch (loss 1.8441): 3%|▎ | 39/1250 [00:19<06:29, 3.11it/s]
Training 1/1 epoch (loss 1.8456): 3%|▎ | 39/1250 [00:19<06:29, 3.11it/s]
Training 1/1 epoch (loss 1.8456): 3%|▎ | 40/1250 [00:19<06:38, 3.04it/s]
Training 1/1 epoch (loss 1.8507): 3%|▎ | 40/1250 [00:20<06:38, 3.04it/s]
Training 1/1 epoch (loss 1.8507): 3%|▎ | 41/1250 [00:20<06:50, 2.94it/s]
Training 1/1 
epoch (loss 1.9277): 3%|β–Ž | 41/1250 [00:20<06:50, 2.94it/s] Training 1/1 epoch (loss 1.9277): 3%|β–Ž | 42/1250 [00:20<06:40, 3.02it/s] Training 1/1 epoch (loss 1.8635): 3%|β–Ž | 42/1250 [00:20<06:40, 3.02it/s] Training 1/1 epoch (loss 1.8635): 3%|β–Ž | 43/1250 [00:20<06:33, 3.07it/s] Training 1/1 epoch (loss 1.8961): 3%|β–Ž | 43/1250 [00:21<06:33, 3.07it/s] Training 1/1 epoch (loss 1.8961): 4%|β–Ž | 44/1250 [00:21<06:24, 3.14it/s] Training 1/1 epoch (loss 1.9990): 4%|β–Ž | 44/1250 [00:21<06:24, 3.14it/s] Training 1/1 epoch (loss 1.9990): 4%|β–Ž | 45/1250 [00:21<06:18, 3.19it/s] Training 1/1 epoch (loss 1.8932): 4%|β–Ž | 45/1250 [00:21<06:18, 3.19it/s] Training 1/1 epoch (loss 1.8932): 4%|β–Ž | 46/1250 [00:21<06:21, 3.15it/s] Training 1/1 epoch (loss 1.8435): 4%|β–Ž | 46/1250 [00:22<06:21, 3.15it/s] Training 1/1 epoch (loss 1.8435): 4%|▍ | 47/1250 [00:22<06:28, 3.09it/s] Training 1/1 epoch (loss 1.8206): 4%|▍ | 47/1250 [00:22<06:28, 3.09it/s] Training 1/1 epoch (loss 1.8206): 4%|▍ | 48/1250 [00:22<06:35, 3.04it/s] Training 1/1 epoch (loss 1.7980): 4%|▍ | 48/1250 [00:22<06:35, 3.04it/s] Training 1/1 epoch (loss 1.7980): 4%|▍ | 49/1250 [00:22<06:23, 3.13it/s] Training 1/1 epoch (loss 1.8185): 4%|▍ | 49/1250 [00:23<06:23, 3.13it/s] Training 1/1 epoch (loss 1.8185): 4%|▍ | 50/1250 [00:23<06:19, 3.16it/s] Training 1/1 epoch (loss 1.7631): 4%|▍ | 50/1250 [00:23<06:19, 3.16it/s] Training 1/1 epoch (loss 1.7631): 4%|▍ | 51/1250 [00:23<06:22, 3.13it/s] Training 1/1 epoch (loss 1.9100): 4%|▍ | 51/1250 [00:23<06:22, 3.13it/s] Training 1/1 epoch (loss 1.9100): 4%|▍ | 52/1250 [00:23<06:23, 3.13it/s] Training 1/1 epoch (loss 1.9352): 4%|▍ | 52/1250 [00:24<06:23, 3.13it/s] Training 1/1 epoch (loss 1.9352): 4%|▍ | 53/1250 [00:24<06:40, 2.99it/s] Training 1/1 epoch (loss 1.9555): 4%|▍ | 53/1250 [00:24<06:40, 2.99it/s] Training 1/1 epoch (loss 1.9555): 4%|▍ | 54/1250 [00:24<06:47, 2.94it/s] Training 1/1 epoch (loss 1.9252): 4%|▍ | 54/1250 [00:24<06:47, 2.94it/s] Training 1/1 epoch 
(loss 1.9252): 4%|▍ | 55/1250 [00:24<06:36, 3.01it/s] Training 1/1 epoch (loss 1.8979): 4%|▍ | 55/1250 [00:25<06:36, 3.01it/s] Training 1/1 epoch (loss 1.8979): 4%|▍ | 56/1250 [00:25<06:33, 3.04it/s] Training 1/1 epoch (loss 1.6819): 4%|▍ | 56/1250 [00:25<06:33, 3.04it/s] Training 1/1 epoch (loss 1.6819): 5%|▍ | 57/1250 [00:25<06:24, 3.11it/s] Training 1/1 epoch (loss 1.7166): 5%|▍ | 57/1250 [00:25<06:24, 3.11it/s] Training 1/1 epoch (loss 1.7166): 5%|▍ | 58/1250 [00:25<06:15, 3.18it/s] Training 1/1 epoch (loss 1.7133): 5%|▍ | 58/1250 [00:25<06:15, 3.18it/s] Training 1/1 epoch (loss 1.7133): 5%|▍ | 59/1250 [00:25<06:28, 3.07it/s] Training 1/1 epoch (loss 1.7444): 5%|▍ | 59/1250 [00:26<06:28, 3.07it/s] Training 1/1 epoch (loss 1.7444): 5%|▍ | 60/1250 [00:26<06:41, 2.96it/s] Training 1/1 epoch (loss 1.8374): 5%|▍ | 60/1250 [00:26<06:41, 2.96it/s] Training 1/1 epoch (loss 1.8374): 5%|▍ | 61/1250 [00:26<06:39, 2.98it/s] Training 1/1 epoch (loss 1.8440): 5%|▍ | 61/1250 [00:26<06:39, 2.98it/s] Training 1/1 epoch (loss 1.8440): 5%|▍ | 62/1250 [00:26<06:30, 3.04it/s] Training 1/1 epoch (loss 1.8713): 5%|▍ | 62/1250 [00:27<06:30, 3.04it/s] Training 1/1 epoch (loss 1.8713): 5%|β–Œ | 63/1250 [00:27<06:24, 3.09it/s] Training 1/1 epoch (loss 1.8771): 5%|β–Œ | 63/1250 [00:27<06:24, 3.09it/s] Training 1/1 epoch (loss 1.8771): 5%|β–Œ | 64/1250 [00:27<06:21, 3.11it/s] Training 1/1 epoch (loss 1.7059): 5%|β–Œ | 64/1250 [00:27<06:21, 3.11it/s] Training 1/1 epoch (loss 1.7059): 5%|β–Œ | 65/1250 [00:27<06:25, 3.08it/s] Training 1/1 epoch (loss 1.8282): 5%|β–Œ | 65/1250 [00:28<06:25, 3.08it/s] Training 1/1 epoch (loss 1.8282): 5%|β–Œ | 66/1250 [00:28<06:42, 2.94it/s] Training 1/1 epoch (loss 1.7099): 5%|β–Œ | 66/1250 [00:28<06:42, 2.94it/s] Training 1/1 epoch (loss 1.7099): 5%|β–Œ | 67/1250 [00:28<06:30, 3.03it/s] Training 1/1 epoch (loss 1.7050): 5%|β–Œ | 67/1250 [00:28<06:30, 3.03it/s] Training 1/1 epoch (loss 1.7050): 5%|β–Œ | 68/1250 [00:28<06:24, 3.07it/s] Training 1/1 epoch (loss 
1.7280): 5%|β–Œ | 68/1250 [00:29<06:24, 3.07it/s] Training 1/1 epoch (loss 1.7280): 6%|β–Œ | 69/1250 [00:29<06:25, 3.06it/s] Training 1/1 epoch (loss 1.6351): 6%|β–Œ | 69/1250 [00:29<06:25, 3.06it/s] Training 1/1 epoch (loss 1.6351): 6%|β–Œ | 70/1250 [00:29<06:21, 3.09it/s] Training 1/1 epoch (loss 1.5403): 6%|β–Œ | 70/1250 [00:29<06:21, 3.09it/s] Training 1/1 epoch (loss 1.5403): 6%|β–Œ | 71/1250 [00:29<06:32, 3.01it/s] Training 1/1 epoch (loss 1.8749): 6%|β–Œ | 71/1250 [00:30<06:32, 3.01it/s] Training 1/1 epoch (loss 1.8749): 6%|β–Œ | 72/1250 [00:30<06:38, 2.96it/s] Training 1/1 epoch (loss 1.7015): 6%|β–Œ | 72/1250 [00:30<06:38, 2.96it/s] Training 1/1 epoch (loss 1.7015): 6%|β–Œ | 73/1250 [00:30<06:32, 3.00it/s] Training 1/1 epoch (loss 1.7549): 6%|β–Œ | 73/1250 [00:30<06:32, 3.00it/s] Training 1/1 epoch (loss 1.7549): 6%|β–Œ | 74/1250 [00:30<06:22, 3.07it/s] Training 1/1 epoch (loss 1.7548): 6%|β–Œ | 74/1250 [00:31<06:22, 3.07it/s] Training 1/1 epoch (loss 1.7548): 6%|β–Œ | 75/1250 [00:31<06:16, 3.12it/s] Training 1/1 epoch (loss 1.8467): 6%|β–Œ | 75/1250 [00:31<06:16, 3.12it/s] Training 1/1 epoch (loss 1.8467): 6%|β–Œ | 76/1250 [00:31<06:15, 3.12it/s] Training 1/1 epoch (loss 1.6975): 6%|β–Œ | 76/1250 [00:31<06:15, 3.12it/s] Training 1/1 epoch (loss 1.6975): 6%|β–Œ | 77/1250 [00:31<06:11, 3.16it/s] Training 1/1 epoch (loss 1.6910): 6%|β–Œ | 77/1250 [00:32<06:11, 3.16it/s] Training 1/1 epoch (loss 1.6910): 6%|β–Œ | 78/1250 [00:32<06:27, 3.02it/s] Training 1/1 epoch (loss 1.8507): 6%|β–Œ | 78/1250 [00:32<06:27, 3.02it/s] Training 1/1 epoch (loss 1.8507): 6%|β–‹ | 79/1250 [00:32<06:27, 3.03it/s] Training 1/1 epoch (loss 1.6599): 6%|β–‹ | 79/1250 [00:32<06:27, 3.03it/s] Training 1/1 epoch (loss 1.6599): 6%|β–‹ | 80/1250 [00:32<06:22, 3.06it/s] Training 1/1 epoch (loss 1.7499): 6%|β–‹ | 80/1250 [00:33<06:22, 3.06it/s] Training 1/1 epoch (loss 1.7499): 6%|β–‹ | 81/1250 [00:33<06:17, 3.10it/s] Training 1/1 epoch (loss 1.7430): 6%|β–‹ | 81/1250 [00:33<06:17, 3.10it/s] 
Training 1/1 epoch (loss 1.7430): 7%|β–‹ | 82/1250 [00:33<06:28, 3.01it/s] Training 1/1 epoch (loss 1.7564): 7%|β–‹ | 82/1250 [00:33<06:28, 3.01it/s] Training 1/1 epoch (loss 1.7564): 7%|β–‹ | 83/1250 [00:33<06:31, 2.98it/s] Training 1/1 epoch (loss 1.8237): 7%|β–‹ | 83/1250 [00:34<06:31, 2.98it/s] Training 1/1 epoch (loss 1.8237): 7%|β–‹ | 84/1250 [00:34<06:37, 2.94it/s] Training 1/1 epoch (loss 1.7649): 7%|β–‹ | 84/1250 [00:34<06:37, 2.94it/s] Training 1/1 epoch (loss 1.7649): 7%|β–‹ | 85/1250 [00:34<06:32, 2.97it/s] Training 1/1 epoch (loss 1.7802): 7%|β–‹ | 85/1250 [00:34<06:32, 2.97it/s] Training 1/1 epoch (loss 1.7802): 7%|β–‹ | 86/1250 [00:34<06:24, 3.03it/s] Training 1/1 epoch (loss 1.6137): 7%|β–‹ | 86/1250 [00:35<06:24, 3.03it/s] Training 1/1 epoch (loss 1.6137): 7%|β–‹ | 87/1250 [00:35<06:17, 3.08it/s] Training 1/1 epoch (loss 1.7510): 7%|β–‹ | 87/1250 [00:35<06:17, 3.08it/s] Training 1/1 epoch (loss 1.7510): 7%|β–‹ | 88/1250 [00:35<06:18, 3.07it/s] Training 1/1 epoch (loss 1.7444): 7%|β–‹ | 88/1250 [00:35<06:18, 3.07it/s] Training 1/1 epoch (loss 1.7444): 7%|β–‹ | 89/1250 [00:35<06:16, 3.08it/s] Training 1/1 epoch (loss 1.7187): 7%|β–‹ | 89/1250 [00:36<06:16, 3.08it/s] Training 1/1 epoch (loss 1.7187): 7%|β–‹ | 90/1250 [00:36<06:09, 3.14it/s] Training 1/1 epoch (loss 1.7448): 7%|β–‹ | 90/1250 [00:36<06:09, 3.14it/s] Training 1/1 epoch (loss 1.7448): 7%|β–‹ | 91/1250 [00:36<06:08, 3.14it/s] Training 1/1 epoch (loss 1.6124): 7%|β–‹ | 91/1250 [00:36<06:08, 3.14it/s] Training 1/1 epoch (loss 1.6124): 7%|β–‹ | 92/1250 [00:36<06:22, 3.03it/s] Training 1/1 epoch (loss 1.7273): 7%|β–‹ | 92/1250 [00:37<06:22, 3.03it/s] Training 1/1 epoch (loss 1.7273): 7%|β–‹ | 93/1250 [00:37<06:13, 3.10it/s] Training 1/1 epoch (loss 1.7605): 7%|β–‹ | 93/1250 [00:37<06:13, 3.10it/s] Training 1/1 epoch (loss 1.7605): 8%|β–Š | 94/1250 [00:37<06:11, 3.11it/s] Training 1/1 epoch (loss 1.7138): 8%|β–Š | 94/1250 [00:37<06:11, 3.11it/s] Training 1/1 epoch (loss 1.7138): 8%|β–Š | 
95/1250 [00:37<06:03, 3.18it/s] Training 1/1 epoch (loss 1.7958): 8%|β–Š | 95/1250 [00:38<06:03, 3.18it/s] Training 1/1 epoch (loss 1.7958): 8%|β–Š | 96/1250 [00:38<06:23, 3.01it/s] Training 1/1 epoch (loss 1.7015): 8%|β–Š | 96/1250 [00:38<06:23, 3.01it/s] Training 1/1 epoch (loss 1.7015): 8%|β–Š | 97/1250 [00:38<06:35, 2.91it/s] Training 1/1 epoch (loss 1.7864): 8%|β–Š | 97/1250 [00:38<06:35, 2.91it/s] Training 1/1 epoch (loss 1.7864): 8%|β–Š | 98/1250 [00:38<06:35, 2.91it/s] Training 1/1 epoch (loss 1.6665): 8%|β–Š | 98/1250 [00:39<06:35, 2.91it/s] Training 1/1 epoch (loss 1.6665): 8%|β–Š | 99/1250 [00:39<06:22, 3.01it/s] Training 1/1 epoch (loss 1.5827): 8%|β–Š | 99/1250 [00:39<06:22, 3.01it/s] Training 1/1 epoch (loss 1.5827): 8%|β–Š | 100/1250 [00:39<06:14, 3.07it/s] Training 1/1 epoch (loss 1.5312): 8%|β–Š | 100/1250 [00:39<06:14, 3.07it/s] Training 1/1 epoch (loss 1.5312): 8%|β–Š | 101/1250 [00:39<06:19, 3.03it/s] Training 1/1 epoch (loss 1.7691): 8%|β–Š | 101/1250 [00:40<06:19, 3.03it/s] Training 1/1 epoch (loss 1.7691): 8%|β–Š | 102/1250 [00:40<06:26, 2.97it/s] Training 1/1 epoch (loss 1.7201): 8%|β–Š | 102/1250 [00:40<06:26, 2.97it/s] Training 1/1 epoch (loss 1.7201): 8%|β–Š | 103/1250 [00:40<06:29, 2.95it/s] Training 1/1 epoch (loss 1.7312): 8%|β–Š | 103/1250 [00:40<06:29, 2.95it/s] Training 1/1 epoch (loss 1.7312): 8%|β–Š | 104/1250 [00:40<06:24, 2.98it/s] Training 1/1 epoch (loss 1.8135): 8%|β–Š | 104/1250 [00:41<06:24, 2.98it/s] Training 1/1 epoch (loss 1.8135): 8%|β–Š | 105/1250 [00:41<06:16, 3.04it/s] Training 1/1 epoch (loss 1.7244): 8%|β–Š | 105/1250 [00:41<06:16, 3.04it/s] Training 1/1 epoch (loss 1.7244): 8%|β–Š | 106/1250 [00:41<06:06, 3.12it/s] Training 1/1 epoch (loss 1.6460): 8%|β–Š | 106/1250 [00:41<06:06, 3.12it/s] Training 1/1 epoch (loss 1.6460): 9%|β–Š | 107/1250 [00:41<06:06, 3.12it/s] Training 1/1 epoch (loss 1.7568): 9%|β–Š | 107/1250 [00:42<06:06, 3.12it/s] Training 1/1 epoch (loss 1.7568): 9%|β–Š | 108/1250 [00:42<06:10, 3.08it/s] 
Training 1/1 epoch (loss 1.6587): 9%|β–Š | 108/1250 [00:42<06:10, 3.08it/s] Training 1/1 epoch (loss 1.6587): 9%|β–Š | 109/1250 [00:42<06:21, 2.99it/s] Training 1/1 epoch (loss 1.7155): 9%|β–Š | 109/1250 [00:42<06:21, 2.99it/s] Training 1/1 epoch (loss 1.7155): 9%|β–‰ | 110/1250 [00:42<06:22, 2.98it/s] Training 1/1 epoch (loss 1.6509): 9%|β–‰ | 110/1250 [00:43<06:22, 2.98it/s] Training 1/1 epoch (loss 1.6509): 9%|β–‰ | 111/1250 [00:43<06:13, 3.05it/s] Training 1/1 epoch (loss 1.6643): 9%|β–‰ | 111/1250 [00:43<06:13, 3.05it/s] Training 1/1 epoch (loss 1.6643): 9%|β–‰ | 112/1250 [00:43<06:14, 3.04it/s] Training 1/1 epoch (loss 1.6104): 9%|β–‰ | 112/1250 [00:43<06:14, 3.04it/s] Training 1/1 epoch (loss 1.6104): 9%|β–‰ | 113/1250 [00:43<06:13, 3.04it/s] Training 1/1 epoch (loss 1.6168): 9%|β–‰ | 113/1250 [00:44<06:13, 3.04it/s] Training 1/1 epoch (loss 1.6168): 9%|β–‰ | 114/1250 [00:44<06:09, 3.07it/s] Training 1/1 epoch (loss 1.6033): 9%|β–‰ | 114/1250 [00:44<06:09, 3.07it/s] Training 1/1 epoch (loss 1.6033): 9%|β–‰ | 115/1250 [00:44<06:17, 3.01it/s] Training 1/1 epoch (loss 1.6847): 9%|β–‰ | 115/1250 [00:44<06:17, 3.01it/s] Training 1/1 epoch (loss 1.6847): 9%|β–‰ | 116/1250 [00:44<06:15, 3.02it/s] Training 1/1 epoch (loss 1.7735): 9%|β–‰ | 116/1250 [00:45<06:15, 3.02it/s] Training 1/1 epoch (loss 1.7735): 9%|β–‰ | 117/1250 [00:45<06:41, 2.82it/s] Training 1/1 epoch (loss 1.7128): 9%|β–‰ | 117/1250 [00:45<06:41, 2.82it/s] Training 1/1 epoch (loss 1.7128): 9%|β–‰ | 118/1250 [00:45<06:28, 2.91it/s] Training 1/1 epoch (loss 1.7020): 9%|β–‰ | 118/1250 [00:45<06:28, 2.91it/s] Training 1/1 epoch (loss 1.7020): 10%|β–‰ | 119/1250 [00:45<06:27, 2.92it/s] Training 1/1 epoch (loss 1.6180): 10%|β–‰ | 119/1250 [00:46<06:27, 2.92it/s] Training 1/1 epoch (loss 1.6180): 10%|β–‰ | 120/1250 [00:46<06:52, 2.74it/s] Training 1/1 epoch (loss 1.7101): 10%|β–‰ | 120/1250 [00:46<06:52, 2.74it/s] Training 1/1 epoch (loss 1.7101): 10%|β–‰ | 121/1250 [00:46<06:34, 2.86it/s] Training 1/1 epoch 
(loss 1.7494): 10%|β–‰ | 121/1250 [00:46<06:34, 2.86it/s] Training 1/1 epoch (loss 1.7494): 10%|β–‰ | 122/1250 [00:46<06:31, 2.88it/s] Training 1/1 epoch (loss 1.6206): 10%|β–‰ | 122/1250 [00:47<06:31, 2.88it/s] Training 1/1 epoch (loss 1.6206): 10%|β–‰ | 123/1250 [00:47<06:27, 2.91it/s] Training 1/1 epoch (loss 1.6345): 10%|β–‰ | 123/1250 [00:47<06:27, 2.91it/s] Training 1/1 epoch (loss 1.6345): 10%|β–‰ | 124/1250 [00:47<06:12, 3.02it/s] Training 1/1 epoch (loss 1.7739): 10%|β–‰ | 124/1250 [00:47<06:12, 3.02it/s] Training 1/1 epoch (loss 1.7739): 10%|β–ˆ | 125/1250 [00:47<06:07, 3.06it/s] Training 1/1 epoch (loss 1.6055): 10%|β–ˆ | 125/1250 [00:48<06:07, 3.06it/s] Training 1/1 epoch (loss 1.6055): 10%|β–ˆ | 126/1250 [00:48<06:26, 2.91it/s] Training 1/1 epoch (loss 1.7526): 10%|β–ˆ | 126/1250 [00:48<06:26, 2.91it/s] Training 1/1 epoch (loss 1.7526): 10%|β–ˆ | 127/1250 [00:48<06:24, 2.92it/s] Training 1/1 epoch (loss 1.6757): 10%|β–ˆ | 127/1250 [00:48<06:24, 2.92it/s] Training 1/1 epoch (loss 1.6757): 10%|β–ˆ | 128/1250 [00:48<06:20, 2.95it/s] Training 1/1 epoch (loss 1.7244): 10%|β–ˆ | 128/1250 [00:49<06:20, 2.95it/s] Training 1/1 epoch (loss 1.7244): 10%|β–ˆ | 129/1250 [00:49<06:17, 2.97it/s] Training 1/1 epoch (loss 1.6343): 10%|β–ˆ | 129/1250 [00:49<06:17, 2.97it/s] Training 1/1 epoch (loss 1.6343): 10%|β–ˆ | 130/1250 [00:49<06:13, 3.00it/s] Training 1/1 epoch (loss 1.6563): 10%|β–ˆ | 130/1250 [00:49<06:13, 3.00it/s] Training 1/1 epoch (loss 1.6563): 10%|β–ˆ | 131/1250 [00:49<06:12, 3.01it/s] Training 1/1 epoch (loss 1.6773): 10%|β–ˆ | 131/1250 [00:50<06:12, 3.01it/s] Training 1/1 epoch (loss 1.6773): 11%|β–ˆ | 132/1250 [00:50<06:26, 2.90it/s] Training 1/1 epoch (loss 1.7196): 11%|β–ˆ | 132/1250 [00:50<06:26, 2.90it/s] Training 1/1 epoch (loss 1.7196): 11%|β–ˆ | 133/1250 [00:50<06:21, 2.93it/s] Training 1/1 epoch (loss 1.6251): 11%|β–ˆ | 133/1250 [00:50<06:21, 2.93it/s] Training 1/1 epoch (loss 1.6251): 11%|β–ˆ | 134/1250 [00:50<06:06, 3.05it/s] Training 1/1 
epoch (loss 1.7006): 11%|β–ˆ | 134/1250 [00:51<06:06, 3.05it/s] Training 1/1 epoch (loss 1.7006): 11%|β–ˆ | 135/1250 [00:51<06:04, 3.06it/s] Training 1/1 epoch (loss 1.7195): 11%|β–ˆ | 135/1250 [00:51<06:04, 3.06it/s] Training 1/1 epoch (loss 1.7195): 11%|β–ˆ | 136/1250 [00:51<06:02, 3.07it/s] Training 1/1 epoch (loss 1.6672): 11%|β–ˆ | 136/1250 [00:51<06:02, 3.07it/s] Training 1/1 epoch (loss 1.6672): 11%|β–ˆ | 137/1250 [00:51<06:04, 3.05it/s] Training 1/1 epoch (loss 1.5522): 11%|β–ˆ | 137/1250 [00:52<06:04, 3.05it/s] Training 1/1 epoch (loss 1.5522): 11%|β–ˆ | 138/1250 [00:52<06:03, 3.06it/s] Training 1/1 epoch (loss 1.5814): 11%|β–ˆ | 138/1250 [00:52<06:03, 3.06it/s] Training 1/1 epoch (loss 1.5814): 11%|β–ˆ | 139/1250 [00:52<06:04, 3.05it/s] Training 1/1 epoch (loss 1.7004): 11%|β–ˆ | 139/1250 [00:52<06:04, 3.05it/s] Training 1/1 epoch (loss 1.7004): 11%|β–ˆ | 140/1250 [00:52<05:58, 3.10it/s] Training 1/1 epoch (loss 1.7011): 11%|β–ˆ | 140/1250 [00:53<05:58, 3.10it/s] Training 1/1 epoch (loss 1.7011): 11%|β–ˆβ– | 141/1250 [00:53<05:52, 3.15it/s] Training 1/1 epoch (loss 1.6075): 11%|β–ˆβ– | 141/1250 [00:53<05:52, 3.15it/s] Training 1/1 epoch (loss 1.6075): 11%|β–ˆβ– | 142/1250 [00:53<05:57, 3.10it/s] Training 1/1 epoch (loss 1.7217): 11%|β–ˆβ– | 142/1250 [00:53<05:57, 3.10it/s] Training 1/1 epoch (loss 1.7217): 11%|β–ˆβ– | 143/1250 [00:53<05:53, 3.13it/s] Training 1/1 epoch (loss 1.6726): 11%|β–ˆβ– | 143/1250 [00:54<05:53, 3.13it/s] Training 1/1 epoch (loss 1.6726): 12%|β–ˆβ– | 144/1250 [00:54<06:05, 3.03it/s] Training 1/1 epoch (loss 1.5462): 12%|β–ˆβ– | 144/1250 [00:54<06:05, 3.03it/s] Training 1/1 epoch (loss 1.5462): 12%|β–ˆβ– | 145/1250 [00:54<06:06, 3.01it/s] Training 1/1 epoch (loss 1.6133): 12%|β–ˆβ– | 145/1250 [00:54<06:06, 3.01it/s] Training 1/1 epoch (loss 1.6133): 12%|β–ˆβ– | 146/1250 [00:54<06:00, 3.06it/s] Training 1/1 epoch (loss 1.7072): 12%|β–ˆβ– | 146/1250 [00:55<06:00, 3.06it/s] Training 1/1 epoch (loss 1.7072): 12%|β–ˆβ– | 
147/1250 [00:55<05:55, 3.10it/s] Training 1/1 epoch (loss 1.5206): 12%|β–ˆβ– | 147/1250 [00:55<05:55, 3.10it/s] Training 1/1 epoch (loss 1.5206): 12%|β–ˆβ– | 148/1250 [00:55<05:52, 3.13it/s] Training 1/1 epoch (loss 1.5996): 12%|β–ˆβ– | 148/1250 [00:55<05:52, 3.13it/s] Training 1/1 epoch (loss 1.5996): 12%|β–ˆβ– | 149/1250 [00:55<06:05, 3.01it/s] Training 1/1 epoch (loss 1.7479): 12%|β–ˆβ– | 149/1250 [00:56<06:05, 3.01it/s] Training 1/1 epoch (loss 1.7479): 12%|β–ˆβ– | 150/1250 [00:56<06:01, 3.04it/s] Training 1/1 epoch (loss 1.6373): 12%|β–ˆβ– | 150/1250 [00:56<06:01, 3.04it/s] Training 1/1 epoch (loss 1.6373): 12%|β–ˆβ– | 151/1250 [00:56<06:04, 3.02it/s] Training 1/1 epoch (loss 1.6293): 12%|β–ˆβ– | 151/1250 [00:56<06:04, 3.02it/s] Training 1/1 epoch (loss 1.6293): 12%|β–ˆβ– | 152/1250 [00:56<06:04, 3.01it/s] Training 1/1 epoch (loss 1.5753): 12%|β–ˆβ– | 152/1250 [00:57<06:04, 3.01it/s] Training 1/1 epoch (loss 1.5753): 12%|β–ˆβ– | 153/1250 [00:57<06:06, 2.99it/s] Training 1/1 epoch (loss 1.6842): 12%|β–ˆβ– | 153/1250 [00:57<06:06, 2.99it/s] Training 1/1 epoch (loss 1.6842): 12%|β–ˆβ– | 154/1250 [00:57<06:02, 3.02it/s] Training 1/1 epoch (loss 1.6403): 12%|β–ˆβ– | 154/1250 [00:57<06:02, 3.02it/s] Training 1/1 epoch (loss 1.6403): 12%|β–ˆβ– | 155/1250 [00:57<06:16, 2.91it/s] Training 1/1 epoch (loss 1.7937): 12%|β–ˆβ– | 155/1250 [00:58<06:16, 2.91it/s] Training 1/1 epoch (loss 1.7937): 12%|β–ˆβ– | 156/1250 [00:58<06:22, 2.86it/s] Training 1/1 epoch (loss 1.6188): 12%|β–ˆβ– | 156/1250 [00:58<06:22, 2.86it/s] Training 1/1 epoch (loss 1.6188): 13%|β–ˆβ–Ž | 157/1250 [00:58<06:17, 2.89it/s] Training 1/1 epoch (loss 1.6577): 13%|β–ˆβ–Ž | 157/1250 [00:58<06:17, 2.89it/s] Training 1/1 epoch (loss 1.6577): 13%|β–ˆβ–Ž | 158/1250 [00:58<06:07, 2.97it/s] Training 1/1 epoch (loss 1.7033): 13%|β–ˆβ–Ž | 158/1250 [00:59<06:07, 2.97it/s] Training 1/1 epoch (loss 1.7033): 13%|β–ˆβ–Ž | 159/1250 [00:59<06:00, 3.02it/s] Training 1/1 epoch (loss 1.6625): 13%|β–ˆβ–Ž | 
159/1250 [00:59<06:00, 3.02it/s] Training 1/1 epoch (loss 1.6625): 13%|β–ˆβ–Ž | 160/1250 [00:59<05:59, 3.03it/s] Training 1/1 epoch (loss 1.7135): 13%|β–ˆβ–Ž | 160/1250 [00:59<05:59, 3.03it/s] Training 1/1 epoch (loss 1.7135): 13%|β–ˆβ–Ž | 161/1250 [00:59<05:53, 3.08it/s] Training 1/1 epoch (loss 1.6255): 13%|β–ˆβ–Ž | 161/1250 [01:00<05:53, 3.08it/s] Training 1/1 epoch (loss 1.6255): 13%|β–ˆβ–Ž | 162/1250 [01:00<06:02, 3.00it/s] Training 1/1 epoch (loss 1.6038): 13%|β–ˆβ–Ž | 162/1250 [01:00<06:02, 3.00it/s] Training 1/1 epoch (loss 1.6038): 13%|β–ˆβ–Ž | 163/1250 [01:00<06:10, 2.93it/s] Training 1/1 epoch (loss 1.6483): 13%|β–ˆβ–Ž | 163/1250 [01:00<06:10, 2.93it/s] Training 1/1 epoch (loss 1.6483): 13%|β–ˆβ–Ž | 164/1250 [01:00<06:25, 2.82it/s] Training 1/1 epoch (loss 1.7052): 13%|β–ˆβ–Ž | 164/1250 [01:01<06:25, 2.82it/s] Training 1/1 epoch (loss 1.7052): 13%|β–ˆβ–Ž | 165/1250 [01:01<06:12, 2.91it/s] Training 1/1 epoch (loss 1.6509): 13%|β–ˆβ–Ž | 165/1250 [01:01<06:12, 2.91it/s] Training 1/1 epoch (loss 1.6509): 13%|β–ˆβ–Ž | 166/1250 [01:01<06:00, 3.00it/s] Training 1/1 epoch (loss 1.6489): 13%|β–ˆβ–Ž | 166/1250 [01:01<06:00, 3.00it/s] Training 1/1 epoch (loss 1.6489): 13%|β–ˆβ–Ž | 167/1250 [01:01<05:55, 3.05it/s] Training 1/1 epoch (loss 1.7147): 13%|β–ˆβ–Ž | 167/1250 [01:02<05:55, 3.05it/s] Training 1/1 epoch (loss 1.7147): 13%|β–ˆβ–Ž | 168/1250 [01:02<06:06, 2.96it/s] Training 1/1 epoch (loss 1.5914): 13%|β–ˆβ–Ž | 168/1250 [01:02<06:06, 2.96it/s] Training 1/1 epoch (loss 1.5914): 14%|β–ˆβ–Ž | 169/1250 [01:02<06:11, 2.91it/s] Training 1/1 epoch (loss 1.7178): 14%|β–ˆβ–Ž | 169/1250 [01:02<06:11, 2.91it/s] Training 1/1 epoch (loss 1.7178): 14%|β–ˆβ–Ž | 170/1250 [01:02<06:04, 2.97it/s] Training 1/1 epoch (loss 1.5484): 14%|β–ˆβ–Ž | 170/1250 [01:03<06:04, 2.97it/s] Training 1/1 epoch (loss 1.5484): 14%|β–ˆβ–Ž | 171/1250 [01:03<05:53, 3.05it/s] Training 1/1 epoch (loss 1.6309): 14%|β–ˆβ–Ž | 171/1250 [01:03<05:53, 3.05it/s] Training 1/1 epoch (loss 1.6309): 14%|β–ˆβ– | 
172/1250 [01:03<05:51, 3.07it/s] Training 1/1 epoch (loss 1.4909): 14%|β–ˆβ– | 172/1250 [01:03<05:51, 3.07it/s] Training 1/1 epoch (loss 1.4909): 14%|β–ˆβ– | 173/1250 [01:03<05:52, 3.05it/s] Training 1/1 epoch (loss 1.5342): 14%|β–ˆβ– | 173/1250 [01:04<05:52, 3.05it/s] Training 1/1 epoch (loss 1.5342): 14%|β–ˆβ– | 174/1250 [01:04<05:51, 3.06it/s] Training 1/1 epoch (loss 1.6587): 14%|β–ˆβ– | 174/1250 [01:04<05:51, 3.06it/s] Training 1/1 epoch (loss 1.6587): 14%|β–ˆβ– | 175/1250 [01:04<05:55, 3.02it/s] Training 1/1 epoch (loss 1.6396): 14%|β–ˆβ– | 175/1250 [01:04<05:55, 3.02it/s] Training 1/1 epoch (loss 1.6396): 14%|β–ˆβ– | 176/1250 [01:04<05:55, 3.02it/s] Training 1/1 epoch (loss 1.6741): 14%|β–ˆβ– | 176/1250 [01:05<05:55, 3.02it/s] Training 1/1 epoch (loss 1.6741): 14%|β–ˆβ– | 177/1250 [01:05<05:51, 3.05it/s] Training 1/1 epoch (loss 1.6257): 14%|β–ˆβ– | 177/1250 [01:05<05:51, 3.05it/s] Training 1/1 epoch (loss 1.6257): 14%|β–ˆβ– | 178/1250 [01:05<05:41, 3.14it/s] Training 1/1 epoch (loss 1.5932): 14%|β–ˆβ– | 178/1250 [01:05<05:41, 3.14it/s] Training 1/1 epoch (loss 1.5932): 14%|β–ˆβ– | 179/1250 [01:05<05:43, 3.11it/s] Training 1/1 epoch (loss 1.6069): 14%|β–ˆβ– | 179/1250 [01:06<05:43, 3.11it/s] Training 1/1 epoch (loss 1.6069): 14%|β–ˆβ– | 180/1250 [01:06<06:02, 2.95it/s] Training 1/1 epoch (loss 1.6143): 14%|β–ˆβ– | 180/1250 [01:06<06:02, 2.95it/s] Training 1/1 epoch (loss 1.6143): 14%|β–ˆβ– | 181/1250 [01:06<06:00, 2.96it/s] Training 1/1 epoch (loss 1.7559): 14%|β–ˆβ– | 181/1250 [01:06<06:00, 2.96it/s] Training 1/1 epoch (loss 1.7559): 15%|β–ˆβ– | 182/1250 [01:06<05:53, 3.02it/s] Training 1/1 epoch (loss 1.6504): 15%|β–ˆβ– | 182/1250 [01:07<05:53, 3.02it/s] Training 1/1 epoch (loss 1.6504): 15%|β–ˆβ– | 183/1250 [01:07<05:45, 3.09it/s] Training 1/1 epoch (loss 1.7136): 15%|β–ˆβ– | 183/1250 [01:07<05:45, 3.09it/s] Training 1/1 epoch (loss 1.7136): 15%|β–ˆβ– | 184/1250 [01:07<05:55, 3.00it/s] Training 1/1 epoch (loss 1.6592): 15%|β–ˆβ– | 
184/1250 [01:07<05:55, 3.00it/s] Training 1/1 epoch (loss 1.6592): 15%|β–ˆβ– | 185/1250 [01:07<05:48, 3.05it/s] Training 1/1 epoch (loss 1.7109): 15%|β–ˆβ– | 185/1250 [01:08<05:48, 3.05it/s] Training 1/1 epoch (loss 1.7109): 15%|β–ˆβ– | 186/1250 [01:08<05:47, 3.06it/s] Training 1/1 epoch (loss 1.6405): 15%|β–ˆβ– | 186/1250 [01:08<05:47, 3.06it/s] Training 1/1 epoch (loss 1.6405): 15%|β–ˆβ– | 187/1250 [01:08<05:53, 3.01it/s] Training 1/1 epoch (loss 1.6203): 15%|β–ˆβ– | 187/1250 [01:08<05:53, 3.01it/s] Training 1/1 epoch (loss 1.6203): 15%|β–ˆβ–Œ | 188/1250 [01:08<05:52, 3.02it/s] Training 1/1 epoch (loss 1.6863): 15%|β–ˆβ–Œ | 188/1250 [01:09<05:52, 3.02it/s] Training 1/1 epoch (loss 1.6863): 15%|β–ˆβ–Œ | 189/1250 [01:09<05:51, 3.02it/s] Training 1/1 epoch (loss 1.6963): 15%|β–ˆβ–Œ | 189/1250 [01:09<05:51, 3.02it/s] Training 1/1 epoch (loss 1.6963): 15%|β–ˆβ–Œ | 190/1250 [01:09<05:43, 3.09it/s] Training 1/1 epoch (loss 1.6401): 15%|β–ˆβ–Œ | 190/1250 [01:09<05:43, 3.09it/s] Training 1/1 epoch (loss 1.6401): 15%|β–ˆβ–Œ | 191/1250 [01:09<05:34, 3.17it/s] Training 1/1 epoch (loss 1.6719): 15%|β–ˆβ–Œ | 191/1250 [01:10<05:34, 3.17it/s] Training 1/1 epoch (loss 1.6719): 15%|β–ˆβ–Œ | 192/1250 [01:10<05:46, 3.05it/s] Training 1/1 epoch (loss 1.5388): 15%|β–ˆβ–Œ | 192/1250 [01:10<05:46, 3.05it/s] Training 1/1 epoch (loss 1.5388): 15%|β–ˆβ–Œ | 193/1250 [01:10<05:51, 3.00it/s] Training 1/1 epoch (loss 1.5542): 15%|β–ˆβ–Œ | 193/1250 [01:10<05:51, 3.00it/s] Training 1/1 epoch (loss 1.5542): 16%|β–ˆβ–Œ | 194/1250 [01:10<05:48, 3.03it/s] Training 1/1 epoch (loss 1.6488): 16%|β–ˆβ–Œ | 194/1250 [01:11<05:48, 3.03it/s] Training 1/1 epoch (loss 1.6488): 16%|β–ˆβ–Œ | 195/1250 [01:11<05:47, 3.03it/s] Training 1/1 epoch (loss 1.6269): 16%|β–ˆβ–Œ | 195/1250 [01:11<05:47, 3.03it/s] Training 1/1 epoch (loss 1.6269): 16%|β–ˆβ–Œ | 196/1250 [01:11<05:45, 3.05it/s] Training 1/1 epoch (loss 1.6381): 16%|β–ˆβ–Œ | 196/1250 [01:11<05:45, 3.05it/s] Training 1/1 epoch (loss 1.6381): 16%|β–ˆβ–Œ | 
197/1250 [01:11<05:43, 3.06it/s] Training 1/1 epoch (loss 1.6572): 16%|β–ˆβ–Œ | 197/1250 [01:12<05:43, 3.06it/s] Training 1/1 epoch (loss 1.6572): 16%|β–ˆβ–Œ | 198/1250 [01:12<05:50, 3.00it/s] Training 1/1 epoch (loss 1.7127): 16%|β–ˆβ–Œ | 198/1250 [01:12<05:50, 3.00it/s] Training 1/1 epoch (loss 1.7127): 16%|β–ˆβ–Œ | 199/1250 [01:12<05:55, 2.95it/s] Training 1/1 epoch (loss 1.5168): 16%|β–ˆβ–Œ | 199/1250 [01:12<05:55, 2.95it/s] Training 1/1 epoch (loss 1.5168): 16%|β–ˆβ–Œ | 200/1250 [01:12<05:50, 2.99it/s] Training 1/1 epoch (loss 1.5615): 16%|β–ˆβ–Œ | 200/1250 [01:13<05:50, 2.99it/s] Training 1/1 epoch (loss 1.5615): 16%|β–ˆβ–Œ | 201/1250 [01:13<05:46, 3.03it/s] Training 1/1 epoch (loss 1.5241): 16%|β–ˆβ–Œ | 201/1250 [01:13<05:46, 3.03it/s] Training 1/1 epoch (loss 1.5241): 16%|β–ˆβ–Œ | 202/1250 [01:13<05:37, 3.10it/s] Training 1/1 epoch (loss 1.6296): 16%|β–ˆβ–Œ | 202/1250 [01:13<05:37, 3.10it/s] Training 1/1 epoch (loss 1.6296): 16%|β–ˆβ–Œ | 203/1250 [01:13<05:40, 3.08it/s] Training 1/1 epoch (loss 1.6001): 16%|β–ˆβ–Œ | 203/1250 [01:13<05:40, 3.08it/s] Training 1/1 epoch (loss 1.6001): 16%|β–ˆβ–‹ | 204/1250 [01:13<05:42, 3.06it/s] Training 1/1 epoch (loss 1.5036): 16%|β–ˆβ–‹ | 204/1250 [01:14<05:42, 3.06it/s] Training 1/1 epoch (loss 1.5036): 16%|β–ˆβ–‹ | 205/1250 [01:14<06:06, 2.85it/s] Training 1/1 epoch (loss 1.7072): 16%|β–ˆβ–‹ | 205/1250 [01:14<06:06, 2.85it/s] Training 1/1 epoch (loss 1.7072): 16%|β–ˆβ–‹ | 206/1250 [01:14<06:24, 2.72it/s] Training 1/1 epoch (loss 1.6386): 16%|β–ˆβ–‹ | 206/1250 [01:15<06:24, 2.72it/s] Training 1/1 epoch (loss 1.6386): 17%|β–ˆβ–‹ | 207/1250 [01:15<06:20, 2.74it/s] Training 1/1 epoch (loss 1.6602): 17%|β–ˆβ–‹ | 207/1250 [01:15<06:20, 2.74it/s] Training 1/1 epoch (loss 1.6602): 17%|β–ˆβ–‹ | 208/1250 [01:15<06:07, 2.83it/s] Training 1/1 epoch (loss 1.6150): 17%|β–ˆβ–‹ | 208/1250 [01:15<06:07, 2.83it/s] Training 1/1 epoch (loss 1.6150): 17%|β–ˆβ–‹ | 209/1250 [01:15<06:03, 2.86it/s] Training 1/1 epoch (loss 1.6153): 17%|β–ˆβ–‹ | 
209/1250 [01:16<06:03, 2.86it/s] Training 1/1 epoch (loss 1.6153): 17%|β–ˆβ–‹ | 210/1250 [01:16<06:05, 2.84it/s] Training 1/1 epoch (loss 1.6349): 17%|β–ˆβ–‹ | 210/1250 [01:16<06:05, 2.84it/s] Training 1/1 epoch (loss 1.6349): 17%|β–ˆβ–‹ | 211/1250 [01:16<06:13, 2.78it/s] Training 1/1 epoch (loss 1.6174): 17%|β–ˆβ–‹ | 211/1250 [01:16<06:13, 2.78it/s] Training 1/1 epoch (loss 1.6174): 17%|β–ˆβ–‹ | 212/1250 [01:16<05:55, 2.92it/s] Training 1/1 epoch (loss 1.6462): 17%|β–ˆβ–‹ | 212/1250 [01:17<05:55, 2.92it/s] Training 1/1 epoch (loss 1.6462): 17%|β–ˆβ–‹ | 213/1250 [01:17<05:45, 3.00it/s] Training 1/1 epoch (loss 1.5557): 17%|β–ˆβ–‹ | 213/1250 [01:17<05:45, 3.00it/s] Training 1/1 epoch (loss 1.5557): 17%|β–ˆβ–‹ | 214/1250 [01:17<05:47, 2.98it/s] Training 1/1 epoch (loss 1.6338): 17%|β–ˆβ–‹ | 214/1250 [01:17<05:47, 2.98it/s] Training 1/1 epoch (loss 1.6338): 17%|β–ˆβ–‹ | 215/1250 [01:17<05:45, 3.00it/s] Training 1/1 epoch (loss 1.6456): 17%|β–ˆβ–‹ | 215/1250 [01:18<05:45, 3.00it/s] Training 1/1 epoch (loss 1.6456): 17%|β–ˆβ–‹ | 216/1250 [01:18<06:05, 2.83it/s] Training 1/1 epoch (loss 1.7086): 17%|β–ˆβ–‹ | 216/1250 [01:18<06:05, 2.83it/s] Training 1/1 epoch (loss 1.7086): 17%|β–ˆβ–‹ | 217/1250 [01:18<05:55, 2.91it/s] Training 1/1 epoch (loss 1.6360): 17%|β–ˆβ–‹ | 217/1250 [01:18<05:55, 2.91it/s] Training 1/1 epoch (loss 1.6360): 17%|β–ˆβ–‹ | 218/1250 [01:18<05:43, 3.01it/s] Training 1/1 epoch (loss 1.6479): 17%|β–ˆβ–‹ | 218/1250 [01:19<05:43, 3.01it/s] Training 1/1 epoch (loss 1.6479): 18%|β–ˆβ–Š | 219/1250 [01:19<05:35, 3.07it/s] Training 1/1 epoch (loss 1.6044): 18%|β–ˆβ–Š | 219/1250 [01:19<05:35, 3.07it/s] Training 1/1 epoch (loss 1.6044): 18%|β–ˆβ–Š | 220/1250 [01:19<05:27, 3.15it/s] Training 1/1 epoch (loss 1.4782): 18%|β–ˆβ–Š | 220/1250 [01:19<05:27, 3.15it/s] Training 1/1 epoch (loss 1.4782): 18%|β–ˆβ–Š | 221/1250 [01:19<05:47, 2.96it/s] Training 1/1 epoch (loss 1.6171): 18%|β–ˆβ–Š | 221/1250 [01:20<05:47, 2.96it/s] Training 1/1 epoch (loss 1.6171): 18%|β–ˆβ–Š | 
222/1250 [01:20<05:56, 2.88it/s] Training 1/1 epoch (loss 1.6052): 18%|β–ˆβ–Š | 222/1250 [01:20<05:56, 2.88it/s] Training 1/1 epoch (loss 1.6052): 18%|β–ˆβ–Š | 223/1250 [01:20<05:50, 2.93it/s] Training 1/1 epoch (loss 1.7536): 18%|β–ˆβ–Š | 223/1250 [01:20<05:50, 2.93it/s] Training 1/1 epoch (loss 1.7536): 18%|β–ˆβ–Š | 224/1250 [01:20<05:42, 3.00it/s] Training 1/1 epoch (loss 1.7144): 18%|β–ˆβ–Š | 224/1250 [01:21<05:42, 3.00it/s] Training 1/1 epoch (loss 1.7144): 18%|β–ˆβ–Š | 225/1250 [01:21<05:41, 3.01it/s] Training 1/1 epoch (loss 1.5455): 18%|β–ˆβ–Š | 225/1250 [01:21<05:41, 3.01it/s] Training 1/1 epoch (loss 1.5455): 18%|β–ˆβ–Š | 226/1250 [01:21<05:34, 3.06it/s] Training 1/1 epoch (loss 1.6433): 18%|β–ˆβ–Š | 226/1250 [01:21<05:34, 3.06it/s] Training 1/1 epoch (loss 1.6433): 18%|β–ˆβ–Š | 227/1250 [01:21<05:38, 3.02it/s] Training 1/1 epoch (loss 1.7469): 18%|β–ˆβ–Š | 227/1250 [01:22<05:38, 3.02it/s] Training 1/1 epoch (loss 1.7469): 18%|β–ˆβ–Š | 228/1250 [01:22<06:20, 2.68it/s] Training 1/1 epoch (loss 1.5943): 18%|β–ˆβ–Š | 228/1250 [01:22<06:20, 2.68it/s] Training 1/1 epoch (loss 1.5943): 18%|β–ˆβ–Š | 229/1250 [01:22<06:00, 2.83it/s] Training 1/1 epoch (loss 1.7100): 18%|β–ˆβ–Š | 229/1250 [01:22<06:00, 2.83it/s] Training 1/1 epoch (loss 1.7100): 18%|β–ˆβ–Š | 230/1250 [01:22<05:44, 2.96it/s] Training 1/1 epoch (loss 1.5117): 18%|β–ˆβ–Š | 230/1250 [01:23<05:44, 2.96it/s] Training 1/1 epoch (loss 1.5117): 18%|β–ˆβ–Š | 231/1250 [01:23<05:33, 3.06it/s] Training 1/1 epoch (loss 1.6362): 18%|β–ˆβ–Š | 231/1250 [01:23<05:33, 3.06it/s] Training 1/1 epoch (loss 1.6362): 19%|β–ˆβ–Š | 232/1250 [01:23<05:46, 2.94it/s] Training 1/1 epoch (loss 1.6859): 19%|β–ˆβ–Š | 232/1250 [01:23<05:46, 2.94it/s] Training 1/1 epoch (loss 1.6859): 19%|β–ˆβ–Š | 233/1250 [01:23<05:39, 2.99it/s] Training 1/1 epoch (loss 1.5388): 19%|β–ˆβ–Š | 233/1250 [01:24<05:39, 2.99it/s] Training 1/1 epoch (loss 1.5388): 19%|β–ˆβ–Š | 234/1250 [01:24<05:51, 2.89it/s] Training 1/1 epoch (loss 1.4877): 19%|β–ˆβ–Š | 
234/1250 [01:24<05:51, 2.89it/s] Training 1/1 epoch (loss 1.4877): 19%|β–ˆβ–‰ | 235/1250 [01:24<05:45, 2.93it/s] Training 1/1 epoch (loss 1.6822): 19%|β–ˆβ–‰ | 235/1250 [01:24<05:45, 2.93it/s] Training 1/1 epoch (loss 1.6822): 19%|β–ˆβ–‰ | 236/1250 [01:24<05:33, 3.04it/s] Training 1/1 epoch (loss 1.5888): 19%|β–ˆβ–‰ | 236/1250 [01:25<05:33, 3.04it/s] Training 1/1 epoch (loss 1.5888): 19%|β–ˆβ–‰ | 237/1250 [01:25<05:25, 3.12it/s] Training 1/1 epoch (loss 1.7018): 19%|β–ˆβ–‰ | 237/1250 [01:25<05:25, 3.12it/s] Training 1/1 epoch (loss 1.7018): 19%|β–ˆβ–‰ | 238/1250 [01:25<05:22, 3.14it/s] Training 1/1 epoch (loss 1.5647): 19%|β–ˆβ–‰ | 238/1250 [01:25<05:22, 3.14it/s] Training 1/1 epoch (loss 1.5647): 19%|β–ˆβ–‰ | 239/1250 [01:25<05:26, 3.09it/s] Training 1/1 epoch (loss 1.6897): 19%|β–ˆβ–‰ | 239/1250 [01:26<05:26, 3.09it/s] Training 1/1 epoch (loss 1.6897): 19%|β–ˆβ–‰ | 240/1250 [01:26<05:48, 2.90it/s] Training 1/1 epoch (loss 1.6972): 19%|β–ˆβ–‰ | 240/1250 [01:26<05:48, 2.90it/s] Training 1/1 epoch (loss 1.6972): 19%|β–ˆβ–‰ | 241/1250 [01:26<05:47, 2.91it/s] Training 1/1 epoch (loss 1.5131): 19%|β–ˆβ–‰ | 241/1250 [01:26<05:47, 2.91it/s] Training 1/1 epoch (loss 1.5131): 19%|β–ˆβ–‰ | 242/1250 [01:26<05:45, 2.92it/s] Training 1/1 epoch (loss 1.6371): 19%|β–ˆβ–‰ | 242/1250 [01:27<05:45, 2.92it/s] Training 1/1 epoch (loss 1.6371): 19%|β–ˆβ–‰ | 243/1250 [01:27<05:34, 3.01it/s] Training 1/1 epoch (loss 1.5939): 19%|β–ˆβ–‰ | 243/1250 [01:27<05:34, 3.01it/s] Training 1/1 epoch (loss 1.5939): 20%|β–ˆβ–‰ | 244/1250 [01:27<05:29, 3.06it/s] Training 1/1 epoch (loss 1.5540): 20%|β–ˆβ–‰ | 244/1250 [01:27<05:29, 3.06it/s] Training 1/1 epoch (loss 1.5540): 20%|β–ˆβ–‰ | 245/1250 [01:27<05:49, 2.88it/s] Training 1/1 epoch (loss 1.6070): 20%|β–ˆβ–‰ | 245/1250 [01:28<05:49, 2.88it/s] Training 1/1 epoch (loss 1.6070): 20%|β–ˆβ–‰ | 246/1250 [01:28<06:14, 2.68it/s] Training 1/1 epoch (loss 1.6770): 20%|β–ˆβ–‰ | 246/1250 [01:28<06:14, 2.68it/s] Training 1/1 epoch (loss 1.6770): 20%|β–ˆβ–‰ | 
247/1250 [01:28<05:51, 2.86it/s] Training 1/1 epoch (loss 1.6093): 20%|β–ˆβ–‰ | 247/1250 [01:29<05:51, 2.86it/s] Training 1/1 epoch (loss 1.6093): 20%|β–ˆβ–‰ | 248/1250 [01:29<05:47, 2.88it/s] Training 1/1 epoch (loss 1.6457): 20%|β–ˆβ–‰ | 248/1250 [01:29<05:47, 2.88it/s] Training 1/1 epoch (loss 1.6457): 20%|β–ˆβ–‰ | 249/1250 [01:29<05:33, 3.00it/s] Training 1/1 epoch (loss 1.6275): 20%|β–ˆβ–‰ | 249/1250 [01:29<05:33, 3.00it/s] Training 1/1 epoch (loss 1.6275): 20%|β–ˆβ–ˆ | 250/1250 [01:29<05:26, 3.06it/s] Training 1/1 epoch (loss 1.6736): 20%|β–ˆβ–ˆ | 250/1250 [01:30<05:26, 3.06it/s] Training 1/1 epoch (loss 1.6736): 20%|β–ˆβ–ˆ | 251/1250 [01:30<05:35, 2.98it/s] Training 1/1 epoch (loss 1.7277): 20%|β–ˆβ–ˆ | 251/1250 [01:30<05:35, 2.98it/s] Training 1/1 epoch (loss 1.7277): 20%|β–ˆβ–ˆ | 252/1250 [01:30<05:40, 2.93it/s] Training 1/1 epoch (loss 1.6705): 20%|β–ˆβ–ˆ | 252/1250 [01:30<05:40, 2.93it/s] Training 1/1 epoch (loss 1.6705): 20%|β–ˆβ–ˆ | 253/1250 [01:30<05:31, 3.00it/s] Training 1/1 epoch (loss 1.5868): 20%|β–ˆβ–ˆ | 253/1250 [01:31<05:31, 3.00it/s] Training 1/1 epoch (loss 1.5868): 20%|β–ˆβ–ˆ | 254/1250 [01:31<05:41, 2.92it/s] Training 1/1 epoch (loss 1.6084): 20%|β–ˆβ–ˆ | 254/1250 [01:31<05:41, 2.92it/s] Training 1/1 epoch (loss 1.6084): 20%|β–ˆβ–ˆ | 255/1250 [01:31<05:40, 2.93it/s] Training 1/1 epoch (loss 1.5542): 20%|β–ˆβ–ˆ | 255/1250 [01:31<05:40, 2.93it/s] Training 1/1 epoch (loss 1.5542): 20%|β–ˆβ–ˆ | 256/1250 [01:31<05:28, 3.02it/s] Training 1/1 epoch (loss 1.6128): 20%|β–ˆβ–ˆ | 256/1250 [01:32<05:28, 3.02it/s] Training 1/1 epoch (loss 1.6128): 21%|β–ˆβ–ˆ | 257/1250 [01:32<05:33, 2.98it/s] Training 1/1 epoch (loss 1.5427): 21%|β–ˆβ–ˆ | 257/1250 [01:32<05:33, 2.98it/s] Training 1/1 epoch (loss 1.5427): 21%|β–ˆβ–ˆ | 258/1250 [01:32<05:45, 2.87it/s] Training 1/1 epoch (loss 1.5273): 21%|β–ˆβ–ˆ | 258/1250 [01:32<05:45, 2.87it/s] Training 1/1 epoch (loss 1.5273): 21%|β–ˆβ–ˆ | 259/1250 [01:32<05:37, 2.94it/s] Training 1/1 epoch (loss 1.7221): 21%|β–ˆβ–ˆ | 
259/1250 [01:33<05:37, 2.94it/s] Training 1/1 epoch (loss 1.7221): 21%|β–ˆβ–ˆ | 260/1250 [01:33<05:26, 3.04it/s] Training 1/1 epoch (loss 1.5717): 21%|β–ˆβ–ˆ | 260/1250 [01:33<05:26, 3.04it/s] Training 1/1 epoch (loss 1.5717): 21%|β–ˆβ–ˆ | 261/1250 [01:33<05:16, 3.12it/s] Training 1/1 epoch (loss 1.5828): 21%|β–ˆβ–ˆ | 261/1250 [01:33<05:16, 3.12it/s] Training 1/1 epoch (loss 1.5828): 21%|β–ˆβ–ˆ | 262/1250 [01:33<05:14, 3.14it/s] Training 1/1 epoch (loss 1.7143): 21%|β–ˆβ–ˆ | 262/1250 [01:33<05:14, 3.14it/s] Training 1/1 epoch (loss 1.7143): 21%|β–ˆβ–ˆ | 263/1250 [01:33<05:19, 3.09it/s] Training 1/1 epoch (loss 1.6139): 21%|β–ˆβ–ˆ | 263/1250 [01:34<05:19, 3.09it/s] Training 1/1 epoch (loss 1.6139): 21%|β–ˆβ–ˆ | 264/1250 [01:34<05:29, 2.99it/s] Training 1/1 epoch (loss 1.5215): 21%|β–ˆβ–ˆ | 264/1250 [01:34<05:29, 2.99it/s] Training 1/1 epoch (loss 1.5215): 21%|β–ˆβ–ˆ | 265/1250 [01:34<05:25, 3.03it/s] Training 1/1 epoch (loss 1.5978): 21%|β–ˆβ–ˆ | 265/1250 [01:35<05:25, 3.03it/s] Training 1/1 epoch (loss 1.5978): 21%|β–ˆβ–ˆβ– | 266/1250 [01:35<05:47, 2.83it/s] Training 1/1 epoch (loss 1.5593): 21%|β–ˆβ–ˆβ– | 266/1250 [01:35<05:47, 2.83it/s] Training 1/1 epoch (loss 1.5593): 21%|β–ˆβ–ˆβ– | 267/1250 [01:35<06:06, 2.68it/s] Training 1/1 epoch (loss 1.6031): 21%|β–ˆβ–ˆβ– | 267/1250 [01:35<06:06, 2.68it/s] Training 1/1 epoch (loss 1.6031): 21%|β–ˆβ–ˆβ– | 268/1250 [01:35<06:08, 2.67it/s] Training 1/1 epoch (loss 1.5090): 21%|β–ˆβ–ˆβ– | 268/1250 [01:36<06:08, 2.67it/s] Training 1/1 epoch (loss 1.5090): 22%|β–ˆβ–ˆβ– | 269/1250 [01:36<05:52, 2.78it/s] Training 1/1 epoch (loss 1.5928): 22%|β–ˆβ–ˆβ– | 269/1250 [01:36<05:52, 2.78it/s] Training 1/1 epoch (loss 1.5928): 22%|β–ˆβ–ˆβ– | 270/1250 [01:36<05:44, 2.84it/s] Training 1/1 epoch (loss 1.6158): 22%|β–ˆβ–ˆβ– | 270/1250 [01:36<05:44, 2.84it/s] Training 1/1 epoch (loss 1.6158): 22%|β–ˆβ–ˆβ– | 271/1250 [01:36<05:32, 2.94it/s] Training 1/1 epoch (loss 1.6039): 22%|β–ˆβ–ˆβ– | 271/1250 [01:37<05:32, 2.94it/s] Training 
1/1 epoch (loss 1.6039): 22%|β–ˆβ–ˆβ– | 272/1250 [01:37<05:25, 3.01it/s] Training 1/1 epoch (loss 1.5105): 22%|β–ˆβ–ˆβ– | 272/1250 [01:37<05:25, 3.01it/s] Training 1/1 epoch (loss 1.5105): 22%|β–ˆβ–ˆβ– | 273/1250 [01:37<05:36, 2.90it/s] Training 1/1 epoch (loss 1.5658): 22%|β–ˆβ–ˆβ– | 273/1250 [01:37<05:36, 2.90it/s] Training 1/1 epoch (loss 1.5658): 22%|β–ˆβ–ˆβ– | 274/1250 [01:37<05:35, 2.91it/s] Training 1/1 epoch (loss 1.5938): 22%|β–ˆβ–ˆβ– | 274/1250 [01:38<05:35, 2.91it/s] Training 1/1 epoch (loss 1.5938): 22%|β–ˆβ–ˆβ– | 275/1250 [01:38<05:36, 2.89it/s] Training 1/1 epoch (loss 1.5544): 22%|β–ˆβ–ˆβ– | 275/1250 [01:38<05:36, 2.89it/s] Training 1/1 epoch (loss 1.5544): 22%|β–ˆβ–ˆβ– | 276/1250 [01:38<05:33, 2.92it/s] Training 1/1 epoch (loss 1.5722): 22%|β–ˆβ–ˆβ– | 276/1250 [01:38<05:33, 2.92it/s] Training 1/1 epoch (loss 1.5722): 22%|β–ˆβ–ˆβ– | 277/1250 [01:38<05:27, 2.97it/s] Training 1/1 epoch (loss 1.5927): 22%|β–ˆβ–ˆβ– | 277/1250 [01:39<05:27, 2.97it/s] Training 1/1 epoch (loss 1.5927): 22%|β–ˆβ–ˆβ– | 278/1250 [01:39<05:16, 3.07it/s] Training 1/1 epoch (loss 1.6895): 22%|β–ˆβ–ˆβ– | 278/1250 [01:39<05:16, 3.07it/s] Training 1/1 epoch (loss 1.6895): 22%|β–ˆβ–ˆβ– | 279/1250 [01:39<05:29, 2.95it/s] Training 1/1 epoch (loss 1.7311): 22%|β–ˆβ–ˆβ– | 279/1250 [01:39<05:29, 2.95it/s] Training 1/1 epoch (loss 1.7311): 22%|β–ˆβ–ˆβ– | 280/1250 [01:39<05:30, 2.94it/s] Training 1/1 epoch (loss 1.6755): 22%|β–ˆβ–ˆβ– | 280/1250 [01:40<05:30, 2.94it/s] Training 1/1 epoch (loss 1.6755): 22%|β–ˆβ–ˆβ– | 281/1250 [01:40<05:28, 2.95it/s] Training 1/1 epoch (loss 1.5794): 22%|β–ˆβ–ˆβ– | 281/1250 [01:40<05:28, 2.95it/s] Training 1/1 epoch (loss 1.5794): 23%|β–ˆβ–ˆβ–Ž | 282/1250 [01:40<05:26, 2.97it/s] Training 1/1 epoch (loss 1.5393): 23%|β–ˆβ–ˆβ–Ž | 282/1250 [01:40<05:26, 2.97it/s] Training 1/1 epoch (loss 1.5393): 23%|β–ˆβ–ˆβ–Ž | 283/1250 [01:40<05:19, 3.02it/s] Training 1/1 epoch (loss 1.5164): 23%|β–ˆβ–ˆβ–Ž | 283/1250 [01:41<05:19, 3.02it/s] Training 1/1 
epoch (loss 1.5164): 23%|β–ˆβ–ˆβ–Ž | 284/1250 [01:41<05:08, 3.13it/s] Training 1/1 epoch (loss 1.6751): 23%|β–ˆβ–ˆβ–Ž | 284/1250 [01:41<05:08, 3.13it/s] Training 1/1 epoch (loss 1.6751): 23%|β–ˆβ–ˆβ–Ž | 285/1250 [01:41<05:05, 3.16it/s] Training 1/1 epoch (loss 1.5478): 23%|β–ˆβ–ˆβ–Ž | 285/1250 [01:41<05:05, 3.16it/s] Training 1/1 epoch (loss 1.5478): 23%|β–ˆβ–ˆβ–Ž | 286/1250 [01:41<05:03, 3.18it/s] Training 1/1 epoch (loss 1.6563): 23%|β–ˆβ–ˆβ–Ž | 286/1250 [01:42<05:03, 3.18it/s] Training 1/1 epoch (loss 1.6563): 23%|β–ˆβ–ˆβ–Ž | 287/1250 [01:42<05:01, 3.19it/s] Training 1/1 epoch (loss 1.5295): 23%|β–ˆβ–ˆβ–Ž | 287/1250 [01:42<05:01, 3.19it/s] Training 1/1 epoch (loss 1.5295): 23%|β–ˆβ–ˆβ–Ž | 288/1250 [01:42<05:12, 3.08it/s] Training 1/1 epoch (loss 1.6710): 23%|β–ˆβ–ˆβ–Ž | 288/1250 [01:42<05:12, 3.08it/s] Training 1/1 epoch (loss 1.6710): 23%|β–ˆβ–ˆβ–Ž | 289/1250 [01:42<05:16, 3.04it/s] Training 1/1 epoch (loss 1.5751): 23%|β–ˆβ–ˆβ–Ž | 289/1250 [01:43<05:16, 3.04it/s] Training 1/1 epoch (loss 1.5751): 23%|β–ˆβ–ˆβ–Ž | 290/1250 [01:43<05:08, 3.11it/s] Training 1/1 epoch (loss 1.5927): 23%|β–ˆβ–ˆβ–Ž | 290/1250 [01:43<05:08, 3.11it/s] Training 1/1 epoch (loss 1.5927): 23%|β–ˆβ–ˆβ–Ž | 291/1250 [01:43<04:59, 3.20it/s] Training 1/1 epoch (loss 1.6131): 23%|β–ˆβ–ˆβ–Ž | 291/1250 [01:43<04:59, 3.20it/s] Training 1/1 epoch (loss 1.6131): 23%|β–ˆβ–ˆβ–Ž | 292/1250 [01:43<05:03, 3.16it/s] Training 1/1 epoch (loss 1.6467): 23%|β–ˆβ–ˆβ–Ž | 292/1250 [01:44<05:03, 3.16it/s] Training 1/1 epoch (loss 1.6467): 23%|β–ˆβ–ˆβ–Ž | 293/1250 [01:44<04:59, 3.20it/s] Training 1/1 epoch (loss 1.6036): 23%|β–ˆβ–ˆβ–Ž | 293/1250 [01:44<04:59, 3.20it/s] Training 1/1 epoch (loss 1.6036): 24%|β–ˆβ–ˆβ–Ž | 294/1250 [01:44<05:08, 3.10it/s] Training 1/1 epoch (loss 1.6298): 24%|β–ˆβ–ˆβ–Ž | 294/1250 [01:44<05:08, 3.10it/s] Training 1/1 epoch (loss 1.6298): 24%|β–ˆβ–ˆβ–Ž | 295/1250 [01:44<05:04, 3.13it/s] Training 1/1 epoch (loss 1.6126): 24%|β–ˆβ–ˆβ–Ž | 295/1250 [01:45<05:04, 3.13it/s] Training 1/1 epoch 
(loss 1.6126): 24%|β–ˆβ–ˆβ–Ž | 296/1250 [01:45<05:12, 3.05it/s] Training 1/1 epoch (loss 1.7168): 24%|β–ˆβ–ˆβ–Ž | 296/1250 [01:45<05:12, 3.05it/s] Training 1/1 epoch (loss 1.7168): 24%|β–ˆβ–ˆβ– | 297/1250 [01:45<05:05, 3.12it/s] Training 1/1 epoch (loss 1.5556): 24%|β–ˆβ–ˆβ– | 297/1250 [01:45<05:05, 3.12it/s] Training 1/1 epoch (loss 1.5556): 24%|β–ˆβ–ˆβ– | 298/1250 [01:45<05:02, 3.14it/s] Training 1/1 epoch (loss 1.5828): 24%|β–ˆβ–ˆβ– | 298/1250 [01:45<05:02, 3.14it/s] Training 1/1 epoch (loss 1.5828): 24%|β–ˆβ–ˆβ– | 299/1250 [01:45<05:01, 3.15it/s] Training 1/1 epoch (loss 1.5870): 24%|β–ˆβ–ˆβ– | 299/1250 [01:46<05:01, 3.15it/s] Training 1/1 epoch (loss 1.5870): 24%|β–ˆβ–ˆβ– | 300/1250 [01:46<05:05, 3.11it/s] Training 1/1 epoch (loss 1.6951): 24%|β–ˆβ–ˆβ– | 300/1250 [01:46<05:05, 3.11it/s] Training 1/1 epoch (loss 1.6951): 24%|β–ˆβ–ˆβ– | 301/1250 [01:46<05:10, 3.06it/s] Training 1/1 epoch (loss 1.6014): 24%|β–ˆβ–ˆβ– | 301/1250 [01:46<05:10, 3.06it/s] Training 1/1 epoch (loss 1.6014): 24%|β–ˆβ–ˆβ– | 302/1250 [01:46<04:59, 3.17it/s] Training 1/1 epoch (loss 1.6361): 24%|β–ˆβ–ˆβ– | 302/1250 [01:47<04:59, 3.17it/s] Training 1/1 epoch (loss 1.6361): 24%|β–ˆβ–ˆβ– | 303/1250 [01:47<04:59, 3.16it/s] Training 1/1 epoch (loss 1.6392): 24%|β–ˆβ–ˆβ– | 303/1250 [01:47<04:59, 3.16it/s] Training 1/1 epoch (loss 1.6392): 24%|β–ˆβ–ˆβ– | 304/1250 [01:47<04:57, 3.18it/s] Training 1/1 epoch (loss 1.6090): 24%|β–ˆβ–ˆβ– | 304/1250 [01:47<04:57, 3.18it/s] Training 1/1 epoch (loss 1.6090): 24%|β–ˆβ–ˆβ– | 305/1250 [01:47<05:15, 3.00it/s] Training 1/1 epoch (loss 1.5447): 24%|β–ˆβ–ˆβ– | 305/1250 [01:48<05:15, 3.00it/s] Training 1/1 epoch (loss 1.5447): 24%|β–ˆβ–ˆβ– | 306/1250 [01:48<05:14, 3.00it/s] Training 1/1 epoch (loss 1.6214): 24%|β–ˆβ–ˆβ– | 306/1250 [01:48<05:14, 3.00it/s] Training 1/1 epoch (loss 1.6214): 25%|β–ˆβ–ˆβ– | 307/1250 [01:48<05:18, 2.96it/s] Training 1/1 epoch (loss 1.5637): 25%|β–ˆβ–ˆβ– | 307/1250 [01:48<05:18, 2.96it/s] Training 1/1 epoch (loss 
1.5637): 25%|β–ˆβ–ˆβ– | 308/1250 [01:48<05:09, 3.04it/s] Training 1/1 epoch (loss 1.6563): 25%|β–ˆβ–ˆβ– | 308/1250 [01:49<05:09, 3.04it/s] Training 1/1 epoch (loss 1.6563): 25%|β–ˆβ–ˆβ– | 309/1250 [01:49<05:02, 3.11it/s] Training 1/1 epoch (loss 1.6813): 25%|β–ˆβ–ˆβ– | 309/1250 [01:49<05:02, 3.11it/s] Training 1/1 epoch (loss 1.6813): 25%|β–ˆβ–ˆβ– | 310/1250 [01:49<04:58, 3.15it/s] Training 1/1 epoch (loss 1.5708): 25%|β–ˆβ–ˆβ– | 310/1250 [01:49<04:58, 3.15it/s] Training 1/1 epoch (loss 1.5708): 25%|β–ˆβ–ˆβ– | 311/1250 [01:49<05:00, 3.13it/s] Training 1/1 epoch (loss 1.6274): 25%|β–ˆβ–ˆβ– | 311/1250 [01:50<05:00, 3.13it/s] Training 1/1 epoch (loss 1.6274): 25%|β–ˆβ–ˆβ– | 312/1250 [01:50<05:19, 2.94it/s] Training 1/1 epoch (loss 1.6267): 25%|β–ˆβ–ˆβ– | 312/1250 [01:50<05:19, 2.94it/s] Training 1/1 epoch (loss 1.6267): 25%|β–ˆβ–ˆβ–Œ | 313/1250 [01:50<05:20, 2.92it/s] Training 1/1 epoch (loss 1.5943): 25%|β–ˆβ–ˆβ–Œ | 313/1250 [01:50<05:20, 2.92it/s] Training 1/1 epoch (loss 1.5943): 25%|β–ˆβ–ˆβ–Œ | 314/1250 [01:50<05:38, 2.77it/s] Training 1/1 epoch (loss 1.6432): 25%|β–ˆβ–ˆβ–Œ | 314/1250 [01:51<05:38, 2.77it/s] Training 1/1 epoch (loss 1.6432): 25%|β–ˆβ–ˆβ–Œ | 315/1250 [01:51<05:33, 2.80it/s] Training 1/1 epoch (loss 1.4095): 25%|β–ˆβ–ˆβ–Œ | 315/1250 [01:51<05:33, 2.80it/s] Training 1/1 epoch (loss 1.4095): 25%|β–ˆβ–ˆβ–Œ | 316/1250 [01:51<05:19, 2.92it/s] Training 1/1 epoch (loss 1.5688): 25%|β–ˆβ–ˆβ–Œ | 316/1250 [01:51<05:19, 2.92it/s] Training 1/1 epoch (loss 1.5688): 25%|β–ˆβ–ˆβ–Œ | 317/1250 [01:51<05:09, 3.02it/s] Training 1/1 epoch (loss 1.6449): 25%|β–ˆβ–ˆβ–Œ | 317/1250 [01:52<05:09, 3.02it/s] Training 1/1 epoch (loss 1.6449): 25%|β–ˆβ–ˆβ–Œ | 318/1250 [01:52<05:07, 3.03it/s] Training 1/1 epoch (loss 1.6766): 25%|β–ˆβ–ˆβ–Œ | 318/1250 [01:52<05:07, 3.03it/s] Training 1/1 epoch (loss 1.6766): 26%|β–ˆβ–ˆβ–Œ | 319/1250 [01:52<05:25, 2.86it/s] Training 1/1 epoch (loss 1.4774): 26%|β–ˆβ–ˆβ–Œ | 319/1250 [01:53<05:25, 2.86it/s] Training 1/1 epoch (loss 
1.4774): 26%|β–ˆβ–ˆβ–Œ | 320/1250 [01:53<05:40, 2.73it/s] Training 1/1 epoch (loss 1.6102): 26%|β–ˆβ–ˆβ–Œ | 320/1250 [01:53<05:40, 2.73it/s] Training 1/1 epoch (loss 1.6102): 26%|β–ˆβ–ˆβ–Œ | 321/1250 [01:53<05:43, 2.71it/s] Training 1/1 epoch (loss 1.6485): 26%|β–ˆβ–ˆβ–Œ | 321/1250 [01:53<05:43, 2.71it/s] Training 1/1 epoch (loss 1.6485): 26%|β–ˆβ–ˆβ–Œ | 322/1250 [01:53<05:42, 2.71it/s] Training 1/1 epoch (loss 1.5127): 26%|β–ˆβ–ˆβ–Œ | 322/1250 [01:54<05:42, 2.71it/s] Training 1/1 epoch (loss 1.5127): 26%|β–ˆβ–ˆβ–Œ | 323/1250 [01:54<05:36, 2.75it/s] Training 1/1 epoch (loss 1.6061): 26%|β–ˆβ–ˆβ–Œ | 323/1250 [01:54<05:36, 2.75it/s] Training 1/1 epoch (loss 1.6061): 26%|β–ˆβ–ˆβ–Œ | 324/1250 [01:54<05:43, 2.70it/s] Training 1/1 epoch (loss 1.6090): 26%|β–ˆβ–ˆβ–Œ | 324/1250 [01:54<05:43, 2.70it/s] Training 1/1 epoch (loss 1.6090): 26%|β–ˆβ–ˆβ–Œ | 325/1250 [01:54<05:29, 2.81it/s] Training 1/1 epoch (loss 1.6839): 26%|β–ˆβ–ˆβ–Œ | 325/1250 [01:55<05:29, 2.81it/s] Training 1/1 epoch (loss 1.6839): 26%|β–ˆβ–ˆβ–Œ | 326/1250 [01:55<05:13, 2.95it/s] Training 1/1 epoch (loss 1.6015): 26%|β–ˆβ–ˆβ–Œ | 326/1250 [01:55<05:13, 2.95it/s] Training 1/1 epoch (loss 1.6015): 26%|β–ˆβ–ˆβ–Œ | 327/1250 [01:55<05:00, 3.07it/s] Training 1/1 epoch (loss 1.5252): 26%|β–ˆβ–ˆβ–Œ | 327/1250 [01:55<05:00, 3.07it/s] Training 1/1 epoch (loss 1.5252): 26%|β–ˆβ–ˆβ–Œ | 328/1250 [01:55<04:59, 3.07it/s] Training 1/1 epoch (loss 1.6129): 26%|β–ˆβ–ˆβ–Œ | 328/1250 [01:56<04:59, 3.07it/s] Training 1/1 epoch (loss 1.6129): 26%|β–ˆβ–ˆβ–‹ | 329/1250 [01:56<04:56, 3.10it/s] Training 1/1 epoch (loss 1.6711): 26%|β–ˆβ–ˆβ–‹ | 329/1250 [01:56<04:56, 3.10it/s] Training 1/1 epoch (loss 1.6711): 26%|β–ˆβ–ˆβ–‹ | 330/1250 [01:56<05:31, 2.77it/s] Training 1/1 epoch (loss 1.6621): 26%|β–ˆβ–ˆβ–‹ | 330/1250 [01:56<05:31, 2.77it/s] Training 1/1 epoch (loss 1.6621): 26%|β–ˆβ–ˆβ–‹ | 331/1250 [01:56<05:31, 2.78it/s] Training 1/1 epoch (loss 1.5470): 26%|β–ˆβ–ˆβ–‹ | 331/1250 [01:57<05:31, 2.78it/s] Training 1/1 epoch (loss 
1.5470): 27%|β–ˆβ–ˆβ–‹ | 332/1250 [01:57<05:16, 2.90it/s] Training 1/1 epoch (loss 1.4893): 27%|β–ˆβ–ˆβ–‹ | 332/1250 [01:57<05:16, 2.90it/s] Training 1/1 epoch (loss 1.4893): 27%|β–ˆβ–ˆβ–‹ | 333/1250 [01:57<05:11, 2.95it/s] Training 1/1 epoch (loss 1.5145): 27%|β–ˆβ–ˆβ–‹ | 333/1250 [01:57<05:11, 2.95it/s] Training 1/1 epoch (loss 1.5145): 27%|β–ˆβ–ˆβ–‹ | 334/1250 [01:57<05:16, 2.90it/s] Training 1/1 epoch (loss 1.5349): 27%|β–ˆβ–ˆβ–‹ | 334/1250 [01:58<05:16, 2.90it/s] Training 1/1 epoch (loss 1.5349): 27%|β–ˆβ–ˆβ–‹ | 335/1250 [01:58<05:17, 2.88it/s] Training 1/1 epoch (loss 1.5335): 27%|β–ˆβ–ˆβ–‹ | 335/1250 [01:58<05:17, 2.88it/s] Training 1/1 epoch (loss 1.5335): 27%|β–ˆβ–ˆβ–‹ | 336/1250 [01:58<05:39, 2.69it/s] Training 1/1 epoch (loss 1.6054): 27%|β–ˆβ–ˆβ–‹ | 336/1250 [01:59<05:39, 2.69it/s] Training 1/1 epoch (loss 1.6054): 27%|β–ˆβ–ˆβ–‹ | 337/1250 [01:59<05:24, 2.82it/s] Training 1/1 epoch (loss 1.6002): 27%|β–ˆβ–ˆβ–‹ | 337/1250 [01:59<05:24, 2.82it/s] Training 1/1 epoch (loss 1.6002): 27%|β–ˆβ–ˆβ–‹ | 338/1250 [01:59<07:12, 2.11it/s] Training 1/1 epoch (loss 1.5967): 27%|β–ˆβ–ˆβ–‹ | 338/1250 [02:00<07:12, 2.11it/s] Training 1/1 epoch (loss 1.5967): 27%|β–ˆβ–ˆβ–‹ | 339/1250 [02:00<06:35, 2.30it/s] Training 1/1 epoch (loss 1.5495): 27%|β–ˆβ–ˆβ–‹ | 339/1250 [02:00<06:35, 2.30it/s] Training 1/1 epoch (loss 1.5495): 27%|β–ˆβ–ˆβ–‹ | 340/1250 [02:00<06:08, 2.47it/s] Training 1/1 epoch (loss 1.6200): 27%|β–ˆβ–ˆβ–‹ | 340/1250 [02:00<06:08, 2.47it/s] Training 1/1 epoch (loss 1.6200): 27%|β–ˆβ–ˆβ–‹ | 341/1250 [02:00<05:51, 2.59it/s] Training 1/1 epoch (loss 1.5883): 27%|β–ˆβ–ˆβ–‹ | 341/1250 [02:01<05:51, 2.59it/s] Training 1/1 epoch (loss 1.5883): 27%|β–ˆβ–ˆβ–‹ | 342/1250 [02:01<05:32, 2.73it/s] Training 1/1 epoch (loss 1.6733): 27%|β–ˆβ–ˆβ–‹ | 342/1250 [02:01<05:32, 2.73it/s] Training 1/1 epoch (loss 1.6733): 27%|β–ˆβ–ˆβ–‹ | 343/1250 [02:01<05:13, 2.89it/s] Training 1/1 epoch (loss 1.6710): 27%|β–ˆβ–ˆβ–‹ | 343/1250 [02:01<05:13, 2.89it/s] Training 1/1 epoch (loss 
1.6710): 28%|β–ˆβ–ˆβ–Š | 344/1250 [02:01<05:11, 2.91it/s] Training 1/1 epoch (loss 1.6250): 28%|β–ˆβ–ˆβ–Š | 344/1250 [02:02<05:11, 2.91it/s] Training 1/1 epoch (loss 1.6250): 28%|β–ˆβ–ˆβ–Š | 345/1250 [02:02<05:01, 3.00it/s] Training 1/1 epoch (loss 1.5616): 28%|β–ˆβ–ˆβ–Š | 345/1250 [02:02<05:01, 3.00it/s] Training 1/1 epoch (loss 1.5616): 28%|β–ˆβ–ˆβ–Š | 346/1250 [02:02<05:02, 2.99it/s] Training 1/1 epoch (loss 1.5430): 28%|β–ˆβ–ˆβ–Š | 346/1250 [02:02<05:02, 2.99it/s] Training 1/1 epoch (loss 1.5430): 28%|β–ˆβ–ˆβ–Š | 347/1250 [02:02<05:06, 2.95it/s] Training 1/1 epoch (loss 1.6039): 28%|β–ˆβ–ˆβ–Š | 347/1250 [02:03<05:06, 2.95it/s] Training 1/1 epoch (loss 1.6039): 28%|β–ˆβ–ˆβ–Š | 348/1250 [02:03<05:11, 2.89it/s] Training 1/1 epoch (loss 1.6082): 28%|β–ˆβ–ˆβ–Š | 348/1250 [02:03<05:11, 2.89it/s] Training 1/1 epoch (loss 1.6082): 28%|β–ˆβ–ˆβ–Š | 349/1250 [02:03<05:18, 2.83it/s] Training 1/1 epoch (loss 1.4704): 28%|β–ˆβ–ˆβ–Š | 349/1250 [02:03<05:18, 2.83it/s] Training 1/1 epoch (loss 1.4704): 28%|β–ˆβ–ˆβ–Š | 350/1250 [02:03<05:20, 2.81it/s] Training 1/1 epoch (loss 1.4788): 28%|β–ˆβ–ˆβ–Š | 350/1250 [02:04<05:20, 2.81it/s] Training 1/1 epoch (loss 1.4788): 28%|β–ˆβ–ˆβ–Š | 351/1250 [02:04<05:21, 2.80it/s] Training 1/1 epoch (loss 1.6381): 28%|β–ˆβ–ˆβ–Š | 351/1250 [02:04<05:21, 2.80it/s] Training 1/1 epoch (loss 1.6381): 28%|β–ˆβ–ˆβ–Š | 352/1250 [02:04<05:43, 2.61it/s] Training 1/1 epoch (loss 1.5997): 28%|β–ˆβ–ˆβ–Š | 352/1250 [02:04<05:43, 2.61it/s] Training 1/1 epoch (loss 1.5997): 28%|β–ˆβ–ˆβ–Š | 353/1250 [02:04<05:30, 2.72it/s] Training 1/1 epoch (loss 1.6104): 28%|β–ˆβ–ˆβ–Š | 353/1250 [02:05<05:30, 2.72it/s] Training 1/1 epoch (loss 1.6104): 28%|β–ˆβ–ˆβ–Š | 354/1250 [02:05<05:22, 2.77it/s] Training 1/1 epoch (loss 1.7510): 28%|β–ˆβ–ˆβ–Š | 354/1250 [02:05<05:22, 2.77it/s] Training 1/1 epoch (loss 1.7510): 28%|β–ˆβ–ˆβ–Š | 355/1250 [02:05<05:07, 2.91it/s] Training 1/1 epoch (loss 1.5433): 28%|β–ˆβ–ˆβ–Š | 355/1250 [02:05<05:07, 2.91it/s] Training 1/1 epoch (loss 
1.5433): 28%|β–ˆβ–ˆβ–Š | 356/1250 [02:05<04:55, 3.03it/s] Training 1/1 epoch (loss 1.6681): 28%|β–ˆβ–ˆβ–Š | 356/1250 [02:06<04:55, 3.03it/s] Training 1/1 epoch (loss 1.6681): 29%|β–ˆβ–ˆβ–Š | 357/1250 [02:06<04:53, 3.04it/s] Training 1/1 epoch (loss 1.6715): 29%|β–ˆβ–ˆβ–Š | 357/1250 [02:06<04:53, 3.04it/s] Training 1/1 epoch (loss 1.6715): 29%|β–ˆβ–ˆβ–Š | 358/1250 [02:06<04:54, 3.03it/s] Training 1/1 epoch (loss 1.6712): 29%|β–ˆβ–ˆβ–Š | 358/1250 [02:06<04:54, 3.03it/s] Training 1/1 epoch (loss 1.6712): 29%|β–ˆβ–ˆβ–Š | 359/1250 [02:06<04:49, 3.08it/s] Training 1/1 epoch (loss 1.5895): 29%|β–ˆβ–ˆβ–Š | 359/1250 [02:07<04:49, 3.08it/s] Training 1/1 epoch (loss 1.5895): 29%|β–ˆβ–ˆβ–‰ | 360/1250 [02:07<04:45, 3.11it/s] Training 1/1 epoch (loss 1.5263): 29%|β–ˆβ–ˆβ–‰ | 360/1250 [02:07<04:45, 3.11it/s] Training 1/1 epoch (loss 1.5263): 29%|β–ˆβ–ˆβ–‰ | 361/1250 [02:07<04:46, 3.11it/s] Training 1/1 epoch (loss 1.5522): 29%|β–ˆβ–ˆβ–‰ | 361/1250 [02:07<04:46, 3.11it/s] Training 1/1 epoch (loss 1.5522): 29%|β–ˆβ–ˆβ–‰ | 362/1250 [02:07<04:37, 3.20it/s] Training 1/1 epoch (loss 1.6224): 29%|β–ˆβ–ˆβ–‰ | 362/1250 [02:08<04:37, 3.20it/s] Training 1/1 epoch (loss 1.6224): 29%|β–ˆβ–ˆβ–‰ | 363/1250 [02:08<04:44, 3.12it/s] Training 1/1 epoch (loss 1.5473): 29%|β–ˆβ–ˆβ–‰ | 363/1250 [02:08<04:44, 3.12it/s] Training 1/1 epoch (loss 1.5473): 29%|β–ˆβ–ˆβ–‰ | 364/1250 [02:08<04:54, 3.01it/s] Training 1/1 epoch (loss 1.7393): 29%|β–ˆβ–ˆβ–‰ | 364/1250 [02:08<04:54, 3.01it/s] Training 1/1 epoch (loss 1.7393): 29%|β–ˆβ–ˆβ–‰ | 365/1250 [02:08<05:07, 2.87it/s] Training 1/1 epoch (loss 1.4828): 29%|β–ˆβ–ˆβ–‰ | 365/1250 [02:09<05:07, 2.87it/s] Training 1/1 epoch (loss 1.4828): 29%|β–ˆβ–ˆβ–‰ | 366/1250 [02:09<04:56, 2.98it/s] Training 1/1 epoch (loss 1.6772): 29%|β–ˆβ–ˆβ–‰ | 366/1250 [02:09<04:56, 2.98it/s] Training 1/1 epoch (loss 1.6772): 29%|β–ˆβ–ˆβ–‰ | 367/1250 [02:09<04:45, 3.09it/s] Training 1/1 epoch (loss 1.6025): 29%|β–ˆβ–ˆβ–‰ | 367/1250 [02:09<04:45, 3.09it/s] Training 1/1 epoch (loss 
1.6025): 29%|β–ˆβ–ˆβ–‰ | 368/1250 [02:09<04:48, 3.06it/s] Training 1/1 epoch (loss 1.5806): 29%|β–ˆβ–ˆβ–‰ | 368/1250 [02:10<04:48, 3.06it/s] Training 1/1 epoch (loss 1.5806): 30%|β–ˆβ–ˆβ–‰ | 369/1250 [02:10<04:54, 2.99it/s] Training 1/1 epoch (loss 1.6951): 30%|β–ˆβ–ˆβ–‰ | 369/1250 [02:10<04:54, 2.99it/s] Training 1/1 epoch (loss 1.6951): 30%|β–ˆβ–ˆβ–‰ | 370/1250 [02:10<05:05, 2.88it/s] Training 1/1 epoch (loss 1.5532): 30%|β–ˆβ–ˆβ–‰ | 370/1250 [02:10<05:05, 2.88it/s] Training 1/1 epoch (loss 1.5532): 30%|β–ˆβ–ˆβ–‰ | 371/1250 [02:10<04:55, 2.97it/s] Training 1/1 epoch (loss 1.5018): 30%|β–ˆβ–ˆβ–‰ | 371/1250 [02:11<04:55, 2.97it/s] Training 1/1 epoch (loss 1.5018): 30%|β–ˆβ–ˆβ–‰ | 372/1250 [02:11<04:48, 3.04it/s] Training 1/1 epoch (loss 1.5470): 30%|β–ˆβ–ˆβ–‰ | 372/1250 [02:11<04:48, 3.04it/s] Training 1/1 epoch (loss 1.5470): 30%|β–ˆβ–ˆβ–‰ | 373/1250 [02:11<04:42, 3.10it/s] Training 1/1 epoch (loss 1.6161): 30%|β–ˆβ–ˆβ–‰ | 373/1250 [02:11<04:42, 3.10it/s] Training 1/1 epoch (loss 1.6161): 30%|β–ˆβ–ˆβ–‰ | 374/1250 [02:11<04:32, 3.21it/s] Training 1/1 epoch (loss 1.5409): 30%|β–ˆβ–ˆβ–‰ | 374/1250 [02:12<04:32, 3.21it/s] Training 1/1 epoch (loss 1.5409): 30%|β–ˆβ–ˆβ–ˆ | 375/1250 [02:12<04:33, 3.20it/s] Training 1/1 epoch (loss 1.5500): 30%|β–ˆβ–ˆβ–ˆ | 375/1250 [02:12<04:33, 3.20it/s] Training 1/1 epoch (loss 1.5500): 30%|β–ˆβ–ˆβ–ˆ | 376/1250 [02:12<04:52, 2.98it/s] Training 1/1 epoch (loss 1.5552): 30%|β–ˆβ–ˆβ–ˆ | 376/1250 [02:12<04:52, 2.98it/s] Training 1/1 epoch (loss 1.5552): 30%|β–ˆβ–ˆβ–ˆ | 377/1250 [02:12<04:47, 3.04it/s] Training 1/1 epoch (loss 1.5489): 30%|β–ˆβ–ˆβ–ˆ | 377/1250 [02:13<04:47, 3.04it/s] Training 1/1 epoch (loss 1.5489): 30%|β–ˆβ–ˆβ–ˆ | 378/1250 [02:13<04:43, 3.08it/s] Training 1/1 epoch (loss 1.6958): 30%|β–ˆβ–ˆβ–ˆ | 378/1250 [02:13<04:43, 3.08it/s] Training 1/1 epoch (loss 1.6958): 30%|β–ˆβ–ˆβ–ˆ | 379/1250 [02:13<04:36, 3.14it/s] Training 1/1 epoch (loss 1.5941): 30%|β–ˆβ–ˆβ–ˆ | 379/1250 [02:13<04:36, 3.14it/s] Training 1/1 epoch (loss 
1.5941): 30%|β–ˆβ–ˆβ–ˆ | 380/1250 [02:13<04:28, 3.24it/s] Training 1/1 epoch (loss 1.6410): 30%|β–ˆβ–ˆβ–ˆ | 380/1250 [02:14<04:28, 3.24it/s] Training 1/1 epoch (loss 1.6410): 30%|β–ˆβ–ˆβ–ˆ | 381/1250 [02:14<04:46, 3.03it/s] Training 1/1 epoch (loss 1.5711): 30%|β–ˆβ–ˆβ–ˆ | 381/1250 [02:14<04:46, 3.03it/s] Training 1/1 epoch (loss 1.5711): 31%|β–ˆβ–ˆβ–ˆ | 382/1250 [02:14<04:48, 3.01it/s] Training 1/1 epoch (loss 1.4735): 31%|β–ˆβ–ˆβ–ˆ | 382/1250 [02:14<04:48, 3.01it/s] Training 1/1 epoch (loss 1.4735): 31%|β–ˆβ–ˆβ–ˆ | 383/1250 [02:14<04:45, 3.03it/s] Training 1/1 epoch (loss 1.6788): 31%|β–ˆβ–ˆβ–ˆ | 383/1250 [02:15<04:45, 3.03it/s] Training 1/1 epoch (loss 1.6788): 31%|β–ˆβ–ˆβ–ˆ | 384/1250 [02:15<04:49, 2.99it/s] Training 1/1 epoch (loss 1.5759): 31%|β–ˆβ–ˆβ–ˆ | 384/1250 [02:15<04:49, 2.99it/s] Training 1/1 epoch (loss 1.5759): 31%|β–ˆβ–ˆβ–ˆ | 385/1250 [02:15<04:41, 3.08it/s] Training 1/1 epoch (loss 1.5840): 31%|β–ˆβ–ˆβ–ˆ | 385/1250 [02:15<04:41, 3.08it/s] Training 1/1 epoch (loss 1.5840): 31%|β–ˆβ–ˆβ–ˆ | 386/1250 [02:15<04:39, 3.09it/s] Training 1/1 epoch (loss 1.5222): 31%|β–ˆβ–ˆβ–ˆ | 386/1250 [02:16<04:39, 3.09it/s] Training 1/1 epoch (loss 1.5222): 31%|β–ˆβ–ˆβ–ˆ | 387/1250 [02:16<04:30, 3.19it/s] Training 1/1 epoch (loss 1.6192): 31%|β–ˆβ–ˆβ–ˆ | 387/1250 [02:16<04:30, 3.19it/s] Training 1/1 epoch (loss 1.6192): 31%|β–ˆβ–ˆβ–ˆ | 388/1250 [02:16<04:38, 3.09it/s] Training 1/1 epoch (loss 1.6874): 31%|β–ˆβ–ˆβ–ˆ | 388/1250 [02:16<04:38, 3.09it/s] Training 1/1 epoch (loss 1.6874): 31%|β–ˆβ–ˆβ–ˆ | 389/1250 [02:16<04:37, 3.10it/s] Training 1/1 epoch (loss 1.6277): 31%|β–ˆβ–ˆβ–ˆ | 389/1250 [02:16<04:37, 3.10it/s] Training 1/1 epoch (loss 1.6277): 31%|β–ˆβ–ˆβ–ˆ | 390/1250 [02:16<04:32, 3.15it/s] Training 1/1 epoch (loss 1.5674): 31%|β–ˆβ–ˆβ–ˆ | 390/1250 [02:17<04:32, 3.15it/s] Training 1/1 epoch (loss 1.5674): 31%|β–ˆβ–ˆβ–ˆβ– | 391/1250 [02:17<04:25, 3.24it/s] Training 1/1 epoch (loss 1.6080): 31%|β–ˆβ–ˆβ–ˆβ– | 391/1250 [02:17<04:25, 3.24it/s] Training 1/1 epoch (loss 
1.6080): 31%|β–ˆβ–ˆβ–ˆβ– | 392/1250 [02:17<04:30, 3.17it/s] Training 1/1 epoch (loss 1.5808): 31%|β–ˆβ–ˆβ–ˆβ– | 392/1250 [02:17<04:30, 3.17it/s] Training 1/1 epoch (loss 1.5808): 31%|β–ˆβ–ˆβ–ˆβ– | 393/1250 [02:17<04:32, 3.15it/s] Training 1/1 epoch (loss 1.6048): 31%|β–ˆβ–ˆβ–ˆβ– | 393/1250 [02:18<04:32, 3.15it/s] Training 1/1 epoch (loss 1.6048): 32%|β–ˆβ–ˆβ–ˆβ– | 394/1250 [02:18<04:35, 3.11it/s] Training 1/1 epoch (loss 1.4598): 32%|β–ˆβ–ˆβ–ˆβ– | 394/1250 [02:18<04:35, 3.11it/s] Training 1/1 epoch (loss 1.4598): 32%|β–ˆβ–ˆβ–ˆβ– | 395/1250 [02:18<04:48, 2.96it/s] Training 1/1 epoch (loss 1.5735): 32%|β–ˆβ–ˆβ–ˆβ– | 395/1250 [02:18<04:48, 2.96it/s] Training 1/1 epoch (loss 1.5735): 32%|β–ˆβ–ˆβ–ˆβ– | 396/1250 [02:18<04:46, 2.98it/s] Training 1/1 epoch (loss 1.6423): 32%|β–ˆβ–ˆβ–ˆβ– | 396/1250 [02:19<04:46, 2.98it/s] Training 1/1 epoch (loss 1.6423): 32%|β–ˆβ–ˆβ–ˆβ– | 397/1250 [02:19<04:50, 2.94it/s] Training 1/1 epoch (loss 1.5781): 32%|β–ˆβ–ˆβ–ˆβ– | 397/1250 [02:19<04:50, 2.94it/s] Training 1/1 epoch (loss 1.5781): 32%|β–ˆβ–ˆβ–ˆβ– | 398/1250 [02:19<04:36, 3.08it/s] Training 1/1 epoch (loss 1.5440): 32%|β–ˆβ–ˆβ–ˆβ– | 398/1250 [02:19<04:36, 3.08it/s] Training 1/1 epoch (loss 1.5440): 32%|β–ˆβ–ˆβ–ˆβ– | 399/1250 [02:19<04:38, 3.06it/s] Training 1/1 epoch (loss 1.4194): 32%|β–ˆβ–ˆβ–ˆβ– | 399/1250 [02:20<04:38, 3.06it/s] Training 1/1 epoch (loss 1.4194): 32%|β–ˆβ–ˆβ–ˆβ– | 400/1250 [02:20<04:59, 2.84it/s] Training 1/1 epoch (loss 1.5606): 32%|β–ˆβ–ˆβ–ˆβ– | 400/1250 [02:20<04:59, 2.84it/s] Training 1/1 epoch (loss 1.5606): 32%|β–ˆβ–ˆβ–ˆβ– | 401/1250 [02:20<04:56, 2.87it/s] Training 1/1 epoch (loss 1.5525): 32%|β–ˆβ–ˆβ–ˆβ– | 401/1250 [02:20<04:56, 2.87it/s] Training 1/1 epoch (loss 1.5525): 32%|β–ˆβ–ˆβ–ˆβ– | 402/1250 [02:20<04:41, 3.01it/s] Training 1/1 epoch (loss 1.5812): 32%|β–ˆβ–ˆβ–ˆβ– | 402/1250 [02:21<04:41, 3.01it/s] Training 1/1 epoch (loss 1.5812): 32%|β–ˆβ–ˆβ–ˆβ– | 403/1250 [02:21<04:34, 3.09it/s] Training 1/1 epoch (loss 1.5651): 
32%|β–ˆβ–ˆβ–ˆβ– | 403/1250 [02:21<04:34, 3.09it/s] Training 1/1 epoch (loss 1.5651): 32%|β–ˆβ–ˆβ–ˆβ– | 404/1250 [02:21<04:30, 3.12it/s] Training 1/1 epoch (loss 1.5520): 32%|β–ˆβ–ˆβ–ˆβ– | 404/1250 [02:21<04:30, 3.12it/s] Training 1/1 epoch (loss 1.5520): 32%|β–ˆβ–ˆβ–ˆβ– | 405/1250 [02:21<04:28, 3.15it/s] Training 1/1 epoch (loss 1.5062): 32%|β–ˆβ–ˆβ–ˆβ– | 405/1250 [02:22<04:28, 3.15it/s] Training 1/1 epoch (loss 1.5062): 32%|β–ˆβ–ˆβ–ˆβ– | 406/1250 [02:22<04:30, 3.13it/s] Training 1/1 epoch (loss 1.6264): 32%|β–ˆβ–ˆβ–ˆβ– | 406/1250 [02:22<04:30, 3.13it/s] Training 1/1 epoch (loss 1.6264): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 407/1250 [02:22<04:34, 3.07it/s] Training 1/1 epoch (loss 1.7141): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 407/1250 [02:22<04:34, 3.07it/s] Training 1/1 epoch (loss 1.7141): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 408/1250 [02:22<04:37, 3.03it/s] Training 1/1 epoch (loss 1.4806): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 408/1250 [02:23<04:37, 3.03it/s] Training 1/1 epoch (loss 1.4806): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 409/1250 [02:23<04:32, 3.09it/s] Training 1/1 epoch (loss 1.5548): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 409/1250 [02:23<04:32, 3.09it/s] Training 1/1 epoch (loss 1.5548): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 410/1250 [02:23<04:29, 3.12it/s] Training 1/1 epoch (loss 1.6440): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 410/1250 [02:23<04:29, 3.12it/s] Training 1/1 epoch (loss 1.6440): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 411/1250 [02:23<04:23, 3.18it/s] Training 1/1 epoch (loss 1.6685): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 411/1250 [02:24<04:23, 3.18it/s] Training 1/1 epoch (loss 1.6685): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 412/1250 [02:24<04:26, 3.14it/s] Training 1/1 epoch (loss 1.6321): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 412/1250 [02:24<04:26, 3.14it/s] Training 1/1 epoch (loss 1.6321): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 413/1250 [02:24<04:36, 3.03it/s] Training 1/1 epoch (loss 1.6345): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 413/1250 [02:24<04:36, 3.03it/s] Training 1/1 epoch (loss 1.6345): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 414/1250 [02:24<04:29, 3.11it/s] Training 1/1 epoch (loss 1.5570): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 414/1250 [02:25<04:29, 3.11it/s] Training 1/1 epoch (loss 1.5570): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 
415/1250 [02:25<04:27, 3.12it/s] Training 1/1 epoch (loss 1.5400): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 415/1250 [02:25<04:27, 3.12it/s] Training 1/1 epoch (loss 1.5400): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 416/1250 [02:25<04:26, 3.13it/s] Training 1/1 epoch (loss 1.5810): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 416/1250 [02:25<04:26, 3.13it/s] Training 1/1 epoch (loss 1.5810): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 417/1250 [02:25<04:24, 3.15it/s] Training 1/1 epoch (loss 1.4769): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 417/1250 [02:26<04:24, 3.15it/s] Training 1/1 epoch (loss 1.4769): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 418/1250 [02:26<04:20, 3.19it/s] Training 1/1 epoch (loss 1.6649): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 418/1250 [02:26<04:20, 3.19it/s] Training 1/1 epoch (loss 1.6649): 34%|β–ˆβ–ˆβ–ˆβ–Ž | 419/1250 [02:26<04:20, 3.19it/s] Training 1/1 epoch (loss 1.5848): 34%|β–ˆβ–ˆβ–ˆβ–Ž | 419/1250 [02:26<04:20, 3.19it/s] Training 1/1 epoch (loss 1.5848): 34%|β–ˆβ–ˆβ–ˆβ–Ž | 420/1250 [02:26<04:30, 3.07it/s] Training 1/1 epoch (loss 1.6661): 34%|β–ˆβ–ˆβ–ˆβ–Ž | 420/1250 [02:27<04:30, 3.07it/s] Training 1/1 epoch (loss 1.6661): 34%|β–ˆβ–ˆβ–ˆβ–Ž | 421/1250 [02:27<04:22, 3.16it/s] Training 1/1 epoch (loss 1.7186): 34%|β–ˆβ–ˆβ–ˆβ–Ž | 421/1250 [02:27<04:22, 3.16it/s] Training 1/1 epoch (loss 1.7186): 34%|β–ˆβ–ˆβ–ˆβ– | 422/1250 [02:27<04:15, 3.24it/s] Training 1/1 epoch (loss 1.6724): 34%|β–ˆβ–ˆβ–ˆβ– | 422/1250 [02:27<04:15, 3.24it/s] Training 1/1 epoch (loss 1.6724): 34%|β–ˆβ–ˆβ–ˆβ– | 423/1250 [02:27<04:40, 2.94it/s] Training 1/1 epoch (loss 1.6397): 34%|β–ˆβ–ˆβ–ˆβ– | 423/1250 [02:28<04:40, 2.94it/s] Training 1/1 epoch (loss 1.6397): 34%|β–ˆβ–ˆβ–ˆβ– | 424/1250 [02:28<04:39, 2.96it/s] Training 1/1 epoch (loss 1.7289): 34%|β–ˆβ–ˆβ–ˆβ– | 424/1250 [02:28<04:39, 2.96it/s] Training 1/1 epoch (loss 1.7289): 34%|β–ˆβ–ˆβ–ˆβ– | 425/1250 [02:28<04:34, 3.01it/s] Training 1/1 epoch (loss 1.7364): 34%|β–ˆβ–ˆβ–ˆβ– | 425/1250 [02:28<04:34, 3.01it/s] Training 1/1 epoch (loss 1.7364): 34%|β–ˆβ–ˆβ–ˆβ– | 426/1250 [02:28<05:01, 2.73it/s] Training 1/1 epoch (loss 1.5318): 34%|β–ˆβ–ˆβ–ˆβ– | 426/1250 
[02:29<05:01, 2.73it/s] Training 1/1 epoch (loss 1.5318): 34%|β–ˆβ–ˆβ–ˆβ– | 427/1250 [02:29<04:47, 2.86it/s] Training 1/1 epoch (loss 1.6076): 34%|β–ˆβ–ˆβ–ˆβ– | 427/1250 [02:29<04:47, 2.86it/s] Training 1/1 epoch (loss 1.6076): 34%|β–ˆβ–ˆβ–ˆβ– | 428/1250 [02:29<04:34, 2.99it/s] Training 1/1 epoch (loss 1.5975): 34%|β–ˆβ–ˆβ–ˆβ– | 428/1250 [02:29<04:34, 2.99it/s] Training 1/1 epoch (loss 1.5975): 34%|β–ˆβ–ˆβ–ˆβ– | 429/1250 [02:29<04:41, 2.91it/s] Training 1/1 epoch (loss 1.6277): 34%|β–ˆβ–ˆβ–ˆβ– | 429/1250 [02:30<04:41, 2.91it/s] Training 1/1 epoch (loss 1.6277): 34%|β–ˆβ–ˆβ–ˆβ– | 430/1250 [02:30<04:49, 2.83it/s] Training 1/1 epoch (loss 1.5431): 34%|β–ˆβ–ˆβ–ˆβ– | 430/1250 [02:30<04:49, 2.83it/s] Training 1/1 epoch (loss 1.5431): 34%|β–ˆβ–ˆβ–ˆβ– | 431/1250 [02:30<04:44, 2.87it/s] Training 1/1 epoch (loss 1.4651): 34%|β–ˆβ–ˆβ–ˆβ– | 431/1250 [02:30<04:44, 2.87it/s] Training 1/1 epoch (loss 1.4651): 35%|β–ˆβ–ˆβ–ˆβ– | 432/1250 [02:30<04:39, 2.93it/s] Training 1/1 epoch (loss 1.6423): 35%|β–ˆβ–ˆβ–ˆβ– | 432/1250 [02:31<04:39, 2.93it/s] Training 1/1 epoch (loss 1.6423): 35%|β–ˆβ–ˆβ–ˆβ– | 433/1250 [02:31<04:32, 2.99it/s] Training 1/1 epoch (loss 1.5034): 35%|β–ˆβ–ˆβ–ˆβ– | 433/1250 [02:31<04:32, 2.99it/s] Training 1/1 epoch (loss 1.5034): 35%|β–ˆβ–ˆβ–ˆβ– | 434/1250 [02:31<04:22, 3.11it/s] Training 1/1 epoch (loss 1.6181): 35%|β–ˆβ–ˆβ–ˆβ– | 434/1250 [02:31<04:22, 3.11it/s] Training 1/1 epoch (loss 1.6181): 35%|β–ˆβ–ˆβ–ˆβ– | 435/1250 [02:31<04:19, 3.13it/s] Training 1/1 epoch (loss 1.6415): 35%|β–ˆβ–ˆβ–ˆβ– | 435/1250 [02:32<04:19, 3.13it/s] Training 1/1 epoch (loss 1.6415): 35%|β–ˆβ–ˆβ–ˆβ– | 436/1250 [02:32<04:22, 3.10it/s] Training 1/1 epoch (loss 1.4412): 35%|β–ˆβ–ˆβ–ˆβ– | 436/1250 [02:32<04:22, 3.10it/s] Training 1/1 epoch (loss 1.4412): 35%|β–ˆβ–ˆβ–ˆβ– | 437/1250 [02:32<04:19, 3.14it/s] Training 1/1 epoch (loss 1.4939): 35%|β–ˆβ–ˆβ–ˆβ– | 437/1250 [02:32<04:19, 3.14it/s] Training 1/1 epoch (loss 1.4939): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 438/1250 [02:32<04:15, 
3.18it/s] Training 1/1 epoch (loss 1.6320): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 438/1250 [02:33<04:15, 3.18it/s] Training 1/1 epoch (loss 1.6320): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 439/1250 [02:33<04:15, 3.18it/s] Training 1/1 epoch (loss 1.6461): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 439/1250 [02:33<04:15, 3.18it/s] Training 1/1 epoch (loss 1.6461): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 440/1250 [02:33<04:17, 3.15it/s] Training 1/1 epoch (loss 1.4778): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 440/1250 [02:33<04:17, 3.15it/s] Training 1/1 epoch (loss 1.4778): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 441/1250 [02:33<04:18, 3.13it/s] Training 1/1 epoch (loss 1.6374): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 441/1250 [02:34<04:18, 3.13it/s] Training 1/1 epoch (loss 1.6374): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 442/1250 [02:34<04:17, 3.13it/s] Training 1/1 epoch (loss 1.4930): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 442/1250 [02:34<04:17, 3.13it/s] Training 1/1 epoch (loss 1.4930): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 443/1250 [02:34<04:21, 3.09it/s] Training 1/1 epoch (loss 1.6086): 35%|β–ˆβ–ˆβ–ˆβ–Œ | 443/1250 [02:35<04:21, 3.09it/s] Training 1/1 epoch (loss 1.6086): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 444/1250 [02:35<06:03, 2.22it/s] Training 1/1 epoch (loss 1.6976): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 444/1250 [02:35<06:03, 2.22it/s] Training 1/1 epoch (loss 1.6976): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 445/1250 [02:35<05:25, 2.47it/s] Training 1/1 epoch (loss 1.5860): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 445/1250 [02:35<05:25, 2.47it/s] Training 1/1 epoch (loss 1.5860): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 446/1250 [02:35<05:02, 2.66it/s] Training 1/1 epoch (loss 1.5478): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 446/1250 [02:36<05:02, 2.66it/s] Training 1/1 epoch (loss 1.5478): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 447/1250 [02:36<04:54, 2.73it/s] Training 1/1 epoch (loss 1.5977): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 447/1250 [02:36<04:54, 2.73it/s] Training 1/1 epoch (loss 1.5977): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 448/1250 [02:36<04:48, 2.78it/s] Training 1/1 epoch (loss 1.5446): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 448/1250 [02:36<04:48, 2.78it/s] Training 1/1 epoch (loss 1.5446): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 449/1250 [02:36<04:49, 2.77it/s] Training 1/1 epoch (loss 1.6645): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 449/1250 [02:37<04:49, 2.77it/s] Training 
1/1 epoch (loss 1.6645): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 450/1250 [02:37<04:33, 2.93it/s] Training 1/1 epoch (loss 1.6707): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 450/1250 [02:37<04:33, 2.93it/s] Training 1/1 epoch (loss 1.6707): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 451/1250 [02:37<04:29, 2.96it/s] Training 1/1 epoch (loss 1.6320): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 451/1250 [02:37<04:29, 2.96it/s] Training 1/1 epoch (loss 1.6320): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 452/1250 [02:37<04:21, 3.05it/s] Training 1/1 epoch (loss 1.6627): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 452/1250 [02:37<04:21, 3.05it/s] Training 1/1 epoch (loss 1.6627): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 453/1250 [02:37<04:19, 3.07it/s] Training 1/1 epoch (loss 1.6500): 36%|β–ˆβ–ˆβ–ˆβ–Œ | 453/1250 [02:38<04:19, 3.07it/s] Training 1/1 epoch (loss 1.6500): 36%|β–ˆβ–ˆβ–ˆβ–‹ | 454/1250 [02:38<04:24, 3.01it/s] Training 1/1 epoch (loss 1.6804): 36%|β–ˆβ–ˆβ–ˆβ–‹ | 454/1250 [02:38<04:24, 3.01it/s] Training 1/1 epoch (loss 1.6804): 36%|β–ˆβ–ˆβ–ˆβ–‹ | 455/1250 [02:38<04:24, 3.00it/s] Training 1/1 epoch (loss 1.5469): 36%|β–ˆβ–ˆβ–ˆβ–‹ | 455/1250 [02:39<04:24, 3.00it/s] Training 1/1 epoch (loss 1.5469): 36%|β–ˆβ–ˆβ–ˆβ–‹ | 456/1250 [02:39<04:23, 3.01it/s] Training 1/1 epoch (loss 1.5104): 36%|β–ˆβ–ˆβ–ˆβ–‹ | 456/1250 [02:39<04:23, 3.01it/s] Training 1/1 epoch (loss 1.5104): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 457/1250 [02:39<04:16, 3.09it/s] Training 1/1 epoch (loss 1.5762): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 457/1250 [02:39<04:16, 3.09it/s] Training 1/1 epoch (loss 1.5762): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 458/1250 [02:39<04:14, 3.11it/s] Training 1/1 epoch (loss 1.6359): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 458/1250 [02:39<04:14, 3.11it/s] Training 1/1 epoch (loss 1.6359): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 459/1250 [02:39<04:15, 3.10it/s] Training 1/1 epoch (loss 1.5895): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 459/1250 [02:40<04:15, 3.10it/s] Training 1/1 epoch (loss 1.5895): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 460/1250 [02:40<04:19, 3.05it/s] Training 1/1 epoch (loss 1.5720): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 460/1250 [02:40<04:19, 3.05it/s] Training 1/1 epoch (loss 1.5720): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 461/1250 [02:40<04:37, 2.84it/s] Training 1/1 epoch (loss 
1.5889): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 461/1250 [02:41<04:37, 2.84it/s] Training 1/1 epoch (loss 1.5889): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 462/1250 [02:41<04:28, 2.93it/s] Training 1/1 epoch (loss 1.5692): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 462/1250 [02:41<04:28, 2.93it/s] Training 1/1 epoch (loss 1.5692): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 463/1250 [02:41<04:21, 3.01it/s] Training 1/1 epoch (loss 1.5641): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 463/1250 [02:41<04:21, 3.01it/s] Training 1/1 epoch (loss 1.5641): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 464/1250 [02:41<04:17, 3.05it/s] Training 1/1 epoch (loss 1.5356): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 464/1250 [02:41<04:17, 3.05it/s] Training 1/1 epoch (loss 1.5356): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 465/1250 [02:41<04:19, 3.03it/s] Training 1/1 epoch (loss 1.5776): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 465/1250 [02:42<04:19, 3.03it/s] Training 1/1 epoch (loss 1.5776): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 466/1250 [02:42<04:14, 3.08it/s] Training 1/1 epoch (loss 1.5931): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 466/1250 [02:42<04:14, 3.08it/s] Training 1/1 epoch (loss 1.5931): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 467/1250 [02:42<04:17, 3.04it/s] Training 1/1 epoch (loss 1.5348): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 467/1250 [02:42<04:17, 3.04it/s] Training 1/1 epoch (loss 1.5348): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 468/1250 [02:42<04:13, 3.09it/s] Training 1/1 epoch (loss 1.5332): 37%|β–ˆβ–ˆβ–ˆβ–‹ | 468/1250 [02:43<04:13, 3.09it/s] Training 1/1 epoch (loss 1.5332): 38%|β–ˆβ–ˆβ–ˆβ–Š | 469/1250 [02:43<04:09, 3.13it/s] Training 1/1 epoch (loss 1.4508): 38%|β–ˆβ–ˆβ–ˆβ–Š | 469/1250 [02:43<04:09, 3.13it/s] Training 1/1 epoch (loss 1.4508): 38%|β–ˆβ–ˆβ–ˆβ–Š | 470/1250 [02:43<04:04, 3.19it/s] Training 1/1 epoch (loss 1.4672): 38%|β–ˆβ–ˆβ–ˆβ–Š | 470/1250 [02:43<04:04, 3.19it/s] Training 1/1 epoch (loss 1.4672): 38%|β–ˆβ–ˆβ–ˆβ–Š | 471/1250 [02:43<04:09, 3.12it/s] Training 1/1 epoch (loss 1.6180): 38%|β–ˆβ–ˆβ–ˆβ–Š | 471/1250 [02:44<04:09, 3.12it/s] Training 1/1 epoch (loss 1.6180): 38%|β–ˆβ–ˆβ–ˆβ–Š | 472/1250 [02:44<04:13, 3.07it/s] Training 1/1 epoch (loss 1.5695): 38%|β–ˆβ–ˆβ–ˆβ–Š | 472/1250 [02:44<04:13, 3.07it/s] Training 1/1 epoch (loss 1.5695): 
38%|β–ˆβ–ˆβ–ˆβ–Š | 473/1250 [02:44<04:15, 3.04it/s] Training 1/1 epoch (loss 1.5001): 38%|β–ˆβ–ˆβ–ˆβ–Š | 473/1250 [02:44<04:15, 3.04it/s] Training 1/1 epoch (loss 1.5001): 38%|β–ˆβ–ˆβ–ˆβ–Š | 474/1250 [02:44<04:16, 3.02it/s] Training 1/1 epoch (loss 1.4415): 38%|β–ˆβ–ˆβ–ˆβ–Š | 474/1250 [02:45<04:16, 3.02it/s] Training 1/1 epoch (loss 1.4415): 38%|β–ˆβ–ˆβ–ˆβ–Š | 475/1250 [02:45<04:30, 2.86it/s] Training 1/1 epoch (loss 1.5828): 38%|β–ˆβ–ˆβ–ˆβ–Š | 475/1250 [02:45<04:30, 2.86it/s] Training 1/1 epoch (loss 1.5828): 38%|β–ˆβ–ˆβ–ˆβ–Š | 476/1250 [02:45<04:29, 2.87it/s] Training 1/1 epoch (loss 1.5718): 38%|β–ˆβ–ˆβ–ˆβ–Š | 476/1250 [02:46<04:29, 2.87it/s] Training 1/1 epoch (loss 1.5718): 38%|β–ˆβ–ˆβ–ˆβ–Š | 477/1250 [02:46<04:34, 2.82it/s] Training 1/1 epoch (loss 1.6308): 38%|β–ˆβ–ˆβ–ˆβ–Š | 477/1250 [02:46<04:34, 2.82it/s] Training 1/1 epoch (loss 1.6308): 38%|β–ˆβ–ˆβ–ˆβ–Š | 478/1250 [02:46<04:26, 2.89it/s] Training 1/1 epoch (loss 1.5572): 38%|β–ˆβ–ˆβ–ˆβ–Š | 478/1250 [02:46<04:26, 2.89it/s] Training 1/1 epoch (loss 1.5572): 38%|β–ˆβ–ˆβ–ˆβ–Š | 479/1250 [02:46<04:30, 2.85it/s] Training 1/1 epoch (loss 1.5445): 38%|β–ˆβ–ˆβ–ˆβ–Š | 479/1250 [02:47<04:30, 2.85it/s] Training 1/1 epoch (loss 1.5445): 38%|β–ˆβ–ˆβ–ˆβ–Š | 480/1250 [02:47<04:24, 2.91it/s] Training 1/1 epoch (loss 1.4971): 38%|β–ˆβ–ˆβ–ˆβ–Š | 480/1250 [02:47<04:24, 2.91it/s] Training 1/1 epoch (loss 1.4971): 38%|β–ˆβ–ˆβ–ˆβ–Š | 481/1250 [02:47<04:16, 3.00it/s] Training 1/1 epoch (loss 1.4669): 38%|β–ˆβ–ˆβ–ˆβ–Š | 481/1250 [02:47<04:16, 3.00it/s] Training 1/1 epoch (loss 1.4669): 39%|β–ˆβ–ˆβ–ˆβ–Š | 482/1250 [02:47<04:10, 3.06it/s] Training 1/1 epoch (loss 1.5631): 39%|β–ˆβ–ˆβ–ˆβ–Š | 482/1250 [02:47<04:10, 3.06it/s] Training 1/1 epoch (loss 1.5631): 39%|β–ˆβ–ˆβ–ˆβ–Š | 483/1250 [02:47<04:10, 3.06it/s] Training 1/1 epoch (loss 1.5626): 39%|β–ˆβ–ˆβ–ˆβ–Š | 483/1250 [02:48<04:10, 3.06it/s] Training 1/1 epoch (loss 1.5626): 39%|β–ˆβ–ˆβ–ˆβ–Š | 484/1250 [02:48<04:10, 3.06it/s] Training 1/1 epoch (loss 1.6418): 39%|β–ˆβ–ˆβ–ˆβ–Š | 
484/1250 [02:48<04:10, 3.06it/s] Training 1/1 epoch (loss 1.6418): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 485/1250 [02:48<04:13, 3.01it/s] Training 1/1 epoch (loss 1.5728): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 485/1250 [02:48<04:13, 3.01it/s] Training 1/1 epoch (loss 1.5728): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 486/1250 [02:48<04:07, 3.09it/s] Training 1/1 epoch (loss 1.4971): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 486/1250 [02:49<04:07, 3.09it/s] Training 1/1 epoch (loss 1.4971): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 487/1250 [02:49<04:04, 3.12it/s] Training 1/1 epoch (loss 1.6212): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 487/1250 [02:49<04:04, 3.12it/s] Training 1/1 epoch (loss 1.6212): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 488/1250 [02:49<04:12, 3.02it/s] Training 1/1 epoch (loss 1.4947): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 488/1250 [02:49<04:12, 3.02it/s] Training 1/1 epoch (loss 1.4947): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 489/1250 [02:49<04:11, 3.02it/s] Training 1/1 epoch (loss 1.6791): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 489/1250 [02:50<04:11, 3.02it/s] Training 1/1 epoch (loss 1.6791): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 490/1250 [02:50<04:08, 3.05it/s] Training 1/1 epoch (loss 1.5923): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 490/1250 [02:50<04:08, 3.05it/s] Training 1/1 epoch (loss 1.5923): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 491/1250 [02:50<04:50, 2.61it/s] Training 1/1 epoch (loss 1.5701): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 491/1250 [02:51<04:50, 2.61it/s] Training 1/1 epoch (loss 1.5701): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 492/1250 [02:51<04:45, 2.66it/s] Training 1/1 epoch (loss 1.5429): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 492/1250 [02:51<04:45, 2.66it/s] Training 1/1 epoch (loss 1.5429): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 493/1250 [02:51<04:40, 2.70it/s] Training 1/1 epoch (loss 1.5094): 39%|β–ˆβ–ˆβ–ˆβ–‰ | 493/1250 [02:51<04:40, 2.70it/s] Training 1/1 epoch (loss 1.5094): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 494/1250 [02:51<04:32, 2.77it/s] Training 1/1 epoch (loss 1.5811): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 494/1250 [02:52<04:32, 2.77it/s] Training 1/1 epoch (loss 1.5811): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 495/1250 [02:52<04:32, 2.77it/s] Training 1/1 epoch (loss 1.6165): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 495/1250 [02:52<04:32, 2.77it/s] Training 1/1 epoch (loss 1.6165): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 496/1250 
[02:52<04:42, 2.67it/s] Training 1/1 epoch (loss 1.5320): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 496/1250 [02:52<04:42, 2.67it/s] Training 1/1 epoch (loss 1.5320): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 497/1250 [02:52<04:34, 2.74it/s] Training 1/1 epoch (loss 1.5143): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 497/1250 [02:53<04:34, 2.74it/s] Training 1/1 epoch (loss 1.5143): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 498/1250 [02:53<04:19, 2.90it/s] Training 1/1 epoch (loss 1.6379): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 498/1250 [02:53<04:19, 2.90it/s] Training 1/1 epoch (loss 1.6379): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 499/1250 [02:53<04:17, 2.92it/s] Training 1/1 epoch (loss 1.5052): 40%|β–ˆβ–ˆβ–ˆβ–‰ | 499/1250 [02:53<04:17, 2.92it/s] Training 1/1 epoch (loss 1.5052): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 500/1250 [02:53<04:07, 3.03it/s] Training 1/1 epoch (loss 1.5211): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 500/1250 [02:54<04:07, 3.03it/s] Training 1/1 epoch (loss 1.5211): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 501/1250 [02:54<04:03, 3.08it/s] Training 1/1 epoch (loss 1.5569): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 501/1250 [02:54<04:03, 3.08it/s] Training 1/1 epoch (loss 1.5569): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 502/1250 [02:54<04:01, 3.10it/s] Training 1/1 epoch (loss 1.4826): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 502/1250 [02:54<04:01, 3.10it/s] Training 1/1 epoch (loss 1.4826): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 503/1250 [02:54<04:04, 3.06it/s] Training 1/1 epoch (loss 1.6105): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 503/1250 [02:55<04:04, 3.06it/s] Training 1/1 epoch (loss 1.6105): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 504/1250 [02:55<04:08, 3.00it/s] Training 1/1 epoch (loss 1.5121): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 504/1250 [02:55<04:08, 3.00it/s] Training 1/1 epoch (loss 1.5121): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 505/1250 [02:55<03:59, 3.11it/s] Training 1/1 epoch (loss 1.4784): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 505/1250 [02:55<03:59, 3.11it/s] Training 1/1 epoch (loss 1.4784): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 506/1250 [02:55<04:03, 3.05it/s] Training 1/1 epoch (loss 1.5300): 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 506/1250 [02:56<04:03, 3.05it/s] Training 1/1 epoch (loss 1.5300): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 507/1250 [02:56<04:05, 3.02it/s] Training 1/1 epoch (loss 1.5866): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 507/1250 [02:56<04:05, 
3.02it/s] Training 1/1 epoch (loss 1.5866): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 508/1250 [02:56<04:02, 3.05it/s] Training 1/1 epoch (loss 1.5370): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 508/1250 [02:56<04:02, 3.05it/s] Training 1/1 epoch (loss 1.5370): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 509/1250 [02:56<04:06, 3.01it/s] Training 1/1 epoch (loss 1.6328): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 509/1250 [02:57<04:06, 3.01it/s] Training 1/1 epoch (loss 1.6328): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 510/1250 [02:57<03:59, 3.09it/s] Training 1/1 epoch (loss 1.6216): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 510/1250 [02:57<03:59, 3.09it/s] Training 1/1 epoch (loss 1.6216): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 511/1250 [02:57<03:54, 3.15it/s] Training 1/1 epoch (loss 1.6231): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 511/1250 [02:57<03:54, 3.15it/s] Training 1/1 epoch (loss 1.6231): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 512/1250 [02:57<03:55, 3.13it/s] Training 1/1 epoch (loss 1.5906): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 512/1250 [02:58<03:55, 3.13it/s] Training 1/1 epoch (loss 1.5906): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 513/1250 [02:58<04:06, 2.99it/s] Training 1/1 epoch (loss 1.6143): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 513/1250 [02:58<04:06, 2.99it/s] Training 1/1 epoch (loss 1.6143): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 514/1250 [02:58<03:59, 3.07it/s] Training 1/1 epoch (loss 1.5638): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 514/1250 [02:58<03:59, 3.07it/s] Training 1/1 epoch (loss 1.5638): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 515/1250 [02:58<04:04, 3.01it/s] Training 1/1 epoch (loss 1.4185): 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 515/1250 [02:59<04:04, 3.01it/s] Training 1/1 epoch (loss 1.4185): 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 516/1250 [02:59<03:57, 3.09it/s] Training 1/1 epoch (loss 1.4724): 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 516/1250 [02:59<03:57, 3.09it/s] Training 1/1 epoch (loss 1.4724): 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 517/1250 [02:59<03:53, 3.13it/s] Training 1/1 epoch (loss 1.6071): 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 517/1250 [02:59<03:53, 3.13it/s] Training 1/1 epoch (loss 1.6071): 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 518/1250 [02:59<03:49, 3.19it/s] Training 1/1 epoch (loss 1.6825): 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 518/1250 [03:00<03:49, 3.19it/s] Training 1/1 epoch (loss 1.6825): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 519/1250 [03:00<04:12, 
2.89it/s] Training 1/1 epoch (loss 1.5490): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 519/1250 [03:00<04:12, 2.89it/s] Training 1/1 epoch (loss 1.5490): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 520/1250 [03:00<04:06, 2.96it/s] Training 1/1 epoch (loss 1.6199): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 520/1250 [03:00<04:06, 2.96it/s] Training 1/1 epoch (loss 1.6199): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 521/1250 [03:00<04:06, 2.95it/s] Training 1/1 epoch (loss 1.5604): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 521/1250 [03:01<04:06, 2.95it/s] Training 1/1 epoch (loss 1.5604): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 522/1250 [03:01<04:12, 2.88it/s] Training 1/1 epoch (loss 1.6412): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 522/1250 [03:01<04:12, 2.88it/s] Training 1/1 epoch (loss 1.6412): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 523/1250 [03:01<04:26, 2.73it/s] Training 1/1 epoch (loss 1.4999): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 523/1250 [03:01<04:26, 2.73it/s] Training 1/1 epoch (loss 1.4999): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 524/1250 [03:01<04:30, 2.69it/s] Training 1/1 epoch (loss 1.5667): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 524/1250 [03:02<04:30, 2.69it/s] Training 1/1 epoch (loss 1.5667): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 525/1250 [03:02<04:41, 2.57it/s] Training 1/1 epoch (loss 1.6243): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 525/1250 [03:02<04:41, 2.57it/s] Training 1/1 epoch (loss 1.6243): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 526/1250 [03:02<04:55, 2.45it/s] Training 1/1 epoch (loss 1.5905): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 526/1250 [03:03<04:55, 2.45it/s] Training 1/1 epoch (loss 1.5905): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 527/1250 [03:03<04:54, 2.46it/s] Training 1/1 epoch (loss 1.6059): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 527/1250 [03:03<04:54, 2.46it/s] Training 1/1 epoch (loss 1.6059): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 528/1250 [03:03<04:57, 2.43it/s] Training 1/1 epoch (loss 1.5396): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 528/1250 [03:04<04:57, 2.43it/s] Training 1/1 epoch (loss 1.5396): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 529/1250 [03:04<05:00, 2.40it/s] Training 1/1 epoch (loss 1.6206): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 529/1250 [03:04<05:00, 2.40it/s] Training 1/1 epoch (loss 1.6206): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 530/1250 [03:04<05:05, 2.36it/s] Training 1/1 epoch (loss 
1.5264): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 530/1250 [03:04<05:05, 2.36it/s] Training 1/1 epoch (loss 1.5264): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 531/1250 [03:04<05:04, 2.36it/s] Training 1/1 epoch (loss 1.4780): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 531/1250 [03:05<05:04, 2.36it/s] Training 1/1 epoch (loss 1.4780): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 532/1250 [03:05<05:02, 2.38it/s] Training 1/1 epoch (loss 1.4864): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 532/1250 [03:05<05:02, 2.38it/s] Training 1/1 epoch (loss 1.4864): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 533/1250 [03:05<04:57, 2.41it/s] Training 1/1 epoch (loss 1.6297): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 533/1250 [03:06<04:57, 2.41it/s] Training 1/1 epoch (loss 1.6297): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 534/1250 [03:06<04:53, 2.44it/s] Training 1/1 epoch (loss 1.5705): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 534/1250 [03:06<04:53, 2.44it/s] Training 1/1 epoch (loss 1.5705): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 535/1250 [03:06<05:04, 2.35it/s] Training 1/1 epoch (loss 1.5919): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 535/1250 [03:07<05:04, 2.35it/s] Training 1/1 epoch (loss 1.5919): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 536/1250 [03:07<05:06, 2.33it/s] Training 1/1 epoch (loss 1.4289): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 536/1250 [03:07<05:06, 2.33it/s] Training 1/1 epoch (loss 1.4289): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 537/1250 [03:07<04:57, 2.39it/s] Training 1/1 epoch (loss 1.5196): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 537/1250 [03:07<04:57, 2.39it/s] Training 1/1 epoch (loss 1.5196): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 538/1250 [03:07<04:38, 2.56it/s] Training 1/1 epoch (loss 1.4980): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 538/1250 [03:08<04:38, 2.56it/s] Training 1/1 epoch (loss 1.4980): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 539/1250 [03:08<04:22, 2.70it/s] Training 1/1 epoch (loss 1.6320): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 539/1250 [03:08<04:22, 2.70it/s] Training 1/1 epoch (loss 1.6320): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 540/1250 [03:08<04:19, 2.74it/s] Training 1/1 epoch (loss 1.5796): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 540/1250 [03:08<04:19, 2.74it/s] Training 1/1 epoch (loss 1.5796): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 541/1250 [03:08<04:20, 2.72it/s] Training 1/1 epoch (loss 1.5933): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 541/1250 
[03:09<04:20, 2.72it/s] Training 1/1 epoch (loss 1.5933): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 542/1250 [03:09<04:16, 2.76it/s] Training 1/1 epoch (loss 1.5705): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 542/1250 [03:09<04:16, 2.76it/s] Training 1/1 epoch (loss 1.5705): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 543/1250 [03:09<04:04, 2.89it/s] Training 1/1 epoch (loss 1.6099): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 543/1250 [03:09<04:04, 2.89it/s] Training 1/1 epoch (loss 1.6099): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 544/1250 [03:09<03:58, 2.95it/s] Training 1/1 epoch (loss 1.5870): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 544/1250 [03:10<03:58, 2.95it/s] Training 1/1 epoch (loss 1.5870): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 545/1250 [03:10<04:07, 2.85it/s] Training 1/1 epoch (loss 1.4679): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 545/1250 [03:10<04:07, 2.85it/s] Training 1/1 epoch (loss 1.4679): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 546/1250 [03:10<04:02, 2.90it/s] Training 1/1 epoch (loss 1.5485): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 546/1250 [03:10<04:02, 2.90it/s] Training 1/1 epoch (loss 1.5485): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 547/1250 [03:10<04:00, 2.93it/s] Training 1/1 epoch (loss 1.5745): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 547/1250 [03:11<04:00, 2.93it/s] Training 1/1 epoch (loss 1.5745): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 548/1250 [03:11<03:56, 2.97it/s] Training 1/1 epoch (loss 1.5693): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 548/1250 [03:11<03:56, 2.97it/s] Training 1/1 epoch (loss 1.5693): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 549/1250 [03:11<03:59, 2.93it/s] Training 1/1 epoch (loss 1.4975): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 549/1250 [03:11<03:59, 2.93it/s] Training 1/1 epoch (loss 1.4975): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 550/1250 [03:11<04:00, 2.91it/s] Training 1/1 epoch (loss 1.5892): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 550/1250 [03:12<04:00, 2.91it/s] Training 1/1 epoch (loss 1.5892): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 551/1250 [03:12<04:07, 2.83it/s] Training 1/1 epoch (loss 1.5555): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 551/1250 [03:12<04:07, 2.83it/s] Training 1/1 epoch (loss 1.5555): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 552/1250 [03:12<04:05, 2.85it/s] Training 1/1 epoch (loss 1.3951): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 552/1250 [03:12<04:05, 2.85it/s] Training 1/1 
epoch (loss 1.3951): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 553/1250 [03:12<03:57, 2.94it/s] Training 1/1 epoch (loss 1.5651): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 553/1250 [03:13<03:57, 2.94it/s] Training 1/1 epoch (loss 1.5651): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 554/1250 [03:13<03:50, 3.02it/s] Training 1/1 epoch (loss 1.5935): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 554/1250 [03:13<03:50, 3.02it/s] Training 1/1 epoch (loss 1.5935): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 555/1250 [03:13<03:45, 3.08it/s] Training 1/1 epoch (loss 1.6183): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 555/1250 [03:13<03:45, 3.08it/s] Training 1/1 epoch (loss 1.6183): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 556/1250 [03:13<03:38, 3.17it/s] Training 1/1 epoch (loss 1.6070): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 556/1250 [03:14<03:38, 3.17it/s] Training 1/1 epoch (loss 1.6070): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 557/1250 [03:14<03:37, 3.19it/s] Training 1/1 epoch (loss 1.5314): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 557/1250 [03:14<03:37, 3.19it/s] Training 1/1 epoch (loss 1.5314): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 558/1250 [03:14<03:39, 3.16it/s] Training 1/1 epoch (loss 1.5149): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 558/1250 [03:14<03:39, 3.16it/s] Training 1/1 epoch (loss 1.5149): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 559/1250 [03:14<03:42, 3.10it/s] Training 1/1 epoch (loss 1.5681): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 559/1250 [03:15<03:42, 3.10it/s] Training 1/1 epoch (loss 1.5681): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 560/1250 [03:15<03:50, 2.99it/s] Training 1/1 epoch (loss 1.6535): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 560/1250 [03:15<03:50, 2.99it/s] Training 1/1 epoch (loss 1.6535): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 561/1250 [03:15<03:46, 3.04it/s] Training 1/1 epoch (loss 1.5790): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 561/1250 [03:15<03:46, 3.04it/s] Training 1/1 epoch (loss 1.5790): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 562/1250 [03:15<03:40, 3.12it/s] Training 1/1 epoch (loss 1.5462): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 562/1250 [03:16<03:40, 3.12it/s] Training 1/1 epoch (loss 1.5462): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 563/1250 [03:16<03:39, 3.13it/s] Training 1/1 epoch (loss 1.6233): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 563/1250 [03:16<03:39, 3.13it/s] Training 1/1 epoch (loss 1.6233): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ 
| 564/1250 [03:16<03:39, 3.13it/s] Training 1/1 epoch (loss 1.5984): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 564/1250 [03:16<03:39, 3.13it/s] Training 1/1 epoch (loss 1.5984): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 565/1250 [03:16<03:42, 3.08it/s] Training 1/1 epoch (loss 1.5777): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 565/1250 [03:17<03:42, 3.08it/s] Training 1/1 epoch (loss 1.5777): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 566/1250 [03:17<03:40, 3.10it/s] Training 1/1 epoch (loss 1.5382): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 566/1250 [03:17<03:40, 3.10it/s] Training 1/1 epoch (loss 1.5382): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 567/1250 [03:17<03:35, 3.18it/s] Training 1/1 epoch (loss 1.5715): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 567/1250 [03:17<03:35, 3.18it/s] Training 1/1 epoch (loss 1.5715): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 568/1250 [03:17<03:40, 3.09it/s] Training 1/1 epoch (loss 1.5353): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 568/1250 [03:18<03:40, 3.09it/s] Training 1/1 epoch (loss 1.5353): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 569/1250 [03:18<03:45, 3.02it/s] Training 1/1 epoch (loss 1.5787): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 569/1250 [03:18<03:45, 3.02it/s] Training 1/1 epoch (loss 1.5787): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 570/1250 [03:18<03:48, 2.98it/s] Training 1/1 epoch (loss 1.4551): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 570/1250 [03:18<03:48, 2.98it/s] Training 1/1 epoch (loss 1.4551): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 571/1250 [03:18<03:43, 3.04it/s] Training 1/1 epoch (loss 1.6713): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 571/1250 [03:19<03:43, 3.04it/s] Training 1/1 epoch (loss 1.6713): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 572/1250 [03:19<03:39, 3.09it/s] Training 1/1 epoch (loss 1.6150): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 572/1250 [03:19<03:39, 3.09it/s] Training 1/1 epoch (loss 1.6150): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 573/1250 [03:19<03:33, 3.18it/s] Training 1/1 epoch (loss 1.6411): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 573/1250 [03:19<03:33, 3.18it/s] Training 1/1 epoch (loss 1.6411): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 574/1250 [03:19<03:33, 3.16it/s] Training 1/1 epoch (loss 1.4893): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 574/1250 [03:19<03:33, 3.16it/s] Training 1/1 epoch (loss 1.4893): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 575/1250 [03:19<03:34, 3.15it/s] 
Training 1/1 epoch (loss 1.6210): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 575/1250 [03:20<03:34, 3.15it/s] Training 1/1 epoch (loss 1.6210): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 576/1250 [03:20<03:45, 2.99it/s] Training 1/1 epoch (loss 1.6285): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 576/1250 [03:20<03:45, 2.99it/s] Training 1/1 epoch (loss 1.6285): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 577/1250 [03:20<03:50, 2.92it/s] Training 1/1 epoch (loss 1.5206): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 577/1250 [03:21<03:50, 2.92it/s] Training 1/1 epoch (loss 1.5206): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 578/1250 [03:21<03:48, 2.94it/s] Training 1/1 epoch (loss 1.3809): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 578/1250 [03:21<03:48, 2.94it/s] Training 1/1 epoch (loss 1.3809): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 579/1250 [03:21<03:40, 3.04it/s] Training 1/1 epoch (loss 1.5945): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 579/1250 [03:21<03:40, 3.04it/s] Training 1/1 epoch (loss 1.5945): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 580/1250 [03:21<03:37, 3.08it/s] Training 1/1 epoch (loss 1.4797): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 580/1250 [03:21<03:37, 3.08it/s] Training 1/1 epoch (loss 1.4797): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 581/1250 [03:21<03:37, 3.08it/s] Training 1/1 epoch (loss 1.6083): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 581/1250 [03:22<03:37, 3.08it/s] Training 1/1 epoch (loss 1.6083): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 582/1250 [03:22<03:43, 2.98it/s] Training 1/1 epoch (loss 1.5278): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 582/1250 [03:22<03:43, 2.98it/s] Training 1/1 epoch (loss 1.5278): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 583/1250 [03:22<03:38, 3.05it/s] Training 1/1 epoch (loss 1.5978): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 583/1250 [03:22<03:38, 3.05it/s] Training 1/1 epoch (loss 1.5978): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 584/1250 [03:22<03:44, 2.97it/s] Training 1/1 epoch (loss 1.6347): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 584/1250 [03:23<03:44, 2.97it/s] Training 1/1 epoch (loss 1.6347): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 585/1250 [03:23<03:41, 3.00it/s] Training 1/1 epoch (loss 1.6076): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 585/1250 [03:23<03:41, 3.00it/s] Training 1/1 epoch (loss 1.6076): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 586/1250 [03:23<03:38, 3.04it/s] Training 1/1 epoch (loss 1.5612): 
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 586/1250 [03:23<03:38, 3.04it/s] Training 1/1 epoch (loss 1.5612): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 587/1250 [03:23<03:34, 3.09it/s] Training 1/1 epoch (loss 1.6899): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 587/1250 [03:24<03:34, 3.09it/s] Training 1/1 epoch (loss 1.6899): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 588/1250 [03:24<03:30, 3.14it/s] Training 1/1 epoch (loss 1.4311): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 588/1250 [03:24<03:30, 3.14it/s] Training 1/1 epoch (loss 1.4311): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 589/1250 [03:24<03:35, 3.06it/s] Training 1/1 epoch (loss 1.6677): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 589/1250 [03:24<03:35, 3.06it/s] Training 1/1 epoch (loss 1.6677): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 590/1250 [03:24<03:32, 3.10it/s] Training 1/1 epoch (loss 1.5952): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 590/1250 [03:25<03:32, 3.10it/s] Training 1/1 epoch (loss 1.5952): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 591/1250 [03:25<03:29, 3.14it/s] Training 1/1 epoch (loss 1.5115): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 591/1250 [03:25<03:29, 3.14it/s] Training 1/1 epoch (loss 1.5115): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 592/1250 [03:25<03:30, 3.12it/s] Training 1/1 epoch (loss 1.5559): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 592/1250 [03:25<03:30, 3.12it/s] Training 1/1 epoch (loss 1.5559): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 593/1250 [03:25<03:27, 3.17it/s] Training 1/1 epoch (loss 1.4984): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 593/1250 [03:26<03:27, 3.17it/s] Training 1/1 epoch (loss 1.4984): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 594/1250 [03:26<03:26, 3.17it/s] Training 1/1 epoch (loss 1.5408): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 594/1250 [03:26<03:26, 3.17it/s] Training 1/1 epoch (loss 1.5408): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 595/1250 [03:26<03:43, 2.93it/s] Training 1/1 epoch (loss 1.4894): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 595/1250 [03:26<03:43, 2.93it/s] Training 1/1 epoch (loss 1.4894): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 596/1250 [03:26<03:48, 2.86it/s] Training 1/1 epoch (loss 1.5969): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 596/1250 [03:27<03:48, 2.86it/s] Training 1/1 epoch (loss 1.5969): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 597/1250 [03:27<03:53, 2.80it/s] Training 1/1 epoch (loss 1.5364): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 597/1250 
[03:27<03:53, 2.80it/s] Training 1/1 epoch (loss 1.5364): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 598/1250 [03:27<03:53, 2.79it/s] Training 1/1 epoch (loss 1.4546): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 598/1250 [03:28<03:53, 2.79it/s] Training 1/1 epoch (loss 1.4546): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 599/1250 [03:28<03:49, 2.83it/s] Training 1/1 epoch (loss 1.4540): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 599/1250 [03:28<03:49, 2.83it/s] Training 1/1 epoch (loss 1.4540): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 600/1250 [03:28<03:59, 2.72it/s] Training 1/1 epoch (loss 1.4980): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 600/1250 [03:28<03:59, 2.72it/s] Training 1/1 epoch (loss 1.4980): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 601/1250 [03:28<03:53, 2.78it/s] Training 1/1 epoch (loss 1.6110): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 601/1250 [03:29<03:53, 2.78it/s] Training 1/1 epoch (loss 1.6110): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 602/1250 [03:29<03:42, 2.91it/s] Training 1/1 epoch (loss 1.6674): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 602/1250 [03:29<03:42, 2.91it/s] Training 1/1 epoch (loss 1.6674): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 603/1250 [03:29<03:43, 2.90it/s] Training 1/1 epoch (loss 1.4991): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 603/1250 [03:29<03:43, 2.90it/s] Training 1/1 epoch (loss 1.4991): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 604/1250 [03:29<03:35, 3.00it/s] Training 1/1 epoch (loss 1.5593): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 604/1250 [03:30<03:35, 3.00it/s] Training 1/1 epoch (loss 1.5593): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 605/1250 [03:30<03:43, 2.89it/s] Training 1/1 epoch (loss 1.4671): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 605/1250 [03:30<03:43, 2.89it/s] Training 1/1 epoch (loss 1.4671): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 606/1250 [03:30<03:44, 2.86it/s] Training 1/1 epoch (loss 1.5818): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 606/1250 [03:30<03:44, 2.86it/s] Training 1/1 epoch (loss 1.5818): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 607/1250 [03:30<03:48, 2.82it/s] Training 1/1 epoch (loss 1.6155): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 607/1250 [03:31<03:48, 2.82it/s] Training 1/1 epoch (loss 1.6155): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 608/1250 [03:31<03:42, 2.88it/s] Training 1/1 epoch (loss 1.5732): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 608/1250 [03:31<03:42, 2.88it/s] Training 1/1 
epoch (loss 1.5732): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 609/1250 [03:31<03:30, 3.05it/s] Training 1/1 epoch (loss 1.5103): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 609/1250 [03:31<03:30, 3.05it/s] Training 1/1 epoch (loss 1.5103): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 610/1250 [03:31<03:32, 3.01it/s] Training 1/1 epoch (loss 1.4958): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 610/1250 [03:32<03:32, 3.01it/s] Training 1/1 epoch (loss 1.4958): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 611/1250 [03:32<03:22, 3.15it/s] Training 1/1 epoch (loss 1.6313): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 611/1250 [03:32<03:22, 3.15it/s] Training 1/1 epoch (loss 1.6313): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 612/1250 [03:32<03:28, 3.05it/s] Training 1/1 epoch (loss 1.4736): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 612/1250 [03:32<03:28, 3.05it/s] Training 1/1 epoch (loss 1.4736): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 613/1250 [03:32<03:33, 2.98it/s] Training 1/1 epoch (loss 1.6702): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 613/1250 [03:33<03:33, 2.98it/s] Training 1/1 epoch (loss 1.6702): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 614/1250 [03:33<03:27, 3.06it/s] Training 1/1 epoch (loss 1.4929): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 614/1250 [03:33<03:27, 3.06it/s] Training 1/1 epoch (loss 1.4929): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 615/1250 [03:33<03:23, 3.11it/s] Training 1/1 epoch (loss 1.6564): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 615/1250 [03:33<03:23, 3.11it/s] Training 1/1 epoch (loss 1.6564): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 616/1250 [03:33<03:27, 3.05it/s] Training 1/1 epoch (loss 1.4992): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 616/1250 [03:34<03:27, 3.05it/s] Training 1/1 epoch (loss 1.4992): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 617/1250 [03:34<03:26, 3.07it/s] Training 1/1 epoch (loss 1.6698): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 617/1250 [03:34<03:26, 3.07it/s] Training 1/1 epoch (loss 1.6698): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 618/1250 [03:34<03:30, 3.00it/s] Training 1/1 epoch (loss 1.5076): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 618/1250 [03:34<03:30, 3.00it/s] Training 1/1 epoch (loss 1.5076): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 619/1250 [03:34<03:43, 2.82it/s] Training 1/1 epoch (loss 1.4736): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 619/1250 [03:35<03:43, 2.82it/s] Training 1/1 epoch (loss 1.4736): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ 
| 620/1250 [03:35<03:45, 2.79it/s] Training 1/1 epoch (loss 1.5016): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 620/1250 [03:35<03:45, 2.79it/s] Training 1/1 epoch (loss 1.5016): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 621/1250 [03:35<03:43, 2.81it/s] Training 1/1 epoch (loss 1.5647): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 621/1250 [03:35<03:43, 2.81it/s] Training 1/1 epoch (loss 1.5647): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 622/1250 [03:35<03:43, 2.81it/s] Training 1/1 epoch (loss 1.5357): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 622/1250 [03:36<03:43, 2.81it/s] Training 1/1 epoch (loss 1.5357): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 623/1250 [03:36<03:42, 2.82it/s] Training 1/1 epoch (loss 1.5774): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 623/1250 [03:36<03:42, 2.82it/s] Training 1/1 epoch (loss 1.5774): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 624/1250 [03:36<03:56, 2.65it/s] Training 1/1 epoch (loss 1.5092): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 624/1250 [03:37<03:56, 2.65it/s] Training 1/1 epoch (loss 1.5092): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 625/1250 [03:37<03:51, 2.70it/s] Training 1/1 epoch (loss 1.5632): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 625/1250 [03:37<03:51, 2.70it/s] Training 1/1 epoch (loss 1.5632): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 626/1250 [03:37<03:39, 2.85it/s] Training 1/1 epoch (loss 1.7102): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 626/1250 [03:37<03:39, 2.85it/s] Training 1/1 epoch (loss 1.7102): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 627/1250 [03:37<03:35, 2.89it/s] Training 1/1 epoch (loss 1.5602): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 627/1250 [03:37<03:35, 2.89it/s] Training 1/1 epoch (loss 1.5602): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 628/1250 [03:37<03:33, 2.91it/s] Training 1/1 epoch (loss 1.5394): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 628/1250 [03:38<03:33, 2.91it/s] Training 1/1 epoch (loss 1.5394): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 629/1250 [03:38<03:28, 2.98it/s] Training 1/1 epoch (loss 1.6313): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 629/1250 [03:38<03:28, 2.98it/s] Training 1/1 epoch (loss 1.6313): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 630/1250 [03:38<03:25, 3.02it/s] Training 1/1 epoch (loss 1.6110): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 630/1250 [03:38<03:25, 3.02it/s] Training 1/1 epoch (loss 1.6110): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 631/1250 [03:38<03:24, 3.03it/s] 
Training 1/1 epoch (loss 1.5054): 50%|█████ | 631/1250 [03:39<03:24, 3.03it/s] Training 1/1 epoch (loss 1.5054): 51%|█████ | 632/1250 [03:39<03:25, 3.01it/s] Training 1/1 epoch (loss 1.5611): 51%|█████ | 632/1250 [03:39<03:25, 3.01it/s] Training 1/1 epoch (loss 1.5611): 51%|█████ | 633/1250 [03:39<03:20, 3.08it/s] Training 1/1 epoch (loss 1.5268): 51%|█████ | 633/1250 [03:39<03:20, 3.08it/s] Training 1/1 epoch (loss 1.5268): 51%|█████ | 634/1250 [03:39<03:15, 3.16it/s] Training 1/1 epoch (loss 1.4175): 51%|█████ | 634/1250 [03:40<03:15, 3.16it/s] Training 1/1 epoch (loss 1.4175): 51%|█████ | 635/1250 [03:40<03:15, 3.14it/s] Training 1/1 epoch (loss 1.5070): 51%|█████ | 635/1250 [03:40<03:15, 3.14it/s] Training 1/1 epoch (loss 1.5070): 51%|█████ | 636/1250 [03:40<03:15, 3.14it/s] Training 1/1 epoch (loss 1.5028): 51%|█████ | 636/1250 [03:40<03:15, 3.14it/s] Training 1/1 epoch (loss 1.5028): 51%|█████ | 637/1250 [03:40<03:21, 3.05it/s] Training 1/1 epoch (loss 1.5524): 51%|█████ | 637/1250 [03:41<03:21, 3.05it/s] Training 1/1 epoch (loss 1.5524): 51%|█████ | 638/1250 [03:41<03:18, 3.09it/s] Training 1/1 epoch (loss 1.6153): 51%|█████ | 638/1250 [03:41<03:18, 3.09it/s] Training 1/1 epoch (loss 1.6153): 51%|█████ | 639/1250 [03:41<03:11, 3.20it/s] Training 1/1 epoch (loss 1.5490): 51%|█████ | 639/1250 [03:41<03:11, 3.20it/s] Training 1/1 epoch (loss 1.5490): 51%|█████ | 640/1250 [03:41<03:18, 3.07it/s] Training 1/1 epoch (loss 1.5697): 51%|█████ | 640/1250 [03:42<03:18, 3.07it/s] Training 1/1 epoch (loss 1.5697): 51%|█████▏ | 641/1250 [03:42<03:17, 3.08it/s] Training 1/1 epoch (loss 1.5100): 51%|█████▏ | 641/1250 [03:42<03:17, 3.08it/s] Training 1/1 epoch (loss 1.5100): 51%|█████▏ | 642/1250 [03:42<03:16, 3.09it/s] Training 1/1 epoch (loss 
1.6577): 51%|█████▏ | 642/1250 [03:42<03:16, 3.09it/s] Training 1/1 epoch (loss 1.6577): 51%|█████▏ | 643/1250 [03:42<03:32, 2.85it/s] Training 1/1 epoch (loss 1.4945): 51%|█████▏ | 643/1250 [03:43<03:32, 2.85it/s] Training 1/1 epoch (loss 1.4945): 52%|█████▏ | 644/1250 [03:43<03:26, 2.94it/s] Training 1/1 epoch (loss 1.4853): 52%|█████▏ | 644/1250 [03:43<03:26, 2.94it/s] Training 1/1 epoch (loss 1.4853): 52%|█████▏ | 645/1250 [03:43<03:19, 3.03it/s] Training 1/1 epoch (loss 1.6125): 52%|█████▏ | 645/1250 [03:43<03:19, 3.03it/s] Training 1/1 epoch (loss 1.6125): 52%|█████▏ | 646/1250 [03:43<03:15, 3.09it/s] Training 1/1 epoch (loss 1.4895): 52%|█████▏ | 646/1250 [03:44<03:15, 3.09it/s] Training 1/1 epoch (loss 1.4895): 52%|█████▏ | 647/1250 [03:44<03:13, 3.11it/s] Training 1/1 epoch (loss 1.5908): 52%|█████▏ | 647/1250 [03:44<03:13, 3.11it/s] Training 1/1 epoch (loss 1.5908): 52%|█████▏ | 648/1250 [03:44<03:21, 2.99it/s] Training 1/1 epoch (loss 1.4284): 52%|█████▏ | 648/1250 [03:44<03:21, 2.99it/s] Training 1/1 epoch (loss 1.4284): 52%|█████▏ | 649/1250 [03:44<03:28, 2.88it/s] Training 1/1 epoch (loss 1.6306): 52%|█████▏ | 649/1250 [03:45<03:28, 2.88it/s] Training 1/1 epoch (loss 1.6306): 52%|█████▏ | 650/1250 [03:45<03:22, 2.96it/s] Training 1/1 epoch (loss 1.4964): 52%|█████▏ | 650/1250 [03:45<03:22, 2.96it/s] Training 1/1 epoch (loss 1.4964): 52%|█████▏ | 651/1250 [03:45<03:18, 3.02it/s] Training 1/1 epoch (loss 1.5366): 52%|█████▏ | 651/1250 [03:45<03:18, 3.02it/s] Training 1/1 epoch (loss 1.5366): 52%|█████▏ | 652/1250 [03:45<03:12, 3.11it/s] Training 1/1 epoch (loss 1.5395): 52%|█████▏ | 652/1250 [03:46<03:12, 3.11it/s] Training 1/1 epoch (loss 1.5395): 52%|█████▏ | 653/1250 [03:46<03:11, 3.11it/s] 
Training 1/1 epoch (loss 1.6482): 52%|█████▏ | 653/1250 [03:46<03:11, 3.11it/s] Training 1/1 epoch (loss 1.6482): 52%|█████▏ | 654/1250 [03:46<03:10, 3.13it/s] Training 1/1 epoch (loss 1.5043): 52%|█████▏ | 654/1250 [03:46<03:10, 3.13it/s] Training 1/1 epoch (loss 1.5043): 52%|█████▏ | 655/1250 [03:46<03:17, 3.01it/s] Training 1/1 epoch (loss 1.5558): 52%|█████▏ | 655/1250 [03:47<03:17, 3.01it/s] Training 1/1 epoch (loss 1.5558): 52%|█████▏ | 656/1250 [03:47<03:16, 3.02it/s] Training 1/1 epoch (loss 1.6135): 52%|█████▏ | 656/1250 [03:47<03:16, 3.02it/s] Training 1/1 epoch (loss 1.6135): 53%|█████▎ | 657/1250 [03:47<03:09, 3.13it/s] Training 1/1 epoch (loss 1.4348): 53%|█████▎ | 657/1250 [03:47<03:09, 3.13it/s] Training 1/1 epoch (loss 1.4348): 53%|█████▎ | 658/1250 [03:47<03:10, 3.11it/s] Training 1/1 epoch (loss 1.4710): 53%|█████▎ | 658/1250 [03:48<03:10, 3.11it/s] Training 1/1 epoch (loss 1.4710): 53%|█████▎ | 659/1250 [03:48<03:16, 3.01it/s] Training 1/1 epoch (loss 1.3850): 53%|█████▎ | 659/1250 [03:48<03:16, 3.01it/s] Training 1/1 epoch (loss 1.3850): 53%|█████▎ | 660/1250 [03:48<03:13, 3.05it/s] Training 1/1 epoch (loss 1.5009): 53%|█████▎ | 660/1250 [03:48<03:13, 3.05it/s] Training 1/1 epoch (loss 1.5009): 53%|█████▎ | 661/1250 [03:48<03:20, 2.94it/s] Training 1/1 epoch (loss 1.5064): 53%|█████▎ | 661/1250 [03:49<03:20, 2.94it/s] Training 1/1 epoch (loss 1.5064): 53%|█████▎ | 662/1250 [03:49<03:15, 3.00it/s] Training 1/1 epoch (loss 1.5555): 53%|█████▎ | 662/1250 [03:49<03:15, 3.00it/s] Training 1/1 epoch (loss 1.5555): 53%|█████▎ | 663/1250 [03:49<03:11, 3.07it/s] Training 1/1 epoch (loss 1.5485): 53%|█████▎ | 663/1250 [03:49<03:11, 3.07it/s] Training 1/1 epoch (loss 1.5485): 53%|█████▎ | 664/1250 
[03:49<03:11, 3.05it/s] Training 1/1 epoch (loss 1.5486): 53%|█████▎ | 664/1250 [03:50<03:11, 3.05it/s] Training 1/1 epoch (loss 1.5486): 53%|█████▎ | 665/1250 [03:50<03:08, 3.10it/s] Training 1/1 epoch (loss 1.6450): 53%|█████▎ | 665/1250 [03:50<03:08, 3.10it/s] Training 1/1 epoch (loss 1.6450): 53%|█████▎ | 666/1250 [03:50<03:09, 3.08it/s] Training 1/1 epoch (loss 1.5050): 53%|█████▎ | 666/1250 [03:50<03:09, 3.08it/s] Training 1/1 epoch (loss 1.5050): 53%|█████▎ | 667/1250 [03:50<03:10, 3.07it/s] Training 1/1 epoch (loss 1.5568): 53%|█████▎ | 667/1250 [03:51<03:10, 3.07it/s] Training 1/1 epoch (loss 1.5568): 53%|█████▎ | 668/1250 [03:51<03:09, 3.07it/s] Training 1/1 epoch (loss 1.5551): 53%|█████▎ | 668/1250 [03:51<03:09, 3.07it/s] Training 1/1 epoch (loss 1.5551): 54%|█████▎ | 669/1250 [03:51<03:08, 3.09it/s] Training 1/1 epoch (loss 1.5806): 54%|█████▎ | 669/1250 [03:51<03:08, 3.09it/s] Training 1/1 epoch (loss 1.5806): 54%|█████▎ | 670/1250 [03:51<03:02, 3.19it/s] Training 1/1 epoch (loss 1.4892): 54%|█████▎ | 670/1250 [03:51<03:02, 3.19it/s] Training 1/1 epoch (loss 1.4892): 54%|█████▎ | 671/1250 [03:51<03:01, 3.18it/s] Training 1/1 epoch (loss 1.4861): 54%|█████▎ | 671/1250 [03:52<03:01, 3.18it/s] Training 1/1 epoch (loss 1.4861): 54%|█████▍ | 672/1250 [03:52<03:07, 3.08it/s] Training 1/1 epoch (loss 1.5145): 54%|█████▍ | 672/1250 [03:52<03:07, 3.08it/s] Training 1/1 epoch (loss 1.5145): 54%|█████▍ | 673/1250 [03:52<03:07, 3.08it/s] Training 1/1 epoch (loss 1.5171): 54%|█████▍ | 673/1250 [03:52<03:07, 3.08it/s] Training 1/1 epoch (loss 1.5171): 54%|█████▍ | 674/1250 [03:52<03:06, 3.08it/s] Training 1/1 epoch (loss 1.5632): 54%|█████▍ | 674/1250 [03:53<03:06, 3.08it/s] Training 1/1 epoch (loss 1.5632): 
54%|█████▍ | 675/1250 [03:53<03:12, 2.98it/s] Training 1/1 epoch (loss 1.5361): 54%|█████▍ | 675/1250 [03:53<03:12, 2.98it/s] Training 1/1 epoch (loss 1.5361): 54%|█████▍ | 676/1250 [03:53<03:07, 3.07it/s] Training 1/1 epoch (loss 1.5601): 54%|█████▍ | 676/1250 [03:53<03:07, 3.07it/s] Training 1/1 epoch (loss 1.5601): 54%|█████▍ | 677/1250 [03:53<03:03, 3.12it/s] Training 1/1 epoch (loss 1.5361): 54%|█████▍ | 677/1250 [03:54<03:03, 3.12it/s] Training 1/1 epoch (loss 1.5361): 54%|█████▍ | 678/1250 [03:54<03:04, 3.09it/s] Training 1/1 epoch (loss 1.5355): 54%|█████▍ | 678/1250 [03:54<03:04, 3.09it/s] Training 1/1 epoch (loss 1.5355): 54%|█████▍ | 679/1250 [03:54<03:06, 3.07it/s] Training 1/1 epoch (loss 1.5598): 54%|█████▍ | 679/1250 [03:54<03:06, 3.07it/s] Training 1/1 epoch (loss 1.5598): 54%|█████▍ | 680/1250 [03:54<03:06, 3.05it/s] Training 1/1 epoch (loss 1.4968): 54%|█████▍ | 680/1250 [03:55<03:06, 3.05it/s] Training 1/1 epoch (loss 1.4968): 54%|█████▍ | 681/1250 [03:55<03:04, 3.08it/s] Training 1/1 epoch (loss 1.5678): 54%|█████▍ | 681/1250 [03:55<03:04, 3.08it/s] Training 1/1 epoch (loss 1.5678): 55%|█████▍ | 682/1250 [03:55<02:58, 3.19it/s] Training 1/1 epoch (loss 1.5511): 55%|█████▍ | 682/1250 [03:55<02:58, 3.19it/s] Training 1/1 epoch (loss 1.5511): 55%|█████▍ | 683/1250 [03:55<03:00, 3.14it/s] Training 1/1 epoch (loss 1.6106): 55%|█████▍ | 683/1250 [03:56<03:00, 3.14it/s] Training 1/1 epoch (loss 1.6106): 55%|█████▍ | 684/1250 [03:56<02:58, 3.17it/s] Training 1/1 epoch (loss 1.5098): 55%|█████▍ | 684/1250 [03:56<02:58, 3.17it/s] Training 1/1 epoch (loss 1.5098): 55%|█████▍ | 685/1250 [03:56<03:00, 3.14it/s] Training 1/1 epoch (loss 1.5290): 55%|█████▍ | 685/1250 [03:56<03:00, 3.14it/s] Training 
1/1 epoch (loss 1.5290): 55%|█████▍ | 686/1250 [03:56<03:00, 3.13it/s] Training 1/1 epoch (loss 1.5427): 55%|█████▍ | 686/1250 [03:57<03:00, 3.13it/s] Training 1/1 epoch (loss 1.5427): 55%|█████▍ | 687/1250 [03:57<02:57, 3.18it/s] Training 1/1 epoch (loss 1.3919): 55%|█████▍ | 687/1250 [03:57<02:57, 3.18it/s] Training 1/1 epoch (loss 1.3919): 55%|█████▌ | 688/1250 [03:57<02:56, 3.19it/s] Training 1/1 epoch (loss 1.4859): 55%|█████▌ | 688/1250 [03:57<02:56, 3.19it/s] Training 1/1 epoch (loss 1.4859): 55%|█████▌ | 689/1250 [03:57<02:56, 3.17it/s] Training 1/1 epoch (loss 1.4165): 55%|█████▌ | 689/1250 [03:58<02:56, 3.17it/s] Training 1/1 epoch (loss 1.4165): 55%|█████▌ | 690/1250 [03:58<02:58, 3.14it/s] Training 1/1 epoch (loss 1.5640): 55%|█████▌ | 690/1250 [03:58<02:58, 3.14it/s] Training 1/1 epoch (loss 1.5640): 55%|█████▌ | 691/1250 [03:58<03:02, 3.06it/s] Training 1/1 epoch (loss 1.4841): 55%|█████▌ | 691/1250 [03:58<03:02, 3.06it/s] Training 1/1 epoch (loss 1.4841): 55%|█████▌ | 692/1250 [03:58<03:08, 2.96it/s] Training 1/1 epoch (loss 1.5701): 55%|█████▌ | 692/1250 [03:59<03:08, 2.96it/s] Training 1/1 epoch (loss 1.5701): 55%|█████▌ | 693/1250 [03:59<03:02, 3.05it/s] Training 1/1 epoch (loss 1.4587): 55%|█████▌ | 693/1250 [03:59<03:02, 3.05it/s] Training 1/1 epoch (loss 1.4587): 56%|█████▌ | 694/1250 [03:59<03:17, 2.81it/s] Training 1/1 epoch (loss 1.5958): 56%|█████▌ | 694/1250 [03:59<03:17, 2.81it/s] Training 1/1 epoch (loss 1.5958): 56%|█████▌ | 695/1250 [03:59<03:10, 2.92it/s] Training 1/1 epoch (loss 1.6715): 56%|█████▌ | 695/1250 [04:00<03:10, 2.92it/s] Training 1/1 epoch (loss 1.6715): 56%|█████▌ | 696/1250 [04:00<03:13, 2.87it/s] Training 1/1 epoch (loss 1.5506): 56%|█████▌ | 696/1250 
[04:00<03:13, 2.87it/s] Training 1/1 epoch (loss 1.5506): 56%|█████▌ | 697/1250 [04:00<03:12, 2.88it/s] Training 1/1 epoch (loss 1.5667): 56%|█████▌ | 697/1250 [04:00<03:12, 2.88it/s] Training 1/1 epoch (loss 1.5667): 56%|█████▌ | 698/1250 [04:00<03:09, 2.91it/s] Training 1/1 epoch (loss 1.6549): 56%|█████▌ | 698/1250 [04:01<03:09, 2.91it/s] Training 1/1 epoch (loss 1.6549): 56%|█████▌ | 699/1250 [04:01<03:05, 2.97it/s] Training 1/1 epoch (loss 1.3779): 56%|█████▌ | 699/1250 [04:01<03:05, 2.97it/s] Training 1/1 epoch (loss 1.3779): 56%|█████▌ | 700/1250 [04:01<03:03, 3.00it/s] Training 1/1 epoch (loss 1.5554): 56%|█████▌ | 700/1250 [04:01<03:03, 3.00it/s] Training 1/1 epoch (loss 1.5554): 56%|█████▌ | 701/1250 [04:01<02:58, 3.08it/s] Training 1/1 epoch (loss 1.4032): 56%|█████▌ | 701/1250 [04:02<02:58, 3.08it/s] Training 1/1 epoch (loss 1.4032): 56%|█████▌ | 702/1250 [04:02<02:56, 3.11it/s] Training 1/1 epoch (loss 1.5705): 56%|█████▌ | 702/1250 [04:02<02:56, 3.11it/s] Training 1/1 epoch (loss 1.5705): 56%|█████▌ | 703/1250 [04:02<02:54, 3.13it/s] Training 1/1 epoch (loss 1.4012): 56%|█████▌ | 703/1250 [04:02<02:54, 3.13it/s] Training 1/1 epoch (loss 1.4012): 56%|█████▋ | 704/1250 [04:02<03:09, 2.88it/s] Training 1/1 epoch (loss 1.5793): 56%|█████▋ | 704/1250 [04:03<03:09, 2.88it/s] Training 1/1 epoch (loss 1.5793): 56%|█████▋ | 705/1250 [04:03<03:04, 2.96it/s] Training 1/1 epoch (loss 1.5035): 56%|█████▋ | 705/1250 [04:03<03:04, 2.96it/s] Training 1/1 epoch (loss 1.5035): 56%|█████▋ | 706/1250 [04:03<02:59, 3.03it/s] Training 1/1 epoch (loss 1.6331): 56%|█████▋ | 706/1250 [04:03<02:59, 3.03it/s] Training 1/1 epoch (loss 1.6331): 57%|█████▋ | 707/1250 [04:03<03:03, 2.96it/s] Training 1/1 epoch (loss 1.6060): 
57%|█████▋ | 707/1250 [04:04<03:03, 2.96it/s] Training 1/1 epoch (loss 1.6060): 57%|█████▋ | 708/1250 [04:04<03:00, 3.00it/s] Training 1/1 epoch (loss 1.5164): 57%|█████▋ | 708/1250 [04:04<03:00, 3.00it/s] Training 1/1 epoch (loss 1.5164): 57%|█████▋ | 709/1250 [04:04<02:59, 3.01it/s] Training 1/1 epoch (loss 1.5647): 57%|█████▋ | 709/1250 [04:04<02:59, 3.01it/s] Training 1/1 epoch (loss 1.5647): 57%|█████▋ | 710/1250 [04:04<03:02, 2.96it/s] Training 1/1 epoch (loss 1.5979): 57%|█████▋ | 710/1250 [04:05<03:02, 2.96it/s] Training 1/1 epoch (loss 1.5979): 57%|█████▋ | 711/1250 [04:05<02:56, 3.05it/s] Training 1/1 epoch (loss 1.4790): 57%|█████▋ | 711/1250 [04:05<02:56, 3.05it/s] Training 1/1 epoch (loss 1.4790): 57%|█████▋ | 712/1250 [04:05<02:53, 3.09it/s] Training 1/1 epoch (loss 1.5869): 57%|█████▋ | 712/1250 [04:05<02:53, 3.09it/s] Training 1/1 epoch (loss 1.5869): 57%|█████▋ | 713/1250 [04:05<02:53, 3.10it/s] Training 1/1 epoch (loss 1.4049): 57%|█████▋ | 713/1250 [04:06<02:53, 3.10it/s] Training 1/1 epoch (loss 1.4049): 57%|█████▋ | 714/1250 [04:06<02:53, 3.08it/s] Training 1/1 epoch (loss 1.5129): 57%|█████▋ | 714/1250 [04:06<02:53, 3.08it/s] Training 1/1 epoch (loss 1.5129): 57%|█████▋ | 715/1250 [04:06<02:50, 3.14it/s] Training 1/1 epoch (loss 1.6267): 57%|█████▋ | 715/1250 [04:06<02:50, 3.14it/s] Training 1/1 epoch (loss 1.6267): 57%|█████▋ | 716/1250 [04:06<02:52, 3.10it/s] Training 1/1 epoch (loss 1.6487): 57%|█████▋ | 716/1250 [04:07<02:52, 3.10it/s] Training 1/1 epoch (loss 1.6487): 57%|█████▋ | 717/1250 [04:07<02:51, 3.10it/s] Training 1/1 epoch (loss 1.5512): 57%|█████▋ | 717/1250 [04:07<02:51, 3.10it/s] Training 1/1 epoch (loss 1.5512): 57%|█████▋ | 718/1250 [04:07<02:48, 3.16it/s] Training 
1/1 epoch (loss 1.6256): 57%|█████▋ | 718/1250 [04:07<02:48, 3.16it/s] Training 1/1 epoch (loss 1.6256): 58%|█████▊ | 719/1250 [04:07<02:44, 3.22it/s] Training 1/1 epoch (loss 1.4680): 58%|█████▊ | 719/1250 [04:08<02:44, 3.22it/s] Training 1/1 epoch (loss 1.4680): 58%|█████▊ | 720/1250 [04:08<02:47, 3.16it/s] Training 1/1 epoch (loss 1.5120): 58%|█████▊ | 720/1250 [04:08<02:47, 3.16it/s] Training 1/1 epoch (loss 1.5120): 58%|█████▊ | 721/1250 [04:08<02:53, 3.04it/s] Training 1/1 epoch (loss 1.5342): 58%|█████▊ | 721/1250 [04:08<02:53, 3.04it/s] Training 1/1 epoch (loss 1.5342): 58%|█████▊ | 722/1250 [04:08<02:55, 3.01it/s] Training 1/1 epoch (loss 1.5233): 58%|█████▊ | 722/1250 [04:09<02:55, 3.01it/s] Training 1/1 epoch (loss 1.5233): 58%|█████▊ | 723/1250 [04:09<03:02, 2.89it/s] Training 1/1 epoch (loss 1.4893): 58%|█████▊ | 723/1250 [04:09<03:02, 2.89it/s] Training 1/1 epoch (loss 1.4893): 58%|█████▊ | 724/1250 [04:09<02:55, 2.99it/s] Training 1/1 epoch (loss 1.5236): 58%|█████▊ | 724/1250 [04:09<02:55, 2.99it/s] Training 1/1 epoch (loss 1.5236): 58%|█████▊ | 725/1250 [04:09<02:48, 3.11it/s] Training 1/1 epoch (loss 1.5478): 58%|█████▊ | 725/1250 [04:10<02:48, 3.11it/s] Training 1/1 epoch (loss 1.5478): 58%|█████▊ | 726/1250 [04:10<02:46, 3.15it/s] Training 1/1 epoch (loss 1.5741): 58%|█████▊ | 726/1250 [04:10<02:46, 3.15it/s] Training 1/1 epoch (loss 1.5741): 58%|█████▊ | 727/1250 [04:10<02:45, 3.16it/s] Training 1/1 epoch (loss 1.5890): 58%|█████▊ | 727/1250 [04:10<02:45, 3.16it/s] Training 1/1 epoch (loss 1.5890): 58%|█████▊ | 728/1250 [04:10<02:49, 3.08it/s] Training 1/1 epoch (loss 1.5317): 58%|█████▊ | 728/1250 [04:10<02:49, 3.08it/s] Training 1/1 epoch (loss 1.5317): 58%|█████▊ | 729/1250 
[04:10<02:50, 3.06it/s] Training 1/1 epoch (loss 1.6553): 58%|█████▊ | 729/1250 [04:11<02:50, 3.06it/s] Training 1/1 epoch (loss 1.6553): 58%|█████▊ | 730/1250 [04:11<02:47, 3.10it/s] Training 1/1 epoch (loss 1.6161): 58%|█████▊ | 730/1250 [04:11<02:47, 3.10it/s] Training 1/1 epoch (loss 1.6161): 58%|█████▊ | 731/1250 [04:11<02:50, 3.04it/s] Training 1/1 epoch (loss 1.5242): 58%|█████▊ | 731/1250 [04:11<02:50, 3.04it/s] Training 1/1 epoch (loss 1.5242): 59%|█████▊ | 732/1250 [04:11<02:45, 3.12it/s] Training 1/1 epoch (loss 1.4447): 59%|█████▊ | 732/1250 [04:12<02:45, 3.12it/s] Training 1/1 epoch (loss 1.4447): 59%|█████▊ | 733/1250 [04:12<02:52, 2.99it/s] Training 1/1 epoch (loss 1.5695): 59%|█████▊ | 733/1250 [04:12<02:52, 2.99it/s] Training 1/1 epoch (loss 1.5695): 59%|█████▊ | 734/1250 [04:12<02:51, 3.02it/s] Training 1/1 epoch (loss 1.4556): 59%|█████▊ | 734/1250 [04:12<02:51, 3.02it/s] Training 1/1 epoch (loss 1.4556): 59%|█████▉ | 735/1250 [04:12<02:48, 3.05it/s] Training 1/1 epoch (loss 1.4958): 59%|█████▉ | 735/1250 [04:13<02:48, 3.05it/s] Training 1/1 epoch (loss 1.4958): 59%|█████▉ | 736/1250 [04:13<02:50, 3.02it/s] Training 1/1 epoch (loss 1.6340): 59%|█████▉ | 736/1250 [04:13<02:50, 3.02it/s] Training 1/1 epoch (loss 1.6340): 59%|█████▉ | 737/1250 [04:13<02:46, 3.07it/s] Training 1/1 epoch (loss 1.5707): 59%|█████▉ | 737/1250 [04:13<02:46, 3.07it/s] Training 1/1 epoch (loss 1.5707): 59%|█████▉ | 738/1250 [04:13<02:44, 3.11it/s] Training 1/1 epoch (loss 1.4948): 59%|█████▉ | 738/1250 [04:14<02:44, 3.11it/s] Training 1/1 epoch (loss 1.4948): 59%|█████▉ | 739/1250 [04:14<02:50, 3.00it/s] Training 1/1 epoch (loss 1.5104): 59%|█████▉ | 739/1250 [04:14<02:50, 3.00it/s] Training 1/1 epoch (loss 1.5104): 
59%|█████▉ | 740/1250 [04:14<02:50, 3.00it/s] Training 1/1 epoch (loss 1.5528): 59%|█████▉ | 740/1250 [04:14<02:50, 3.00it/s] Training 1/1 epoch (loss 1.5528): 59%|█████▉ | 741/1250 [04:14<02:54, 2.92it/s] Training 1/1 epoch (loss 1.4689): 59%|█████▉ | 741/1250 [04:15<02:54, 2.92it/s] Training 1/1 epoch (loss 1.4689): 59%|█████▉ | 742/1250 [04:15<02:50, 2.99it/s] Training 1/1 epoch (loss 1.5141): 59%|█████▉ | 742/1250 [04:15<02:50, 2.99it/s] Training 1/1 epoch (loss 1.5141): 59%|█████▉ | 743/1250 [04:15<02:46, 3.04it/s] Training 1/1 epoch (loss 1.6149): 59%|█████▉ | 743/1250 [04:15<02:46, 3.04it/s] Training 1/1 epoch (loss 1.6149): 60%|█████▉ | 744/1250 [04:15<02:48, 3.00it/s] Training 1/1 epoch (loss 1.5124): 60%|█████▉ | 744/1250 [04:16<02:48, 3.00it/s] Training 1/1 epoch (loss 1.5124): 60%|█████▉ | 745/1250 [04:16<02:51, 2.95it/s] Training 1/1 epoch (loss 1.5428): 60%|█████▉ | 745/1250 [04:16<02:51, 2.95it/s] Training 1/1 epoch (loss 1.5428): 60%|█████▉ | 746/1250 [04:16<02:47, 3.02it/s] Training 1/1 epoch (loss 1.5416): 60%|█████▉ | 746/1250 [04:16<02:47, 3.02it/s] Training 1/1 epoch (loss 1.5416): 60%|█████▉ | 747/1250 [04:16<02:46, 3.02it/s] Training 1/1 epoch (loss 1.6113): 60%|█████▉ | 747/1250 [04:17<02:46, 3.02it/s] Training 1/1 epoch (loss 1.6113): 60%|█████▉ | 748/1250 [04:17<02:42, 3.09it/s] Training 1/1 epoch (loss 1.5208): 60%|█████▉ | 748/1250 [04:17<02:42, 3.09it/s] Training 1/1 epoch (loss 1.5208): 60%|█████▉ | 749/1250 [04:17<02:40, 3.11it/s] Training 1/1 epoch (loss 1.4705): 60%|█████▉ | 749/1250 [04:17<02:40, 3.11it/s] Training 1/1 epoch (loss 1.4705): 60%|██████ | 750/1250 [04:17<02:40, 3.12it/s] Training 1/1 epoch (loss 1.4951): 60%|██████ | 750/1250 [04:18<02:40, 3.12it/s] Training 
1/1 epoch (loss 1.4951): 60%|██████ | 751/1250 [04:18<02:41, 3.08it/s] Training 1/1 epoch (loss 1.4975): 60%|██████ | 751/1250 [04:18<02:41, 3.08it/s] Training 1/1 epoch (loss 1.4975): 60%|██████ | 752/1250 [04:18<02:50, 2.92it/s] Training 1/1 epoch (loss 1.5712): 60%|██████ | 752/1250 [04:18<02:50, 2.92it/s] Training 1/1 epoch (loss 1.5712): 60%|██████ | 753/1250 [04:18<02:50, 2.91it/s] Training 1/1 epoch (loss 1.4695): 60%|██████ | 753/1250 [04:19<02:50, 2.91it/s] Training 1/1 epoch (loss 1.4695): 60%|██████ | 754/1250 [04:19<02:45, 2.99it/s] Training 1/1 epoch (loss 1.5575): 60%|██████ | 754/1250 [04:19<02:45, 2.99it/s] Training 1/1 epoch (loss 1.5575): 60%|██████ | 755/1250 [04:19<02:47, 2.95it/s] Training 1/1 epoch (loss 1.6193): 60%|██████ | 755/1250 [04:19<02:47, 2.95it/s] Training 1/1 epoch (loss 1.6193): 60%|██████ | 756/1250 [04:19<02:43, 3.02it/s] Training 1/1 epoch (loss 1.4771): 60%|██████ | 756/1250 [04:20<02:43, 3.02it/s] Training 1/1 epoch (loss 1.4771): 61%|██████ | 757/1250 [04:20<02:43, 3.01it/s] Training 1/1 epoch (loss 1.4873): 61%|██████ | 757/1250 [04:20<02:43, 3.01it/s] Training 1/1 epoch (loss 1.4873): 61%|██████ | 758/1250 [04:20<02:42, 3.03it/s] Training 1/1 epoch (loss 1.6699): 61%|██████ | 758/1250 [04:20<02:42, 3.03it/s] Training 1/1 epoch (loss 1.6699): 61%|██████ | 759/1250 [04:20<02:43, 3.01it/s] Training 1/1 epoch (loss 1.3874): 61%|██████ | 759/1250 [04:21<02:43, 3.01it/s] Training 1/1 epoch (loss 1.3874): 61%|██████ | 760/1250 [04:21<02:44, 2.98it/s] Training 1/1 epoch (loss 1.4359): 61%|██████ | 760/1250 [04:21<02:44, 2.98it/s] Training 1/1 epoch (loss 1.4359): 61%|██████ | 761/1250 [04:21<02:43, 2.99it/s] Training 1/1 epoch (loss 1.5229): 61%|██████ | 761/1250 
[04:21<02:43, 2.99it/s] Training 1/1 epoch (loss 1.5229): 61%|██████ | 762/1250 [04:21<02:43, 2.98it/s] Training 1/1 epoch (loss 1.5644): 61%|██████ | 762/1250 [04:22<02:43, 2.98it/s] Training 1/1 epoch (loss 1.5644): 61%|██████ | 763/1250 [04:22<02:41, 3.01it/s] Training 1/1 epoch (loss 1.4707): 61%|██████ | 763/1250 [04:22<02:41, 3.01it/s] Training 1/1 epoch (loss 1.4707): 61%|██████ | 764/1250 [04:22<02:39, 3.04it/s] Training 1/1 epoch (loss 1.6647): 61%|██████ | 764/1250 [04:22<02:39, 3.04it/s] Training 1/1 epoch (loss 1.6647): 61%|██████ | 765/1250 [04:22<02:43, 2.98it/s] Training 1/1 epoch (loss 1.4921): 61%|██████ | 765/1250 [04:23<02:43, 2.98it/s] Training 1/1 epoch (loss 1.4921): 61%|██████▏ | 766/1250 [04:23<02:37, 3.08it/s] Training 1/1 epoch (loss 1.5770): 61%|██████▏ | 766/1250 [04:23<02:37, 3.08it/s] Training 1/1 epoch (loss 1.5770): 61%|██████▏ | 767/1250 [04:23<02:37, 3.06it/s] Training 1/1 epoch (loss 1.4378): 61%|██████▏ | 767/1250 [04:23<02:37, 3.06it/s] Training 1/1 epoch (loss 1.4378): 61%|██████▏ | 768/1250 [04:23<02:36, 3.08it/s] Training 1/1 epoch (loss 1.4862): 61%|██████▏ | 768/1250 [04:24<02:36, 3.08it/s] Training 1/1 epoch (loss 1.4862): 62%|██████▏ | 769/1250 [04:24<02:36, 3.08it/s] Training 1/1 epoch (loss 1.5283): 62%|██████▏ | 769/1250 [04:24<02:36, 3.08it/s] Training 1/1 epoch (loss 1.5283): 62%|██████▏ | 770/1250 [04:24<02:37, 3.04it/s] Training 1/1 epoch (loss 1.4122): 62%|██████▏ | 770/1250 [04:24<02:37, 3.04it/s] Training 1/1 epoch (loss 1.4122): 62%|██████▏ | 771/1250 [04:24<02:43, 2.94it/s] Training 1/1 epoch (loss 1.5304): 62%|██████▏ | 771/1250 [04:25<02:43, 2.94it/s] Training 1/1 epoch (loss 1.5304): 62%|██████▏ | 772/1250 [04:25<02:36, 3.06it/s] 
Training 1/1 epoch (loss 1.5475): 62%|██████▏ | 772/1250 [04:25<02:36, 3.06it/s] Training 1/1 epoch (loss 1.5475): 62%|██████▏ | 773/1250 [04:25<02:33, 3.12it/s] Training 1/1 epoch (loss 1.4357): 62%|██████▏ | 773/1250 [04:25<02:33, 3.12it/s] Training 1/1 epoch (loss 1.4357): 62%|██████▏ | 774/1250 [04:25<02:33, 3.10it/s] Training 1/1 epoch (loss 1.6378): 62%|██████▏ | 774/1250 [04:26<02:33, 3.10it/s] Training 1/1 epoch (loss 1.6378): 62%|██████▏ | 775/1250 [04:26<02:31, 3.13it/s] Training 1/1 epoch (loss 1.4972): 62%|██████▏ | 775/1250 [04:26<02:31, 3.13it/s] Training 1/1 epoch (loss 1.4972): 62%|██████▏ | 776/1250 [04:26<02:34, 3.06it/s] Training 1/1 epoch (loss 1.4782): 62%|██████▏ | 776/1250 [04:26<02:34, 3.06it/s] Training 1/1 epoch (loss 1.4782): 62%|██████▏ | 777/1250 [04:26<02:38, 2.99it/s] Training 1/1 epoch (loss 1.6839): 62%|██████▏ | 777/1250 [04:27<02:38, 2.99it/s] Training 1/1 epoch (loss 1.6839): 62%|██████▏ | 778/1250 [04:27<02:30, 3.13it/s] Training 1/1 epoch (loss 1.5379): 62%|██████▏ | 778/1250 [04:27<02:30, 3.13it/s] Training 1/1 epoch (loss 1.5379): 62%|██████▏ | 779/1250 [04:27<02:28, 3.17it/s] Training 1/1 epoch (loss 1.5217): 62%|██████▏ | 779/1250 [04:27<02:28, 3.17it/s] Training 1/1 epoch (loss 1.5217): 62%|██████▏ | 780/1250 [04:27<02:26, 3.20it/s] Training 1/1 epoch (loss 1.6488): 62%|██████▏ | 780/1250 [04:28<02:26, 3.20it/s] Training 1/1 epoch (loss 1.6488): 62%|██████▏ | 781/1250 [04:28<02:25, 3.22it/s] Training 1/1 epoch (loss 1.4970): 62%|██████▏ | 781/1250 [04:28<02:25, 3.22it/s] Training 1/1 epoch (loss 1.4970): 63%|██████▎ | 782/1250 [04:28<02:29, 3.13it/s] Training 1/1 epoch (loss 1.5148): 63%|██████▎ | 782/1250 [04:28<02:29, 3.13it/s] 
Training 1/1 epoch (loss 1.5148): 63%|██████▎ | 783/1250 [04:28<02:28, 3.15it/s] Training 1/1 epoch (loss 1.6265): 63%|██████▎ | 783/1250 [04:29<02:28, 3.15it/s] Training 1/1 epoch (loss 1.6265): 63%|██████▎ | 784/1250 [04:29<02:31, 3.08it/s] Training 1/1 epoch (loss 1.5748): 63%|██████▎ | 784/1250 [04:29<02:31, 3.08it/s] Training 1/1 epoch (loss 1.5748): 63%|██████▎ | 785/1250 [04:29<02:27, 3.16it/s] Training 1/1 epoch (loss 1.5247): 63%|██████▎ | 785/1250 [04:29<02:27, 3.16it/s] Training 1/1 epoch (loss 1.5247): 63%|██████▎ | 786/1250 [04:29<02:41, 2.88it/s] Training 1/1 epoch (loss 1.5718): 63%|██████▎ | 786/1250 [04:30<02:41, 2.88it/s] Training 1/1 epoch (loss 1.5718): 63%|██████▎ | 787/1250 [04:30<02:44, 2.81it/s] Training 1/1 epoch (loss 1.5385): 63%|██████▎ | 787/1250 [04:30<02:44, 2.81it/s] Training 1/1 epoch (loss 1.5385): 63%|██████▎ | 788/1250 [04:30<02:42, 2.85it/s] Training 1/1 epoch (loss 1.5869): 63%|██████▎ | 788/1250 [04:30<02:42, 2.85it/s] Training 1/1 epoch (loss 1.5869): 63%|██████▎ | 789/1250 [04:30<02:37, 2.93it/s] Training 1/1 epoch (loss 1.5246): 63%|██████▎ | 789/1250 [04:31<02:37, 2.93it/s] Training 1/1 epoch (loss 1.5246): 63%|██████▎ | 790/1250 [04:31<02:33, 3.00it/s] Training 1/1 epoch (loss 1.5363): 63%|██████▎ | 790/1250 [04:31<02:33, 3.00it/s] Training 1/1 epoch (loss 1.5363): 63%|██████▎ | 791/1250 [04:31<02:26, 3.14it/s] Training 1/1 epoch (loss 1.5363): 63%|██████▎ | 791/1250 [04:31<02:26, 3.14it/s] Training 1/1 epoch (loss 1.5363): 63%|██████▎ | 792/1250 [04:31<02:26, 3.12it/s] Training 1/1 epoch (loss 1.5538): 63%|██████▎ | 792/1250 [04:32<02:26, 3.12it/s] Training 1/1 epoch (loss 1.5538): 63%|██████▎ | 793/1250 [04:32<02:31, 3.01it/s] 
Training 1/1 epoch (loss 1.5381): 63%|██████▎ | 793/1250 [04:32<02:31, 3.01it/s] Training 1/1 epoch (loss 1.5381): 64%|██████▎ | 794/1250 [04:32<02:33, 2.98it/s] Training 1/1 epoch (loss 1.5610): 64%|██████▎ | 794/1250 [04:32<02:33, 2.98it/s] Training 1/1 epoch (loss 1.5610): 64%|██████▎ | 795/1250 [04:32<02:30, 3.02it/s] Training 1/1 epoch (loss 1.5536): 64%|██████▎ | 795/1250 [04:33<02:30, 3.02it/s] Training 1/1 epoch (loss 1.5536): 64%|██████▎ | 796/1250 [04:33<02:28, 3.06it/s] Training 1/1 epoch (loss 1.6396): 64%|██████▎ | 796/1250 [04:33<02:28, 3.06it/s] Training 1/1 epoch (loss 1.6396): 64%|██████▍ | 797/1250 [04:33<02:26, 3.09it/s] Training 1/1 epoch (loss 1.6358): 64%|██████▍ | 797/1250 [04:33<02:26, 3.09it/s] Training 1/1 epoch (loss 1.6358): 64%|██████▍ | 798/1250 [04:33<02:23, 3.14it/s] Training 1/1 epoch (loss 1.5081): 64%|██████▍ | 798/1250 [04:34<02:23, 3.14it/s] Training 1/1 epoch (loss 1.5081): 64%|██████▍ | 799/1250 [04:34<02:24, 3.12it/s] Training 1/1 epoch (loss 1.5358): 64%|██████▍ | 799/1250 [04:34<02:24, 3.12it/s] Training 1/1 epoch (loss 1.5358): 64%|██████▍ | 800/1250 [04:34<02:27, 3.04it/s] Training 1/1 epoch (loss 1.6399): 64%|██████▍ | 800/1250 [04:34<02:27, 3.04it/s] Training 1/1 epoch (loss 1.6399): 64%|██████▍ | 801/1250 [04:34<02:31, 2.96it/s] Training 1/1 epoch (loss 1.5044): 64%|██████▍ | 801/1250 [04:35<02:31, 2.96it/s] Training 1/1 epoch (loss 1.5044): 64%|██████▍ | 802/1250 [04:35<02:33, 2.91it/s] Training 1/1 epoch (loss 1.6364): 64%|██████▍ | 802/1250 [04:35<02:33, 2.91it/s] Training 1/1 epoch (loss 1.6364): 64%|██████▍ | 803/1250 [04:35<02:29, 3.00it/s] Training 1/1 epoch (loss 1.6202): 64%|██████▍ | 803/1250 [04:35<02:29, 3.00it/s] 
Training 1/1 epoch (loss 1.6202): 64%|██████▍ | 804/1250 [04:35<02:23, 3.11it/s] Training 1/1 epoch (loss 1.5521): 64%|██████▍ | 804/1250 [04:35<02:23, 3.11it/s] Training 1/1 epoch (loss 1.5521): 64%|██████▍ | 805/1250 [04:35<02:19, 3.18it/s] Training 1/1 epoch (loss 1.5733): 64%|██████▍ | 805/1250 [04:36<02:19, 3.18it/s] Training 1/1 epoch (loss 1.5733): 64%|██████▍ | 806/1250 [04:36<02:19, 3.19it/s] Training 1/1 epoch (loss 1.4210): 64%|██████▍ | 806/1250 [04:36<02:19, 3.19it/s] Training 1/1 epoch (loss 1.4210): 65%|██████▍ | 807/1250 [04:36<02:21, 3.14it/s] Training 1/1 epoch (loss 1.4747): 65%|██████▍ | 807/1250 [04:36<02:21, 3.14it/s] Training 1/1 epoch (loss 1.4747): 65%|██████▍ | 808/1250 [04:36<02:25, 3.04it/s] Training 1/1 epoch (loss 1.4363): 65%|██████▍ | 808/1250 [04:37<02:25, 3.04it/s] Training 1/1 epoch (loss 1.4363): 65%|██████▍ | 809/1250 [04:37<02:21, 3.12it/s] Training 1/1 epoch (loss 1.5225): 65%|██████▍ | 809/1250 [04:37<02:21, 3.12it/s] Training 1/1 epoch (loss 1.5225): 65%|██████▍ | 810/1250 [04:37<02:17, 3.19it/s] Training 1/1 epoch (loss 1.4501): 65%|██████▍ | 810/1250 [04:37<02:17, 3.19it/s] Training 1/1 epoch (loss 1.4501): 65%|██████▍ | 811/1250 [04:37<02:15, 3.25it/s] Training 1/1 epoch (loss 1.5825): 65%|██████▍ | 811/1250 [04:38<02:15, 3.25it/s] Training 1/1 epoch (loss 1.5825): 65%|██████▍ | 812/1250 [04:38<02:15, 3.23it/s] Training 1/1 epoch (loss 1.6420): 65%|██████▍ | 812/1250 [04:38<02:15, 3.23it/s] Training 1/1 epoch (loss 1.6420): 65%|██████▌ | 813/1250 [04:38<02:19, 3.14it/s] Training 1/1 epoch (loss 1.7040): 65%|██████▌ | 813/1250 [04:38<02:19, 3.14it/s] Training 1/1 epoch (loss 1.7040): 65%|██████▌ | 814/1250 [04:38<02:21, 3.07it/s] 
Training 1/1 epoch (loss 1.5806): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 814/1250 [04:39<02:21, 3.07it/s] Training 1/1 epoch (loss 1.5806): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 815/1250 [04:39<02:15, 3.22it/s] Training 1/1 epoch (loss 1.4987): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 815/1250 [04:39<02:15, 3.22it/s] Training 1/1 epoch (loss 1.4987): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 816/1250 [04:39<02:17, 3.16it/s] Training 1/1 epoch (loss 1.5076): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 816/1250 [04:39<02:17, 3.16it/s] Training 1/1 epoch (loss 1.5076): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 817/1250 [04:39<02:16, 3.18it/s] Training 1/1 epoch (loss 1.5428): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 817/1250 [04:40<02:16, 3.18it/s] Training 1/1 epoch (loss 1.5428): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 818/1250 [04:40<02:12, 3.26it/s] Training 1/1 epoch (loss 1.4815): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 818/1250 [04:40<02:12, 3.26it/s] Training 1/1 epoch (loss 1.4815): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 819/1250 [04:40<02:16, 3.16it/s] Training 1/1 epoch (loss 1.5303): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 819/1250 [04:40<02:16, 3.16it/s] Training 1/1 epoch (loss 1.5303): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 820/1250 [04:40<02:17, 3.13it/s] Training 1/1 epoch (loss 1.4957): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 820/1250 [04:41<02:17, 3.13it/s] Training 1/1 epoch (loss 1.4957): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 821/1250 [04:41<02:15, 3.17it/s] Training 1/1 epoch (loss 1.5775): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 821/1250 [04:41<02:15, 3.17it/s] Training 1/1 epoch (loss 1.5775): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 822/1250 [04:41<02:11, 3.26it/s] Training 1/1 epoch (loss 1.4156): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 822/1250 [04:41<02:11, 3.26it/s] Training 1/1 epoch (loss 1.4156): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 823/1250 [04:41<02:10, 3.27it/s] Training 1/1 epoch (loss 1.5131): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 823/1250 [04:41<02:10, 3.27it/s] Training 1/1 epoch (loss 1.5131): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 824/1250 [04:41<02:12, 3.22it/s] Training 1/1 epoch (loss 1.5455): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 824/1250 [04:42<02:12, 3.22it/s] 
Training 1/1 epoch (loss 1.5455): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 825/1250 [04:42<02:26, 2.90it/s] Training 1/1 epoch (loss 1.5988): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 825/1250 [04:42<02:26, 2.90it/s] Training 1/1 epoch (loss 1.5988): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 826/1250 [04:42<02:22, 2.97it/s] Training 1/1 epoch (loss 1.4333): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 826/1250 [04:43<02:22, 2.97it/s] Training 1/1 epoch (loss 1.4333): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 827/1250 [04:43<02:23, 2.94it/s] Training 1/1 epoch (loss 1.4552): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 827/1250 [04:43<02:23, 2.94it/s] Training 1/1 epoch (loss 1.4552): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 828/1250 [04:43<02:19, 3.03it/s] Training 1/1 epoch (loss 1.5200): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 828/1250 [04:43<02:19, 3.03it/s] Training 1/1 epoch (loss 1.5200): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 829/1250 [04:43<02:15, 3.11it/s] Training 1/1 epoch (loss 1.4884): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 829/1250 [04:43<02:15, 3.11it/s] Training 1/1 epoch (loss 1.4884): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 830/1250 [04:43<02:12, 3.17it/s] Training 1/1 epoch (loss 1.4938): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 830/1250 [04:44<02:12, 3.17it/s] Training 1/1 epoch (loss 1.4938): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 831/1250 [04:44<02:11, 3.19it/s] Training 1/1 epoch (loss 1.4009): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 831/1250 [04:44<02:11, 3.19it/s] Training 1/1 epoch (loss 1.4009): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 832/1250 [04:44<02:14, 3.12it/s] Training 1/1 epoch (loss 1.6065): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 832/1250 [04:44<02:14, 3.12it/s] Training 1/1 epoch (loss 1.6065): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 833/1250 [04:44<02:21, 2.95it/s] Training 1/1 epoch (loss 1.5781): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 833/1250 [04:45<02:21, 2.95it/s] Training 1/1 epoch (loss 1.5781): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 834/1250 [04:45<02:16, 3.05it/s] Training 1/1 epoch (loss 1.5358): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 834/1250 [04:45<02:16, 3.05it/s] Training 1/1 epoch (loss 1.5358): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 835/1250 [04:45<02:20, 2.95it/s] 
Training 1/1 epoch (loss 1.6046): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 835/1250 [04:45<02:20, 2.95it/s] Training 1/1 epoch (loss 1.6046): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 836/1250 [04:45<02:16, 3.04it/s] Training 1/1 epoch (loss 1.4452): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 836/1250 [04:46<02:16, 3.04it/s] Training 1/1 epoch (loss 1.4452): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 837/1250 [04:46<02:11, 3.14it/s] Training 1/1 epoch (loss 1.3903): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 837/1250 [04:46<02:11, 3.14it/s] Training 1/1 epoch (loss 1.3903): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 838/1250 [04:46<02:15, 3.05it/s] Training 1/1 epoch (loss 1.5863): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 838/1250 [04:46<02:15, 3.05it/s] Training 1/1 epoch (loss 1.5863): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 839/1250 [04:46<02:13, 3.08it/s] Training 1/1 epoch (loss 1.4333): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 839/1250 [04:47<02:13, 3.08it/s] Training 1/1 epoch (loss 1.4333): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 840/1250 [04:47<02:11, 3.11it/s] Training 1/1 epoch (loss 1.5048): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 840/1250 [04:47<02:11, 3.11it/s] Training 1/1 epoch (loss 1.5048): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 841/1250 [04:47<02:09, 3.16it/s] Training 1/1 epoch (loss 1.5996): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 841/1250 [04:47<02:09, 3.16it/s] Training 1/1 epoch (loss 1.5996): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 842/1250 [04:47<02:09, 3.15it/s] Training 1/1 epoch (loss 1.6154): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 842/1250 [04:48<02:09, 3.15it/s] Training 1/1 epoch (loss 1.6154): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 843/1250 [04:48<02:08, 3.17it/s] Training 1/1 epoch (loss 1.6039): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 843/1250 [04:48<02:08, 3.17it/s] Training 1/1 epoch (loss 1.6039): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 844/1250 [04:48<02:07, 3.18it/s] Training 1/1 epoch (loss 1.4767): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 844/1250 [04:48<02:07, 3.18it/s] Training 1/1 epoch (loss 1.4767): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 845/1250 [04:48<02:08, 3.14it/s] Training 1/1 epoch (loss 1.7192): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 845/1250 [04:49<02:08, 3.14it/s] 
Training 1/1 epoch (loss 1.7192): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 846/1250 [04:49<02:07, 3.18it/s] Training 1/1 epoch (loss 1.5316): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 846/1250 [04:49<02:07, 3.18it/s] Training 1/1 epoch (loss 1.5316): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 847/1250 [04:49<02:04, 3.23it/s] Training 1/1 epoch (loss 1.6052): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 847/1250 [04:49<02:04, 3.23it/s] Training 1/1 epoch (loss 1.6052): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 848/1250 [04:49<02:08, 3.14it/s] Training 1/1 epoch (loss 1.5869): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 848/1250 [04:50<02:08, 3.14it/s] Training 1/1 epoch (loss 1.5869): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 849/1250 [04:50<02:08, 3.12it/s] Training 1/1 epoch (loss 1.4886): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 849/1250 [04:50<02:08, 3.12it/s] Training 1/1 epoch (loss 1.4886): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 850/1250 [04:50<02:07, 3.15it/s] Training 1/1 epoch (loss 1.5364): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 850/1250 [04:50<02:07, 3.15it/s] Training 1/1 epoch (loss 1.5364): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 851/1250 [04:50<02:12, 3.00it/s] Training 1/1 epoch (loss 1.6339): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 851/1250 [04:51<02:12, 3.00it/s] Training 1/1 epoch (loss 1.6339): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 852/1250 [04:51<02:09, 3.07it/s] Training 1/1 epoch (loss 1.4726): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 852/1250 [04:51<02:09, 3.07it/s] Training 1/1 epoch (loss 1.4726): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 853/1250 [04:51<02:07, 3.12it/s] Training 1/1 epoch (loss 1.5097): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 853/1250 [04:51<02:07, 3.12it/s] Training 1/1 epoch (loss 1.5097): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 854/1250 [04:51<02:04, 3.18it/s] Training 1/1 epoch (loss 1.5988): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 854/1250 [04:52<02:04, 3.18it/s] Training 1/1 epoch (loss 1.5988): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 855/1250 [04:52<02:09, 3.06it/s] Training 1/1 epoch (loss 1.6620): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 855/1250 [04:52<02:09, 3.06it/s] Training 1/1 epoch (loss 1.6620): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 856/1250 [04:52<02:14, 2.93it/s] 
Training 1/1 epoch (loss 1.5514): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 856/1250 [04:52<02:14, 2.93it/s] Training 1/1 epoch (loss 1.5514): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 857/1250 [04:52<02:15, 2.90it/s] Training 1/1 epoch (loss 1.4706): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 857/1250 [04:53<02:15, 2.90it/s] Training 1/1 epoch (loss 1.4706): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 858/1250 [04:53<02:12, 2.95it/s] Training 1/1 epoch (loss 1.5011): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 858/1250 [04:53<02:12, 2.95it/s] Training 1/1 epoch (loss 1.5011): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 859/1250 [04:53<02:09, 3.03it/s] Training 1/1 epoch (loss 1.6174): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 859/1250 [04:53<02:09, 3.03it/s] Training 1/1 epoch (loss 1.6174): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 860/1250 [04:53<02:05, 3.12it/s] Training 1/1 epoch (loss 1.5749): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 860/1250 [04:53<02:05, 3.12it/s] Training 1/1 epoch (loss 1.5749): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 861/1250 [04:53<02:03, 3.16it/s] Training 1/1 epoch (loss 1.5075): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 861/1250 [04:54<02:03, 3.16it/s] Training 1/1 epoch (loss 1.5075): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 862/1250 [04:54<02:04, 3.11it/s] Training 1/1 epoch (loss 1.5597): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 862/1250 [04:54<02:04, 3.11it/s] Training 1/1 epoch (loss 1.5597): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 863/1250 [04:54<02:03, 3.13it/s] Training 1/1 epoch (loss 1.5604): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 863/1250 [04:54<02:03, 3.13it/s] Training 1/1 epoch (loss 1.5604): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 864/1250 [04:54<02:06, 3.06it/s] Training 1/1 epoch (loss 1.3939): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 864/1250 [04:55<02:06, 3.06it/s] Training 1/1 epoch (loss 1.3939): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 865/1250 [04:55<02:04, 3.10it/s] Training 1/1 epoch (loss 1.4725): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 865/1250 [04:55<02:04, 3.10it/s] Training 1/1 epoch (loss 1.4725): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 866/1250 [04:55<02:00, 3.18it/s] Training 1/1 epoch (loss 1.5243): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 866/1250 [04:55<02:00, 3.18it/s] 
Training 1/1 epoch (loss 1.5243): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 867/1250 [04:55<01:59, 3.22it/s] Training 1/1 epoch (loss 1.6171): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 867/1250 [04:56<01:59, 3.22it/s] Training 1/1 epoch (loss 1.6171): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 868/1250 [04:56<01:58, 3.23it/s] Training 1/1 epoch (loss 1.6152): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 868/1250 [04:56<01:58, 3.23it/s] Training 1/1 epoch (loss 1.6152): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 869/1250 [04:56<01:58, 3.22it/s] Training 1/1 epoch (loss 1.6057): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 869/1250 [04:56<01:58, 3.22it/s] Training 1/1 epoch (loss 1.6057): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 870/1250 [04:56<02:00, 3.15it/s] Training 1/1 epoch (loss 1.4954): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 870/1250 [04:57<02:00, 3.15it/s] Training 1/1 epoch (loss 1.4954): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 871/1250 [04:57<01:58, 3.19it/s] Training 1/1 epoch (loss 1.4014): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 871/1250 [04:57<01:58, 3.19it/s] Training 1/1 epoch (loss 1.4014): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 872/1250 [04:57<01:59, 3.16it/s] Training 1/1 epoch (loss 1.5711): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 872/1250 [04:57<01:59, 3.16it/s] Training 1/1 epoch (loss 1.5711): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 873/1250 [04:57<01:58, 3.17it/s] Training 1/1 epoch (loss 1.6424): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 873/1250 [04:58<01:58, 3.17it/s] Training 1/1 epoch (loss 1.6424): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 874/1250 [04:58<01:58, 3.17it/s] Training 1/1 epoch (loss 1.4240): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 874/1250 [04:58<01:58, 3.17it/s] Training 1/1 epoch (loss 1.4240): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 875/1250 [04:58<01:56, 3.22it/s] Training 1/1 epoch (loss 1.5732): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 875/1250 [04:58<01:56, 3.22it/s] Training 1/1 epoch (loss 1.5732): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 876/1250 [04:58<01:56, 3.21it/s] Training 1/1 epoch (loss 1.6116): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 876/1250 [04:59<01:56, 3.21it/s] Training 1/1 epoch (loss 1.6116): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 877/1250 [04:59<01:58, 3.14it/s] 
Training 1/1 epoch (loss 1.4791): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 877/1250 [04:59<01:58, 3.14it/s] Training 1/1 epoch (loss 1.4791): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 878/1250 [04:59<01:56, 3.21it/s] Training 1/1 epoch (loss 1.5430): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 878/1250 [04:59<01:56, 3.21it/s] Training 1/1 epoch (loss 1.5430): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 879/1250 [04:59<01:54, 3.24it/s] Training 1/1 epoch (loss 1.5690): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 879/1250 [05:00<01:54, 3.24it/s] Training 1/1 epoch (loss 1.5690): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 880/1250 [05:00<02:12, 2.80it/s] Training 1/1 epoch (loss 1.4871): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 880/1250 [05:00<02:12, 2.80it/s] Training 1/1 epoch (loss 1.4871): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 881/1250 [05:00<02:06, 2.92it/s] Training 1/1 epoch (loss 1.4610): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 881/1250 [05:00<02:06, 2.92it/s] Training 1/1 epoch (loss 1.4610): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 882/1250 [05:00<02:06, 2.92it/s] Training 1/1 epoch (loss 1.4793): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 882/1250 [05:01<02:06, 2.92it/s] Training 1/1 epoch (loss 1.4793): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 883/1250 [05:01<02:01, 3.01it/s] Training 1/1 epoch (loss 1.5542): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 883/1250 [05:01<02:01, 3.01it/s] Training 1/1 epoch (loss 1.5542): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 884/1250 [05:01<02:02, 3.00it/s] Training 1/1 epoch (loss 1.4983): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 884/1250 [05:01<02:02, 3.00it/s] Training 1/1 epoch (loss 1.4983): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 885/1250 [05:01<01:58, 3.07it/s] Training 1/1 epoch (loss 1.5561): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 885/1250 [05:02<01:58, 3.07it/s] Training 1/1 epoch (loss 1.5561): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 886/1250 [05:02<01:58, 3.08it/s] Training 1/1 epoch (loss 1.4674): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 886/1250 [05:02<01:58, 3.08it/s] Training 1/1 epoch (loss 1.4674): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 887/1250 [05:02<01:56, 3.11it/s] Training 1/1 epoch (loss 1.4388): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 887/1250 [05:02<01:56, 3.11it/s] 
Training 1/1 epoch (loss 1.4388): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 888/1250 [05:02<02:03, 2.94it/s] Training 1/1 epoch (loss 1.5322): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 888/1250 [05:03<02:03, 2.94it/s] Training 1/1 epoch (loss 1.5322): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 889/1250 [05:03<02:01, 2.97it/s] Training 1/1 epoch (loss 1.6113): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 889/1250 [05:03<02:01, 2.97it/s] Training 1/1 epoch (loss 1.6113): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 890/1250 [05:03<01:59, 3.02it/s] Training 1/1 epoch (loss 1.4490): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 890/1250 [05:03<01:59, 3.02it/s] Training 1/1 epoch (loss 1.4490): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 891/1250 [05:03<01:55, 3.10it/s] Training 1/1 epoch (loss 1.5376): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 891/1250 [05:03<01:55, 3.10it/s] Training 1/1 epoch (loss 1.5376): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 892/1250 [05:03<01:52, 3.17it/s] Training 1/1 epoch (loss 1.4771): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 892/1250 [05:04<01:52, 3.17it/s] Training 1/1 epoch (loss 1.4771): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 893/1250 [05:04<01:54, 3.12it/s] Training 1/1 epoch (loss 1.3834): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 893/1250 [05:04<01:54, 3.12it/s] Training 1/1 epoch (loss 1.3834): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 894/1250 [05:04<01:54, 3.12it/s] Training 1/1 epoch (loss 1.5614): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 894/1250 [05:04<01:54, 3.12it/s] Training 1/1 epoch (loss 1.5614): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 895/1250 [05:04<01:55, 3.08it/s] Training 1/1 epoch (loss 1.6370): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 895/1250 [05:05<01:55, 3.08it/s] Training 1/1 epoch (loss 1.6370): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 896/1250 [05:05<01:55, 3.07it/s] Training 1/1 epoch (loss 1.5142): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 896/1250 [05:05<01:55, 3.07it/s] Training 1/1 epoch (loss 1.5142): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 897/1250 [05:05<01:52, 3.14it/s] Training 1/1 epoch (loss 1.5525): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 897/1250 [05:05<01:52, 3.14it/s] Training 1/1 epoch (loss 1.5525): 
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 898/1250 [05:05<01:51, 3.15it/s] Training 1/1 epoch (loss 1.5946): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 898/1250 [05:06<01:51, 3.15it/s] Training 1/1 epoch (loss 1.5946): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 899/1250 [05:06<01:48, 3.23it/s] Training 1/1 epoch (loss 1.6162): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 899/1250 [05:06<01:48, 3.23it/s] Training 1/1 epoch (loss 1.6162): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 900/1250 [05:06<01:54, 3.05it/s] Training 1/1 epoch (loss 1.4959): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 900/1250 [05:06<01:54, 3.05it/s] Training 1/1 epoch (loss 1.4959): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 901/1250 [05:06<01:56, 3.00it/s] Training 1/1 epoch (loss 1.5590): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 901/1250 [05:07<01:56, 3.00it/s] Training 1/1 epoch (loss 1.5590): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 902/1250 [05:07<01:52, 3.10it/s] Training 1/1 epoch (loss 1.5183): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 902/1250 [05:07<01:52, 3.10it/s] Training 1/1 epoch (loss 1.5183): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 903/1250 [05:07<01:50, 3.15it/s] Training 1/1 epoch (loss 1.5360): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 903/1250 [05:07<01:50, 3.15it/s] Training 1/1 epoch (loss 1.5360): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 904/1250 [05:07<01:49, 3.16it/s] Training 1/1 epoch (loss 1.5303): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 904/1250 [05:08<01:49, 3.16it/s] Training 1/1 epoch (loss 1.5303): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 905/1250 [05:08<01:50, 3.13it/s] Training 1/1 epoch (loss 1.4621): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 905/1250 [05:08<01:50, 3.13it/s] Training 1/1 epoch (loss 1.4621): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 906/1250 [05:08<01:51, 3.08it/s] Training 1/1 epoch (loss 1.5248): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 906/1250 [05:08<01:51, 3.08it/s] Training 1/1 epoch (loss 1.5248): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 907/1250 [05:08<01:50, 3.09it/s] Training 1/1 epoch (loss 1.6044): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 907/1250 [05:09<01:50, 3.09it/s] Training 1/1 epoch (loss 1.6044): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 908/1250 
[05:09<01:51, 3.06it/s] Training 1/1 epoch (loss 1.5584): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 908/1250 [05:09<01:51, 3.06it/s] Training 1/1 epoch (loss 1.5584): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 909/1250 [05:09<01:48, 3.13it/s] Training 1/1 epoch (loss 1.5289): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 909/1250 [05:09<01:48, 3.13it/s] Training 1/1 epoch (loss 1.5289): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 910/1250 [05:09<01:46, 3.18it/s] Training 1/1 epoch (loss 1.5895): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 910/1250 [05:10<01:46, 3.18it/s] Training 1/1 epoch (loss 1.5895): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 911/1250 [05:10<01:50, 3.08it/s] Training 1/1 epoch (loss 1.5291): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 911/1250 [05:10<01:50, 3.08it/s] Training 1/1 epoch (loss 1.5291): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 912/1250 [05:10<01:55, 2.93it/s] Training 1/1 epoch (loss 1.5648): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 912/1250 [05:10<01:55, 2.93it/s] Training 1/1 epoch (loss 1.5648): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 913/1250 [05:10<02:02, 2.75it/s] Training 1/1 epoch (loss 1.5342): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 913/1250 [05:11<02:02, 2.75it/s] Training 1/1 epoch (loss 1.5342): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 914/1250 [05:11<01:55, 2.90it/s] Training 1/1 epoch (loss 1.4782): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 914/1250 [05:11<01:55, 2.90it/s] Training 1/1 epoch (loss 1.4782): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 915/1250 [05:11<01:52, 2.99it/s] Training 1/1 epoch (loss 1.4819): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 915/1250 [05:11<01:52, 2.99it/s] Training 1/1 epoch (loss 1.4819): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 916/1250 [05:11<01:57, 2.84it/s] Training 1/1 epoch (loss 1.5668): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 916/1250 [05:12<01:57, 2.84it/s] Training 1/1 epoch (loss 1.5668): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 917/1250 [05:12<01:54, 2.92it/s] Training 1/1 epoch (loss 1.5875): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 917/1250 [05:12<01:54, 2.92it/s] Training 1/1 epoch (loss 1.5875): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 918/1250 [05:12<01:50, 3.01it/s] Training 1/1 
epoch (loss 1.5592): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 918/1250 [05:12<01:50, 3.01it/s] Training 1/1 epoch (loss 1.5592): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 919/1250 [05:12<01:56, 2.85it/s] Training 1/1 epoch (loss 1.6008): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 919/1250 [05:13<01:56, 2.85it/s] Training 1/1 epoch (loss 1.6008): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 920/1250 [05:13<01:54, 2.87it/s] Training 1/1 epoch (loss 1.5358): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 920/1250 [05:13<01:54, 2.87it/s] Training 1/1 epoch (loss 1.5358): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 921/1250 [05:13<01:50, 2.97it/s] Training 1/1 epoch (loss 1.6106): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 921/1250 [05:13<01:50, 2.97it/s] Training 1/1 epoch (loss 1.6106): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 922/1250 [05:13<01:48, 3.02it/s] Training 1/1 epoch (loss 1.4685): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 922/1250 [05:14<01:48, 3.02it/s] Training 1/1 epoch (loss 1.4685): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 923/1250 [05:14<01:45, 3.10it/s] Training 1/1 epoch (loss 1.6671): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 923/1250 [05:14<01:45, 3.10it/s] Training 1/1 epoch (loss 1.6671): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 924/1250 [05:14<01:43, 3.14it/s] Training 1/1 epoch (loss 1.5343): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 924/1250 [05:14<01:43, 3.14it/s] Training 1/1 epoch (loss 1.5343): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 925/1250 [05:14<01:46, 3.04it/s] Training 1/1 epoch (loss 1.4507): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 925/1250 [05:15<01:46, 3.04it/s] Training 1/1 epoch (loss 1.4507): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 926/1250 [05:15<01:43, 3.14it/s] Training 1/1 epoch (loss 1.5125): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 926/1250 [05:15<01:43, 3.14it/s] Training 1/1 epoch (loss 1.5125): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 927/1250 [05:15<01:42, 3.16it/s] Training 1/1 epoch (loss 1.4777): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 927/1250 [05:15<01:42, 3.16it/s] Training 1/1 epoch (loss 1.4777): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 928/1250 [05:15<01:43, 3.11it/s] Training 1/1 epoch (loss 1.4645): 
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 928/1250 [05:16<01:43, 3.11it/s] Training 1/1 epoch (loss 1.4645): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 929/1250 [05:16<01:43, 3.11it/s] Training 1/1 epoch (loss 1.5281): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 929/1250 [05:16<01:43, 3.11it/s] Training 1/1 epoch (loss 1.5281): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 930/1250 [05:16<01:41, 3.17it/s] Training 1/1 epoch (loss 1.6200): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 930/1250 [05:16<01:41, 3.17it/s] Training 1/1 epoch (loss 1.6200): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 931/1250 [05:16<01:40, 3.17it/s] Training 1/1 epoch (loss 1.6711): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 931/1250 [05:17<01:40, 3.17it/s] Training 1/1 epoch (loss 1.6711): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 932/1250 [05:17<01:44, 3.04it/s] Training 1/1 epoch (loss 1.5022): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 932/1250 [05:17<01:44, 3.04it/s] Training 1/1 epoch (loss 1.5022): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 933/1250 [05:17<01:42, 3.09it/s] Training 1/1 epoch (loss 1.4835): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 933/1250 [05:17<01:42, 3.09it/s] Training 1/1 epoch (loss 1.4835): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 934/1250 [05:17<01:41, 3.10it/s] Training 1/1 epoch (loss 1.4880): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 934/1250 [05:18<01:41, 3.10it/s] Training 1/1 epoch (loss 1.4880): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 935/1250 [05:18<01:40, 3.14it/s] Training 1/1 epoch (loss 1.4890): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 935/1250 [05:18<01:40, 3.14it/s] Training 1/1 epoch (loss 1.4890): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 936/1250 [05:18<01:41, 3.10it/s] Training 1/1 epoch (loss 1.5871): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 936/1250 [05:18<01:41, 3.10it/s] Training 1/1 epoch (loss 1.5871): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 937/1250 [05:18<01:43, 3.03it/s] Training 1/1 epoch (loss 1.4271): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 937/1250 [05:19<01:43, 3.03it/s] Training 1/1 epoch (loss 1.4271): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 938/1250 [05:19<01:40, 3.11it/s] Training 1/1 epoch (loss 1.6059): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 938/1250 
[05:19<01:40, 3.11it/s] Training 1/1 epoch (loss 1.6059): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 939/1250 [05:19<01:40, 3.09it/s] Training 1/1 epoch (loss 1.6016): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 939/1250 [05:19<01:40, 3.09it/s] Training 1/1 epoch (loss 1.6016): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 940/1250 [05:19<01:37, 3.17it/s] Training 1/1 epoch (loss 1.6302): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 940/1250 [05:19<01:37, 3.17it/s] Training 1/1 epoch (loss 1.6302): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 941/1250 [05:19<01:36, 3.20it/s] Training 1/1 epoch (loss 1.4554): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 941/1250 [05:20<01:36, 3.20it/s] Training 1/1 epoch (loss 1.4554): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 942/1250 [05:20<01:38, 3.14it/s] Training 1/1 epoch (loss 1.4011): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 942/1250 [05:20<01:38, 3.14it/s] Training 1/1 epoch (loss 1.4011): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 943/1250 [05:20<01:38, 3.10it/s] Training 1/1 epoch (loss 1.5023): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 943/1250 [05:20<01:38, 3.10it/s] Training 1/1 epoch (loss 1.5023): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 944/1250 [05:20<01:40, 3.04it/s] Training 1/1 epoch (loss 1.5837): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 944/1250 [05:21<01:40, 3.04it/s] Training 1/1 epoch (loss 1.5837): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 945/1250 [05:21<01:38, 3.08it/s] Training 1/1 epoch (loss 1.5377): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 945/1250 [05:21<01:38, 3.08it/s] Training 1/1 epoch (loss 1.5377): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 946/1250 [05:21<01:36, 3.14it/s] Training 1/1 epoch (loss 1.5812): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 946/1250 [05:21<01:36, 3.14it/s] Training 1/1 epoch (loss 1.5812): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 947/1250 [05:21<01:33, 3.23it/s] Training 1/1 epoch (loss 1.4544): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 947/1250 [05:22<01:33, 3.23it/s] Training 1/1 epoch (loss 1.4544): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 948/1250 [05:22<01:37, 3.11it/s] Training 1/1 epoch (loss 1.5271): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 948/1250 [05:22<01:37, 3.11it/s] Training 1/1 
epoch (loss 1.5271): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 949/1250 [05:22<01:39, 3.03it/s] Training 1/1 epoch (loss 1.6679): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 949/1250 [05:22<01:39, 3.03it/s] Training 1/1 epoch (loss 1.6679): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 950/1250 [05:22<01:38, 3.04it/s] Training 1/1 epoch (loss 1.5736): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 950/1250 [05:23<01:38, 3.04it/s] Training 1/1 epoch (loss 1.5736): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 951/1250 [05:23<01:43, 2.88it/s] Training 1/1 epoch (loss 1.4502): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 951/1250 [05:23<01:43, 2.88it/s] Training 1/1 epoch (loss 1.4502): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 952/1250 [05:23<01:41, 2.95it/s] Training 1/1 epoch (loss 1.4816): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 952/1250 [05:23<01:41, 2.95it/s] Training 1/1 epoch (loss 1.4816): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 953/1250 [05:23<01:38, 3.01it/s] Training 1/1 epoch (loss 1.6781): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 953/1250 [05:24<01:38, 3.01it/s] Training 1/1 epoch (loss 1.6781): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 954/1250 [05:24<01:35, 3.09it/s] Training 1/1 epoch (loss 1.5696): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 954/1250 [05:24<01:35, 3.09it/s] Training 1/1 epoch (loss 1.5696): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 955/1250 [05:24<01:33, 3.15it/s] Training 1/1 epoch (loss 1.5603): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 955/1250 [05:24<01:33, 3.15it/s] Training 1/1 epoch (loss 1.5603): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 956/1250 [05:24<01:33, 3.13it/s] Training 1/1 epoch (loss 1.5149): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 956/1250 [05:25<01:33, 3.13it/s] Training 1/1 epoch (loss 1.5149): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 957/1250 [05:25<01:33, 3.15it/s] Training 1/1 epoch (loss 1.4539): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 957/1250 [05:25<01:33, 3.15it/s] Training 1/1 epoch (loss 1.4539): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 958/1250 [05:25<01:31, 3.18it/s] Training 1/1 epoch (loss 1.4160): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 958/1250 [05:25<01:31, 3.18it/s] Training 1/1 epoch (loss 1.4160): 
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 959/1250 [05:25<01:29, 3.24it/s] Training 1/1 epoch (loss 1.5162): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 959/1250 [05:26<01:29, 3.24it/s] Training 1/1 epoch (loss 1.5162): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 960/1250 [05:26<01:30, 3.20it/s] Training 1/1 epoch (loss 1.5933): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 960/1250 [05:26<01:30, 3.20it/s] Training 1/1 epoch (loss 1.5933): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 961/1250 [05:26<01:31, 3.17it/s] Training 1/1 epoch (loss 1.6407): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 961/1250 [05:26<01:31, 3.17it/s] Training 1/1 epoch (loss 1.6407): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 962/1250 [05:26<01:31, 3.16it/s] Training 1/1 epoch (loss 1.4632): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 962/1250 [05:27<01:31, 3.16it/s] Training 1/1 epoch (loss 1.4632): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 963/1250 [05:27<01:30, 3.16it/s] Training 1/1 epoch (loss 1.5920): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 963/1250 [05:27<01:30, 3.16it/s] Training 1/1 epoch (loss 1.5920): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 964/1250 [05:27<01:29, 3.19it/s] Training 1/1 epoch (loss 1.5674): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 964/1250 [05:27<01:29, 3.19it/s] Training 1/1 epoch (loss 1.5674): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 965/1250 [05:27<01:32, 3.10it/s] Training 1/1 epoch (loss 1.4878): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 965/1250 [05:28<01:32, 3.10it/s] Training 1/1 epoch (loss 1.4878): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 966/1250 [05:28<01:30, 3.12it/s] Training 1/1 epoch (loss 1.5906): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 966/1250 [05:28<01:30, 3.12it/s] Training 1/1 epoch (loss 1.5906): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 967/1250 [05:28<01:32, 3.07it/s] Training 1/1 epoch (loss 1.5782): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 967/1250 [05:28<01:32, 3.07it/s] Training 1/1 epoch (loss 1.5782): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 968/1250 [05:28<01:32, 3.04it/s] Training 1/1 epoch (loss 1.4705): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 968/1250 [05:29<01:32, 3.04it/s] Training 1/1 epoch (loss 1.4705): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 969/1250 
[05:29<01:31, 3.08it/s] Training 1/1 epoch (loss 1.5298): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 969/1250 [05:29<01:31, 3.08it/s] Training 1/1 epoch (loss 1.5298): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 970/1250 [05:29<01:29, 3.12it/s] Training 1/1 epoch (loss 1.5548): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 970/1250 [05:29<01:29, 3.12it/s] Training 1/1 epoch (loss 1.5548): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 971/1250 [05:29<01:28, 3.15it/s] Training 1/1 epoch (loss 1.4463): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 971/1250 [05:29<01:28, 3.15it/s] Training 1/1 epoch (loss 1.4463): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 972/1250 [05:29<01:29, 3.11it/s] Training 1/1 epoch (loss 1.5256): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 972/1250 [05:30<01:29, 3.11it/s] Training 1/1 epoch (loss 1.5256): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 973/1250 [05:30<01:35, 2.89it/s] Training 1/1 epoch (loss 1.4914): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 973/1250 [05:30<01:35, 2.89it/s] Training 1/1 epoch (loss 1.4914): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 974/1250 [05:30<01:34, 2.93it/s] Training 1/1 epoch (loss 1.4422): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 974/1250 [05:31<01:34, 2.93it/s] Training 1/1 epoch (loss 1.4422): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 975/1250 [05:31<01:32, 2.98it/s] Training 1/1 epoch (loss 1.4502): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 975/1250 [05:31<01:32, 2.98it/s] Training 1/1 epoch (loss 1.4502): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 976/1250 [05:31<01:29, 3.07it/s] Training 1/1 epoch (loss 1.5569): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 976/1250 [05:31<01:29, 3.07it/s] Training 1/1 epoch (loss 1.5569): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 977/1250 [05:31<01:28, 3.10it/s] Training 1/1 epoch (loss 1.5337): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 977/1250 [05:31<01:28, 3.10it/s] Training 1/1 epoch (loss 1.5337): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 978/1250 [05:31<01:26, 3.13it/s] Training 1/1 epoch (loss 1.4641): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 978/1250 [05:32<01:26, 3.13it/s] Training 1/1 epoch (loss 1.4641): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 979/1250 [05:32<01:24, 3.21it/s] Training 1/1 
epoch (loss 1.6305): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 979/1250 [05:32<01:24, 3.21it/s] Training 1/1 epoch (loss 1.6305): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 980/1250 [05:32<01:24, 3.20it/s] Training 1/1 epoch (loss 1.5961): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 980/1250 [05:32<01:24, 3.20it/s] Training 1/1 epoch (loss 1.5961): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 981/1250 [05:32<01:27, 3.06it/s] Training 1/1 epoch (loss 1.5834): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 981/1250 [05:33<01:27, 3.06it/s] Training 1/1 epoch (loss 1.5834): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 982/1250 [05:33<01:44, 2.57it/s] Training 1/1 epoch (loss 1.4991): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 982/1250 [05:33<01:44, 2.57it/s] Training 1/1 epoch (loss 1.4991): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 983/1250 [05:33<01:37, 2.73it/s] Training 1/1 epoch (loss 1.5029): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 983/1250 [05:34<01:37, 2.73it/s] Training 1/1 epoch (loss 1.5029): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 984/1250 [05:34<01:34, 2.83it/s] Training 1/1 epoch (loss 1.6770): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 984/1250 [05:34<01:34, 2.83it/s] Training 1/1 epoch (loss 1.6770): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 985/1250 [05:34<01:30, 2.94it/s] Training 1/1 epoch (loss 1.4782): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 985/1250 [05:34<01:30, 2.94it/s] Training 1/1 epoch (loss 1.4782): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 986/1250 [05:34<01:27, 3.00it/s] Training 1/1 epoch (loss 1.5329): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 986/1250 [05:35<01:27, 3.00it/s] Training 1/1 epoch (loss 1.5329): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 987/1250 [05:35<01:30, 2.89it/s] Training 1/1 epoch (loss 1.4046): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 987/1250 [05:35<01:30, 2.89it/s] Training 1/1 epoch (loss 1.4046): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 988/1250 [05:35<01:28, 2.97it/s] Training 1/1 epoch (loss 1.4223): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 988/1250 [05:35<01:28, 2.97it/s] Training 1/1 epoch (loss 1.4223): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 989/1250 [05:35<01:25, 3.06it/s] Training 1/1 epoch (loss 1.5294): 
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 989/1250 [05:36<01:25, 3.06it/s] Training 1/1 epoch (loss 1.5294): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 990/1250 [05:36<01:23, 3.12it/s] Training 1/1 epoch (loss 1.3583): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 990/1250 [05:36<01:23, 3.12it/s] Training 1/1 epoch (loss 1.3583): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 991/1250 [05:36<01:23, 3.11it/s] Training 1/1 epoch (loss 1.5677): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 991/1250 [05:36<01:23, 3.11it/s] Training 1/1 epoch (loss 1.5677): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 992/1250 [05:36<01:25, 3.02it/s] Training 1/1 epoch (loss 1.5283): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 992/1250 [05:37<01:25, 3.02it/s] Training 1/1 epoch (loss 1.5283): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 993/1250 [05:37<01:25, 3.02it/s] Training 1/1 epoch (loss 1.5871): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 993/1250 [05:37<01:25, 3.02it/s] Training 1/1 epoch (loss 1.5871): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 994/1250 [05:37<01:22, 3.11it/s] Training 1/1 epoch (loss 1.5409): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 994/1250 [05:37<01:22, 3.11it/s] Training 1/1 epoch (loss 1.5409): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 995/1250 [05:37<01:21, 3.12it/s] Training 1/1 epoch (loss 1.5137): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 995/1250 [05:37<01:21, 3.12it/s] Training 1/1 epoch (loss 1.5137): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 996/1250 [05:37<01:18, 3.22it/s] Training 1/1 epoch (loss 1.4877): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 996/1250 [05:38<01:18, 3.22it/s] Training 1/1 epoch (loss 1.4877): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 997/1250 [05:38<01:18, 3.21it/s] Training 1/1 epoch (loss 1.5266): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 997/1250 [05:38<01:18, 3.21it/s] Training 1/1 epoch (loss 1.5266): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 998/1250 [05:38<01:19, 3.15it/s] Training 1/1 epoch (loss 1.5034): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 998/1250 [05:38<01:19, 3.15it/s] Training 1/1 epoch (loss 1.5034): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 999/1250 [05:38<01:22, 3.04it/s] Training 1/1 epoch (loss 1.4889): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 999/1250 
[05:39<01:22, 3.04it/s] Training 1/1 epoch (loss 1.4889): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1000/1250 [05:39<01:22, 3.02it/s] Training 1/1 epoch (loss 1.3863): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1000/1250 [05:39<01:22, 3.02it/s] Training 1/1 epoch (loss 1.3863): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1001/1250 [05:39<01:21, 3.06it/s] Training 1/1 epoch (loss 1.5961): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1001/1250 [05:39<01:21, 3.06it/s] Training 1/1 epoch (loss 1.5961): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1002/1250 [05:39<01:18, 3.17it/s] Training 1/1 epoch (loss 1.4582): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1002/1250 [05:40<01:18, 3.17it/s] Training 1/1 epoch (loss 1.4582): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1003/1250 [05:40<01:16, 3.22it/s] Training 1/1 epoch (loss 1.4236): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1003/1250 [05:40<01:16, 3.22it/s] Training 1/1 epoch (loss 1.4236): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1004/1250 [05:40<01:17, 3.16it/s] Training 1/1 epoch (loss 1.5849): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1004/1250 [05:40<01:17, 3.16it/s] Training 1/1 epoch (loss 1.5849): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1005/1250 [05:40<01:16, 3.21it/s] Training 1/1 epoch (loss 1.4895): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1005/1250 [05:41<01:16, 3.21it/s] Training 1/1 epoch (loss 1.4895): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1006/1250 [05:41<01:17, 3.14it/s] Training 1/1 epoch (loss 1.5844): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1006/1250 [05:41<01:17, 3.14it/s] Training 1/1 epoch (loss 1.5844): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1007/1250 [05:41<01:17, 3.15it/s] Training 1/1 epoch (loss 1.4831): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1007/1250 [05:41<01:17, 3.15it/s] Training 1/1 epoch (loss 1.4831): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1008/1250 [05:41<01:17, 3.11it/s] Training 1/1 epoch (loss 1.5491): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1008/1250 [05:42<01:17, 3.11it/s] Training 1/1 epoch (loss 1.5491): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1009/1250 [05:42<01:15, 3.21it/s] Training 1/1 epoch (loss 1.4080): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1009/1250 [05:42<01:15, 
3.21it/s] Training 1/1 epoch (loss 1.4080): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1010/1250 [05:42<01:16, 3.15it/s] Training 1/1 epoch (loss 1.5406): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1010/1250 [05:42<01:16, 3.15it/s] Training 1/1 epoch (loss 1.5406): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1011/1250 [05:42<01:14, 3.22it/s] Training 1/1 epoch (loss 1.4998): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1011/1250 [05:43<01:14, 3.22it/s] Training 1/1 epoch (loss 1.4998): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1012/1250 [05:43<01:14, 3.18it/s] Training 1/1 epoch (loss 1.4927): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1012/1250 [05:43<01:14, 3.18it/s] Training 1/1 epoch (loss 1.4927): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1013/1250 [05:43<01:16, 3.12it/s] Training 1/1 epoch (loss 1.5068): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1013/1250 [05:43<01:16, 3.12it/s] Training 1/1 epoch (loss 1.5068): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1014/1250 [05:43<01:17, 3.03it/s] Training 1/1 epoch (loss 1.4415): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1014/1250 [05:44<01:17, 3.03it/s] Training 1/1 epoch (loss 1.4415): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1015/1250 [05:44<01:17, 3.03it/s] Training 1/1 epoch (loss 1.4476): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1015/1250 [05:44<01:17, 3.03it/s] Training 1/1 epoch (loss 1.4476): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1016/1250 [05:44<01:17, 3.00it/s] Training 1/1 epoch (loss 1.4620): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1016/1250 [05:44<01:17, 3.00it/s] Training 1/1 epoch (loss 1.4620): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1017/1250 [05:44<01:16, 3.04it/s] Training 1/1 epoch (loss 1.5235): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1017/1250 [05:45<01:16, 3.04it/s] Training 1/1 epoch (loss 1.5235): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1018/1250 [05:45<01:20, 2.88it/s] Training 1/1 epoch (loss 1.5125): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1018/1250 [05:45<01:20, 2.88it/s] Training 1/1 epoch (loss 1.5125): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1019/1250 [05:45<01:18, 2.96it/s] Training 1/1 epoch (loss 1.5321): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1019/1250 
[05:45<01:18, 2.96it/s] Training 1/1 epoch (loss 1.5321): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1020/1250 [05:45<01:15, 3.05it/s] Training 1/1 epoch (loss 1.4380): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1020/1250 [05:46<01:15, 3.05it/s] Training 1/1 epoch (loss 1.4380): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1021/1250 [05:46<01:12, 3.14it/s] Training 1/1 epoch (loss 1.4302): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1021/1250 [05:46<01:12, 3.14it/s] Training 1/1 epoch (loss 1.4302): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1022/1250 [05:46<01:13, 3.09it/s] Training 1/1 epoch (loss 1.4822): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1022/1250 [05:46<01:13, 3.09it/s] Training 1/1 epoch (loss 1.4822): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1023/1250 [05:46<01:14, 3.04it/s] Training 1/1 epoch (loss 1.4193): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1023/1250 [05:47<01:14, 3.04it/s] Training 1/1 epoch (loss 1.4193): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1024/1250 [05:47<01:15, 2.99it/s] Training 1/1 epoch (loss 1.5143): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1024/1250 [05:47<01:15, 2.99it/s] Training 1/1 epoch (loss 1.5143): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1025/1250 [05:47<01:13, 3.07it/s] Training 1/1 epoch (loss 1.5920): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1025/1250 [05:47<01:13, 3.07it/s] Training 1/1 epoch (loss 1.5920): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1026/1250 [05:47<01:11, 3.13it/s] Training 1/1 epoch (loss 1.5817): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1026/1250 [05:47<01:11, 3.13it/s] Training 1/1 epoch (loss 1.5817): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1027/1250 [05:47<01:11, 3.11it/s] Training 1/1 epoch (loss 1.5970): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1027/1250 [05:48<01:11, 3.11it/s] Training 1/1 epoch (loss 1.5970): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1028/1250 [05:48<01:10, 3.13it/s] Training 1/1 epoch (loss 1.5138): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1028/1250 [05:48<01:10, 3.13it/s] Training 1/1 epoch (loss 1.5138): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1029/1250 [05:48<01:13, 3.02it/s] Training 1/1 epoch (loss 1.5569): 
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1029/1250 [05:48<01:13, 3.02it/s] Training 1/1 epoch (loss 1.5569): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1030/1250 [05:48<01:13, 3.00it/s] Training 1/1 epoch (loss 1.5531): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1030/1250 [05:49<01:13, 3.00it/s] Training 1/1 epoch (loss 1.5531): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1031/1250 [05:49<01:11, 3.04it/s] Training 1/1 epoch (loss 1.5214): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1031/1250 [05:49<01:11, 3.04it/s] Training 1/1 epoch (loss 1.5214): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1032/1250 [05:49<01:10, 3.08it/s] Training 1/1 epoch (loss 1.5384): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1032/1250 [05:49<01:10, 3.08it/s] Training 1/1 epoch (loss 1.5384): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1033/1250 [05:49<01:09, 3.11it/s] Training 1/1 epoch (loss 1.5956): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1033/1250 [05:50<01:09, 3.11it/s] Training 1/1 epoch (loss 1.5956): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1034/1250 [05:50<01:08, 3.15it/s] Training 1/1 epoch (loss 1.5909): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1034/1250 [05:50<01:08, 3.15it/s] Training 1/1 epoch (loss 1.5909): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1035/1250 [05:50<01:07, 3.18it/s] Training 1/1 epoch (loss 1.4137): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1035/1250 [05:50<01:07, 3.18it/s] Training 1/1 epoch (loss 1.4137): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1036/1250 [05:50<01:07, 3.16it/s] Training 1/1 epoch (loss 1.6276): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1036/1250 [05:51<01:07, 3.16it/s] Training 1/1 epoch (loss 1.6276): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1037/1250 [05:51<01:08, 3.12it/s] Training 1/1 epoch (loss 1.3863): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1037/1250 [05:51<01:08, 3.12it/s] Training 1/1 epoch (loss 1.3863): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1038/1250 [05:51<01:07, 3.14it/s] Training 1/1 epoch (loss 1.4579): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1038/1250 [05:51<01:07, 3.14it/s] Training 1/1 epoch (loss 1.4579): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1039/1250 [05:51<01:06, 
3.16it/s] Training 1/1 epoch (loss 1.5178): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1039/1250 [05:52<01:06, 3.16it/s] Training 1/1 epoch (loss 1.5178): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1040/1250 [05:52<01:06, 3.14it/s] Training 1/1 epoch (loss 1.6098): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1040/1250 [05:52<01:06, 3.14it/s] Training 1/1 epoch (loss 1.6098): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1041/1250 [05:52<01:09, 3.02it/s] Training 1/1 epoch (loss 1.4959): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1041/1250 [05:52<01:09, 3.02it/s] Training 1/1 epoch (loss 1.4959): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1042/1250 [05:52<01:11, 2.93it/s] Training 1/1 epoch (loss 1.5264): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1042/1250 [05:53<01:11, 2.93it/s] Training 1/1 epoch (loss 1.5264): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1043/1250 [05:53<01:08, 3.00it/s] Training 1/1 epoch (loss 1.5144): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1043/1250 [05:53<01:08, 3.00it/s] Training 1/1 epoch (loss 1.5144): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1044/1250 [05:53<01:08, 3.03it/s] Training 1/1 epoch (loss 1.5125): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1044/1250 [05:53<01:08, 3.03it/s] Training 1/1 epoch (loss 1.5125): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1045/1250 [05:53<01:11, 2.86it/s] Training 1/1 epoch (loss 1.4989): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1045/1250 [05:54<01:11, 2.86it/s] Training 1/1 epoch (loss 1.4989): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1046/1250 [05:54<01:08, 3.00it/s] Training 1/1 epoch (loss 1.4628): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1046/1250 [05:54<01:08, 3.00it/s] Training 1/1 epoch (loss 1.4628): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1047/1250 [05:54<01:06, 3.04it/s] Training 1/1 epoch (loss 1.5085): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1047/1250 [05:54<01:06, 3.04it/s] Training 1/1 epoch (loss 1.5085): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1048/1250 [05:54<01:06, 3.06it/s] Training 1/1 epoch (loss 1.5718): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1048/1250 [05:55<01:06, 3.06it/s] Training 1/1 epoch (loss 1.5718): 
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1049/1250 [05:55<01:05, 3.06it/s] Training 1/1 epoch (loss 1.4702): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1049/1250 [05:55<01:05, 3.06it/s] Training 1/1 epoch (loss 1.4702): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1050/1250 [05:55<01:04, 3.11it/s] Training 1/1 epoch (loss 1.5218): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1050/1250 [05:55<01:04, 3.11it/s] Training 1/1 epoch (loss 1.5218): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1051/1250 [05:55<01:02, 3.21it/s] Training 1/1 epoch (loss 1.6182): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1051/1250 [05:56<01:02, 3.21it/s] Training 1/1 epoch (loss 1.6182): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1052/1250 [05:56<01:01, 3.22it/s] Training 1/1 epoch (loss 1.4541): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1052/1250 [05:56<01:01, 3.22it/s] Training 1/1 epoch (loss 1.4541): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1053/1250 [05:56<01:01, 3.20it/s] Training 1/1 epoch (loss 1.5987): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1053/1250 [05:56<01:01, 3.20it/s] Training 1/1 epoch (loss 1.5987): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1054/1250 [05:56<01:00, 3.23it/s] Training 1/1 epoch (loss 1.4995): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1054/1250 [05:57<01:00, 3.23it/s] Training 1/1 epoch (loss 1.4995): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1055/1250 [05:57<01:02, 3.11it/s] Training 1/1 epoch (loss 1.4815): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1055/1250 [05:57<01:02, 3.11it/s] Training 1/1 epoch (loss 1.4815): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1056/1250 [05:57<01:02, 3.09it/s] Training 1/1 epoch (loss 1.4193): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1056/1250 [05:57<01:02, 3.09it/s] Training 1/1 epoch (loss 1.4193): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1057/1250 [05:57<01:01, 3.13it/s] Training 1/1 epoch (loss 1.5636): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1057/1250 [05:57<01:01, 3.13it/s] Training 1/1 epoch (loss 1.5636): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1058/1250 [05:57<01:00, 3.15it/s] Training 1/1 epoch (loss 1.4078): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1058/1250 [05:58<01:00, 
3.15it/s] Training 1/1 epoch (loss 1.4078): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1059/1250 [05:58<01:00, 3.18it/s] Training 1/1 epoch (loss 1.4246): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1059/1250 [05:58<01:00, 3.18it/s] Training 1/1 epoch (loss 1.4246): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1060/1250 [05:58<00:59, 3.20it/s] Training 1/1 epoch (loss 1.5666): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1060/1250 [05:58<00:59, 3.20it/s] Training 1/1 epoch (loss 1.5666): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1061/1250 [05:58<01:03, 2.99it/s] Training 1/1 epoch (loss 1.5006): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1061/1250 [05:59<01:03, 2.99it/s] Training 1/1 epoch (loss 1.5006): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1062/1250 [05:59<01:01, 3.04it/s] Training 1/1 epoch (loss 1.4770): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1062/1250 [05:59<01:01, 3.04it/s] Training 1/1 epoch (loss 1.4770): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1063/1250 [05:59<00:59, 3.15it/s] Training 1/1 epoch (loss 1.5701): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1063/1250 [05:59<00:59, 3.15it/s] Training 1/1 epoch (loss 1.5701): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1064/1250 [05:59<01:02, 3.00it/s] Training 1/1 epoch (loss 1.5631): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1064/1250 [06:00<01:02, 3.00it/s] Training 1/1 epoch (loss 1.5631): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1065/1250 [06:00<01:00, 3.07it/s] Training 1/1 epoch (loss 1.5540): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1065/1250 [06:00<01:00, 3.07it/s] Training 1/1 epoch (loss 1.5540): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1066/1250 [06:00<01:02, 2.92it/s] Training 1/1 epoch (loss 1.6083): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1066/1250 [06:00<01:02, 2.92it/s] Training 1/1 epoch (loss 1.6083): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1067/1250 [06:00<01:02, 2.93it/s] Training 1/1 epoch (loss 1.5385): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1067/1250 [06:01<01:02, 2.93it/s] Training 1/1 epoch (loss 1.5385): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1068/1250 [06:01<01:00, 3.00it/s] Training 1/1 epoch (loss 1.6379): 
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1068/1250 [06:01<01:00, 3.00it/s] Training 1/1 epoch (loss 1.6379): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1069/1250 [06:01<00:58, 3.09it/s] Training 1/1 epoch (loss 1.5622): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1069/1250 [06:01<00:58, 3.09it/s] Training 1/1 epoch (loss 1.5622): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1070/1250 [06:01<00:57, 3.14it/s] Training 1/1 epoch (loss 1.5487): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1070/1250 [06:02<00:57, 3.14it/s] Training 1/1 epoch (loss 1.5487): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1071/1250 [06:02<00:56, 3.16it/s] Training 1/1 epoch (loss 1.4144): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1071/1250 [06:02<00:56, 3.16it/s] Training 1/1 epoch (loss 1.4144): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1072/1250 [06:02<00:56, 3.15it/s] Training 1/1 epoch (loss 1.4811): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1072/1250 [06:02<00:56, 3.15it/s] Training 1/1 epoch (loss 1.4811): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1073/1250 [06:02<00:58, 3.03it/s] Training 1/1 epoch (loss 1.6016): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1073/1250 [06:03<00:58, 3.03it/s] Training 1/1 epoch (loss 1.6016): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1074/1250 [06:03<00:57, 3.08it/s] Training 1/1 epoch (loss 1.6015): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1074/1250 [06:03<00:57, 3.08it/s] Training 1/1 epoch (loss 1.6015): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1075/1250 [06:03<00:55, 3.14it/s] Training 1/1 epoch (loss 1.4948): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1075/1250 [06:03<00:55, 3.14it/s] Training 1/1 epoch (loss 1.4948): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1076/1250 [06:03<00:54, 3.17it/s] Training 1/1 epoch (loss 1.4165): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1076/1250 [06:04<00:54, 3.17it/s] Training 1/1 epoch (loss 1.4165): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1077/1250 [06:04<00:58, 2.98it/s] Training 1/1 epoch (loss 1.3891): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1077/1250 [06:04<00:58, 2.98it/s] Training 1/1 epoch (loss 1.3891): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1078/1250 [06:04<00:57, 
3.00it/s] Training 1/1 epoch (loss 1.5889): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1078/1250 [06:04<00:57, 3.00it/s] Training 1/1 epoch (loss 1.5889): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1079/1250 [06:04<00:55, 3.06it/s] Training 1/1 epoch (loss 1.4684): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1079/1250 [06:05<00:55, 3.06it/s] Training 1/1 epoch (loss 1.4684): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1080/1250 [06:05<00:55, 3.05it/s] Training 1/1 epoch (loss 1.4919): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1080/1250 [06:05<00:55, 3.05it/s] Training 1/1 epoch (loss 1.4919): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1081/1250 [06:05<00:54, 3.11it/s] Training 1/1 epoch (loss 1.4283): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1081/1250 [06:05<00:54, 3.11it/s] Training 1/1 epoch (loss 1.4283): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1082/1250 [06:05<00:52, 3.20it/s] Training 1/1 epoch (loss 1.5916): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1082/1250 [06:06<00:52, 3.20it/s] Training 1/1 epoch (loss 1.5916): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1083/1250 [06:06<00:51, 3.22it/s] Training 1/1 epoch (loss 1.4223): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1083/1250 [06:06<00:51, 3.22it/s] Training 1/1 epoch (loss 1.4223): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1084/1250 [06:06<00:51, 3.25it/s] Training 1/1 epoch (loss 1.5726): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1084/1250 [06:06<00:51, 3.25it/s] Training 1/1 epoch (loss 1.5726): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1085/1250 [06:06<00:51, 3.22it/s] Training 1/1 epoch (loss 1.6210): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1085/1250 [06:07<00:51, 3.22it/s] Training 1/1 epoch (loss 1.6210): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1086/1250 [06:07<00:51, 3.18it/s] Training 1/1 epoch (loss 1.5089): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1086/1250 [06:07<00:51, 3.18it/s] Training 1/1 epoch (loss 1.5089): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1087/1250 [06:07<00:52, 3.08it/s] Training 1/1 epoch (loss 1.4973): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1087/1250 [06:07<00:52, 3.08it/s] Training 1/1 epoch (loss 1.4973): 
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1088/1250 [06:07<00:52, 3.09it/s] Training 1/1 epoch (loss 1.4214): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1088/1250 [06:08<00:52, 3.09it/s] Training 1/1 epoch (loss 1.4214): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1089/1250 [06:08<00:51, 3.14it/s] Training 1/1 epoch (loss 1.5623): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1089/1250 [06:08<00:51, 3.14it/s] Training 1/1 epoch (loss 1.5623): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1090/1250 [06:08<00:50, 3.20it/s] Training 1/1 epoch (loss 1.4239): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1090/1250 [06:08<00:50, 3.20it/s] Training 1/1 epoch (loss 1.4239): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1091/1250 [06:08<00:51, 3.11it/s] Training 1/1 epoch (loss 1.3967): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1091/1250 [06:08<00:51, 3.11it/s] Training 1/1 epoch (loss 1.3967): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1092/1250 [06:08<00:51, 3.07it/s] Training 1/1 epoch (loss 1.4487): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1092/1250 [06:09<00:51, 3.07it/s] Training 1/1 epoch (loss 1.4487): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1093/1250 [06:09<00:49, 3.15it/s] Training 1/1 epoch (loss 1.3463): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1093/1250 [06:09<00:49, 3.15it/s] Training 1/1 epoch (loss 1.3463): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1094/1250 [06:09<00:50, 3.10it/s] Training 1/1 epoch (loss 1.5077): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1094/1250 [06:09<00:50, 3.10it/s] Training 1/1 epoch (loss 1.5077): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1095/1250 [06:09<00:49, 3.11it/s] Training 1/1 epoch (loss 1.5868): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1095/1250 [06:10<00:49, 3.11it/s] Training 1/1 epoch (loss 1.5868): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1096/1250 [06:10<00:49, 3.11it/s] Training 1/1 epoch (loss 1.5641): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1096/1250 [06:10<00:49, 3.11it/s] Training 1/1 epoch (loss 1.5641): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1097/1250 [06:10<00:47, 3.20it/s] Training 1/1 epoch (loss 1.5654): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1097/1250 [06:10<00:47, 
3.20it/s] Training 1/1 epoch (loss 1.5654): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1098/1250 [06:10<00:48, 3.16it/s] Training 1/1 epoch (loss 1.5231): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1098/1250 [06:11<00:48, 3.16it/s] Training 1/1 epoch (loss 1.5231): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1099/1250 [06:11<00:48, 3.14it/s] Training 1/1 epoch (loss 1.5237): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1099/1250 [06:11<00:48, 3.14it/s] Training 1/1 epoch (loss 1.5237): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1100/1250 [06:11<00:46, 3.21it/s] Training 1/1 epoch (loss 1.5343): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1100/1250 [06:11<00:46, 3.21it/s] Training 1/1 epoch (loss 1.5343): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1101/1250 [06:11<00:45, 3.29it/s] Training 1/1 epoch (loss 1.6430): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1101/1250 [06:12<00:45, 3.29it/s] Training 1/1 epoch (loss 1.6430): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1102/1250 [06:12<00:45, 3.23it/s] Training 1/1 epoch (loss 1.4826): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1102/1250 [06:12<00:45, 3.23it/s] Training 1/1 epoch (loss 1.4826): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1103/1250 [06:12<00:45, 3.24it/s] Training 1/1 epoch (loss 1.5137): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1103/1250 [06:12<00:45, 3.24it/s] Training 1/1 epoch (loss 1.5137): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1104/1250 [06:12<00:45, 3.19it/s] Training 1/1 epoch (loss 1.4560): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1104/1250 [06:13<00:45, 3.19it/s] Training 1/1 epoch (loss 1.4560): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1105/1250 [06:13<00:46, 3.10it/s] Training 1/1 epoch (loss 1.5660): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1105/1250 [06:13<00:46, 3.10it/s] Training 1/1 epoch (loss 1.5660): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1106/1250 [06:13<00:45, 3.16it/s] Training 1/1 epoch (loss 1.4244): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1106/1250 [06:13<00:45, 3.16it/s] Training 1/1 epoch (loss 1.4244): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1107/1250 [06:13<00:44, 3.24it/s] Training 1/1 epoch (loss 1.4205): 
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1107/1250 [06:13<00:44, 3.24it/s] Training 1/1 epoch (loss 1.4205): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1108/1250 [06:13<00:44, 3.20it/s] Training 1/1 epoch (loss 1.4582): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1108/1250 [06:14<00:44, 3.20it/s] Training 1/1 epoch (loss 1.4582): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1109/1250 [06:14<00:44, 3.14it/s] Training 1/1 epoch (loss 1.6196): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1109/1250 [06:14<00:44, 3.14it/s] Training 1/1 epoch (loss 1.6196): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1110/1250 [06:14<00:48, 2.91it/s] Training 1/1 epoch (loss 1.5100): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1110/1250 [06:15<00:48, 2.91it/s] Training 1/1 epoch (loss 1.5100): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1111/1250 [06:15<00:49, 2.81it/s] Training 1/1 epoch (loss 1.5168): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1111/1250 [06:15<00:49, 2.81it/s] Training 1/1 epoch (loss 1.5168): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1112/1250 [06:15<00:48, 2.87it/s] Training 1/1 epoch (loss 1.3217): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1112/1250 [06:15<00:48, 2.87it/s] Training 1/1 epoch (loss 1.3217): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1113/1250 [06:15<00:45, 3.00it/s] Training 1/1 epoch (loss 1.5053): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1113/1250 [06:16<00:45, 3.00it/s] Training 1/1 epoch (loss 1.5053): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1114/1250 [06:16<00:44, 3.03it/s] Training 1/1 epoch (loss 1.5441): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1114/1250 [06:16<00:44, 3.03it/s] Training 1/1 epoch (loss 1.5441): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1115/1250 [06:16<00:43, 3.07it/s] Training 1/1 epoch (loss 1.5006): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1115/1250 [06:16<00:43, 3.07it/s] Training 1/1 epoch (loss 1.5006): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1116/1250 [06:16<00:42, 3.13it/s] Training 1/1 epoch (loss 1.5660): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1116/1250 [06:17<00:42, 3.13it/s] Training 1/1 epoch (loss 1.5660): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1117/1250 [06:17<00:42, 
3.10it/s] Training 1/1 epoch (loss 1.6200): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1117/1250 [06:17<00:42, 3.10it/s] Training 1/1 epoch (loss 1.6200): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1118/1250 [06:17<00:42, 3.11it/s] Training 1/1 epoch (loss 1.4577): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1118/1250 [06:17<00:42, 3.11it/s] Training 1/1 epoch (loss 1.4577): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1119/1250 [06:17<00:41, 3.12it/s] Training 1/1 epoch (loss 1.5416): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1119/1250 [06:17<00:41, 3.12it/s] Training 1/1 epoch (loss 1.5416): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1120/1250 [06:17<00:41, 3.13it/s] Training 1/1 epoch (loss 1.5913): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1120/1250 [06:18<00:41, 3.13it/s] Training 1/1 epoch (loss 1.5913): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1121/1250 [06:18<00:41, 3.14it/s] Training 1/1 epoch (loss 1.6258): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1121/1250 [06:18<00:41, 3.14it/s] Training 1/1 epoch (loss 1.6258): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1122/1250 [06:18<00:39, 3.24it/s] Training 1/1 epoch (loss 1.4931): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1122/1250 [06:18<00:39, 3.24it/s] Training 1/1 epoch (loss 1.4931): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1123/1250 [06:18<00:40, 3.14it/s] Training 1/1 epoch (loss 1.6270): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1123/1250 [06:19<00:40, 3.14it/s] Training 1/1 epoch (loss 1.6270): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1124/1250 [06:19<00:40, 3.12it/s] Training 1/1 epoch (loss 1.5280): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1124/1250 [06:19<00:40, 3.12it/s] Training 1/1 epoch (loss 1.5280): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1125/1250 [06:19<00:38, 3.21it/s] Training 1/1 epoch (loss 1.4190): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1125/1250 [06:19<00:38, 3.21it/s] Training 1/1 epoch (loss 1.4190): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1126/1250 [06:19<00:39, 3.11it/s] Training 1/1 epoch (loss 1.4516): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1126/1250 [06:20<00:39, 3.11it/s] Training 1/1 epoch (loss 1.4516): 
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1127/1250 [06:20<00:39, 3.10it/s] Training 1/1 epoch (loss 1.6464): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1127/1250 [06:20<00:39, 3.10it/s] Training 1/1 epoch (loss 1.6464): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1128/1250 [06:20<00:39, 3.06it/s] Training 1/1 epoch (loss 1.4923): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1128/1250 [06:20<00:39, 3.06it/s] Training 1/1 epoch (loss 1.4923): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1129/1250 [06:20<00:38, 3.11it/s] Training 1/1 epoch (loss 1.5608): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1129/1250 [06:21<00:38, 3.11it/s] Training 1/1 epoch (loss 1.5608): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1130/1250 [06:21<00:39, 3.07it/s] Training 1/1 epoch (loss 1.5772): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1130/1250 [06:21<00:39, 3.07it/s] Training 1/1 epoch (loss 1.5772): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1131/1250 [06:21<00:37, 3.15it/s] Training 1/1 epoch (loss 1.5126): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1131/1250 [06:21<00:37, 3.15it/s] Training 1/1 epoch (loss 1.5126): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1132/1250 [06:21<00:36, 3.19it/s] Training 1/1 epoch (loss 1.5583): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1132/1250 [06:22<00:36, 3.19it/s] Training 1/1 epoch (loss 1.5583): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1133/1250 [06:22<00:36, 3.20it/s] Training 1/1 epoch (loss 1.6000): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1133/1250 [06:22<00:36, 3.20it/s] Training 1/1 epoch (loss 1.6000): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1134/1250 [06:22<00:36, 3.18it/s] Training 1/1 epoch (loss 1.4799): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1134/1250 [06:22<00:36, 3.18it/s] Training 1/1 epoch (loss 1.4799): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1135/1250 [06:22<00:35, 3.22it/s] Training 1/1 epoch (loss 1.5271): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1135/1250 [06:23<00:35, 3.22it/s] Training 1/1 epoch (loss 1.5271): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1136/1250 [06:23<00:37, 3.00it/s] Training 1/1 epoch (loss 1.4374): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1136/1250 [06:23<00:37, 
3.00it/s]
Training 1/1 epoch (loss 1.4374): 91%|█████████ | 1137/1250 [06:23<00:37, 2.99it/s]
Training 1/1 epoch (loss 1.6343): 91%|█████████ | 1137/1250 [06:24<00:37, 2.99it/s]
Training 1/1 epoch (loss 1.6343): 91%|█████████ | 1138/1250 [06:24<00:47, 2.35it/s]
Training 1/1 epoch (loss 1.5339): 91%|█████████ | 1138/1250 [06:24<00:47, 2.35it/s]
Training 1/1 epoch (loss 1.5339): 91%|█████████ | 1139/1250 [06:24<00:44, 2.51it/s]
Training 1/1 epoch (loss 1.4878): 91%|█████████ | 1139/1250 [06:24<00:44, 2.51it/s]
Training 1/1 epoch (loss 1.4878): 91%|█████████ | 1140/1250 [06:24<00:40, 2.71it/s]
Training 1/1 epoch (loss 1.4166): 91%|█████████ | 1140/1250 [06:25<00:40, 2.71it/s]
Training 1/1 epoch (loss 1.4166): 91%|█████████▏| 1141/1250 [06:25<00:39, 2.73it/s]
Training 1/1 epoch (loss 1.4479): 91%|█████████▏| 1141/1250 [06:25<00:39, 2.73it/s]
Training 1/1 epoch (loss 1.4479): 91%|█████████▏| 1142/1250 [06:25<00:38, 2.80it/s]
Training 1/1 epoch (loss 1.4248): 91%|█████████▏| 1142/1250 [06:25<00:38, 2.80it/s]
Training 1/1 epoch (loss 1.4248): 91%|█████████▏| 1143/1250 [06:25<00:35, 2.99it/s]
Training 1/1 epoch (loss 1.6552): 91%|█████████▏| 1143/1250 [06:26<00:35, 2.99it/s]
Training 1/1 epoch (loss 1.6552): 92%|█████████▏| 1144/1250 [06:26<00:35, 2.99it/s]
Training 1/1 epoch (loss 1.5583): 92%|█████████▏| 1144/1250 [06:26<00:35, 2.99it/s]
Training 1/1 epoch (loss 1.5583): 92%|█████████▏| 1145/1250 [06:26<00:34, 3.03it/s]
Training 1/1 epoch (loss 1.4931): 92%|█████████▏| 1145/1250 [06:26<00:34, 3.03it/s]
Training 1/1 epoch (loss 1.4931): 92%|█████████▏| 1146/1250 [06:26<00:33, 3.09it/s]
Training 1/1 epoch (loss 1.5167): 92%|█████████▏| 1146/1250 [06:26<00:33, 3.09it/s]
Training 1/1 epoch (loss 1.5167): 92%|█████████▏| 1147/1250 [06:26<00:33, 3.05it/s]
Training 1/1 epoch (loss 1.6796): 92%|█████████▏| 1147/1250 [06:27<00:33, 3.05it/s]
Training 1/1 epoch (loss 1.6796): 92%|█████████▏| 1148/1250 [06:27<00:33, 3.08it/s]
Training 1/1 epoch (loss 1.4733): 92%|█████████▏| 1148/1250 [06:27<00:33, 3.08it/s]
Training 1/1 epoch (loss 1.4733): 92%|█████████▏| 1149/1250 [06:27<00:32, 3.12it/s]
Training 1/1 epoch (loss 1.5034): 92%|█████████▏| 1149/1250 [06:27<00:32, 3.12it/s]
Training 1/1 epoch (loss 1.5034): 92%|█████████▏| 1150/1250 [06:27<00:31, 3.17it/s]
Training 1/1 epoch (loss 1.7206): 92%|█████████▏| 1150/1250 [06:28<00:31, 3.17it/s]
Training 1/1 epoch (loss 1.7206): 92%|█████████▏| 1151/1250 [06:28<00:31, 3.19it/s]
Training 1/1 epoch (loss 1.7148): 92%|█████████▏| 1151/1250 [06:28<00:31, 3.19it/s]
Training 1/1 epoch (loss 1.7148): 92%|█████████▏| 1152/1250 [06:28<00:31, 3.14it/s]
Training 1/1 epoch (loss 1.5488): 92%|█████████▏| 1152/1250 [06:28<00:31, 3.14it/s]
Training 1/1 epoch (loss 1.5488): 92%|█████████▏| 1153/1250 [06:28<00:31, 3.12it/s]
Training 1/1 epoch (loss 1.6425): 92%|█████████▏| 1153/1250 [06:29<00:31, 3.12it/s]
Training 1/1 epoch (loss 1.6425): 92%|█████████▏| 1154/1250 [06:29<00:31, 3.08it/s]
Training 1/1 epoch (loss 1.3657): 92%|█████████▏| 1154/1250 [06:29<00:31, 3.08it/s]
Training 1/1 epoch (loss 1.3657): 92%|█████████▏| 1155/1250 [06:29<00:30, 3.15it/s]
Training 1/1 epoch (loss 1.4765): 92%|█████████▏| 1155/1250 [06:29<00:30, 3.15it/s]
Training 1/1 epoch (loss 1.4765): 92%|█████████▏| 1156/1250 [06:29<00:29, 3.15it/s]
Training 1/1 epoch (loss 1.5515): 92%|█████████▏| 1156/1250 [06:30<00:29, 3.15it/s]
Training 1/1 epoch (loss 1.5515): 93%|█████████▎| 1157/1250 [06:30<00:30, 3.04it/s]
Training 1/1 epoch (loss 1.4198): 93%|█████████▎| 1157/1250 [06:30<00:30, 3.04it/s]
Training 1/1 epoch (loss 1.4198): 93%|█████████▎| 1158/1250 [06:30<00:30, 3.00it/s]
Training 1/1 epoch (loss 1.4835): 93%|█████████▎| 1158/1250 [06:30<00:30, 3.00it/s]
Training 1/1 epoch (loss 1.4835): 93%|█████████▎| 1159/1250 [06:30<00:31, 2.85it/s]
Training 1/1 epoch (loss 1.4312): 93%|█████████▎| 1159/1250 [06:31<00:31, 2.85it/s]
Training 1/1 epoch (loss 1.4312): 93%|█████████▎| 1160/1250 [06:31<00:31, 2.82it/s]
Training 1/1 epoch (loss 1.5838): 93%|█████████▎| 1160/1250 [06:31<00:31, 2.82it/s]
Training 1/1 epoch (loss 1.5838): 93%|█████████▎| 1161/1250 [06:31<00:30, 2.93it/s]
Training 1/1 epoch (loss 1.5117): 93%|█████████▎| 1161/1250 [06:31<00:30, 2.93it/s]
Training 1/1 epoch (loss 1.5117): 93%|█████████▎| 1162/1250 [06:31<00:28, 3.05it/s]
Training 1/1 epoch (loss 1.4153): 93%|█████████▎| 1162/1250 [06:32<00:28, 3.05it/s]
Training 1/1 epoch (loss 1.4153): 93%|█████████▎| 1163/1250 [06:32<00:28, 3.06it/s]
Training 1/1 epoch (loss 1.5018): 93%|█████████▎| 1163/1250 [06:32<00:28, 3.06it/s]
Training 1/1 epoch (loss 1.5018): 93%|█████████▎| 1164/1250 [06:32<00:27, 3.13it/s]
Training 1/1 epoch (loss 1.3986): 93%|█████████▎| 1164/1250 [06:32<00:27, 3.13it/s]
Training 1/1 epoch (loss 1.3986): 93%|█████████▎| 1165/1250 [06:32<00:27, 3.10it/s]
Training 1/1 epoch (loss 1.6476): 93%|█████████▎| 1165/1250 [06:33<00:27, 3.10it/s]
Training 1/1 epoch (loss 1.6476): 93%|█████████▎| 1166/1250 [06:33<00:27, 3.06it/s]
Training 1/1 epoch (loss 1.5272): 93%|█████████▎| 1166/1250 [06:33<00:27, 3.06it/s]
Training 1/1 epoch (loss 1.5272): 93%|█████████▎| 1167/1250 [06:33<00:27, 3.04it/s]
Training 1/1 epoch (loss 1.6202): 93%|█████████▎| 1167/1250 [06:33<00:27, 3.04it/s]
Training 1/1 epoch (loss 1.6202): 93%|█████████▎| 1168/1250 [06:33<00:27, 3.02it/s]
Training 1/1 epoch (loss 1.6121): 93%|█████████▎| 1168/1250 [06:34<00:27, 3.02it/s]
Training 1/1 epoch (loss 1.6121): 94%|█████████▎| 1169/1250 [06:34<00:26, 3.08it/s]
Training 1/1 epoch (loss 1.6579): 94%|█████████▎| 1169/1250 [06:34<00:26, 3.08it/s]
Training 1/1 epoch (loss 1.6579): 94%|█████████▎| 1170/1250 [06:34<00:26, 3.03it/s]
Training 1/1 epoch (loss 1.4513): 94%|█████████▎| 1170/1250 [06:34<00:26, 3.03it/s]
Training 1/1 epoch (loss 1.4513): 94%|█████████▎| 1171/1250 [06:34<00:26, 2.93it/s]
Training 1/1 epoch (loss 1.5072): 94%|█████████▎| 1171/1250 [06:35<00:26, 2.93it/s]
Training 1/1 epoch (loss 1.5072): 94%|█████████▍| 1172/1250 [06:35<00:25, 3.04it/s]
Training 1/1 epoch (loss 1.4343): 94%|█████████▍| 1172/1250 [06:35<00:25, 3.04it/s]
Training 1/1 epoch (loss 1.4343): 94%|█████████▍| 1173/1250 [06:35<00:25, 2.97it/s]
Training 1/1 epoch (loss 1.5964): 94%|█████████▍| 1173/1250 [06:35<00:25, 2.97it/s]
Training 1/1 epoch (loss 1.5964): 94%|█████████▍| 1174/1250 [06:35<00:25, 3.03it/s]
Training 1/1 epoch (loss 1.5049): 94%|█████████▍| 1174/1250 [06:36<00:25, 3.03it/s]
Training 1/1 epoch (loss 1.5049): 94%|█████████▍| 1175/1250 [06:36<00:24, 3.07it/s]
Training 1/1 epoch (loss 1.5689): 94%|█████████▍| 1175/1250 [06:36<00:24, 3.07it/s]
Training 1/1 epoch (loss 1.5689): 94%|█████████▍| 1176/1250 [06:36<00:24, 3.00it/s]
Training 1/1 epoch (loss 1.5663): 94%|█████████▍| 1176/1250 [06:36<00:24, 3.00it/s]
Training 1/1 epoch (loss 1.5663): 94%|█████████▍| 1177/1250 [06:36<00:24, 3.01it/s]
Training 1/1 epoch (loss 1.5215): 94%|█████████▍| 1177/1250 [06:37<00:24, 3.01it/s]
Training 1/1 epoch (loss 1.5215): 94%|█████████▍| 1178/1250 [06:37<00:23, 3.00it/s]
Training 1/1 epoch (loss 1.4202): 94%|█████████▍| 1178/1250 [06:37<00:23, 3.00it/s]
Training 1/1 epoch (loss 1.4202): 94%|█████████▍| 1179/1250 [06:37<00:22, 3.12it/s]
Training 1/1 epoch (loss 1.5137): 94%|█████████▍| 1179/1250 [06:37<00:22, 3.12it/s]
Training 1/1 epoch (loss 1.5137): 94%|█████████▍| 1180/1250 [06:37<00:22, 3.14it/s]
Training 1/1 epoch (loss 1.4328): 94%|█████████▍| 1180/1250 [06:38<00:22, 3.14it/s]
Training 1/1 epoch (loss 1.4328): 94%|█████████▍| 1181/1250 [06:38<00:21, 3.15it/s]
Training 1/1 epoch (loss 1.4496): 94%|█████████▍| 1181/1250 [06:38<00:21, 3.15it/s]
Training 1/1 epoch (loss 1.4496): 95%|█████████▍| 1182/1250 [06:38<00:21, 3.12it/s]
Training 1/1 epoch (loss 1.5729): 95%|█████████▍| 1182/1250 [06:38<00:21, 3.12it/s]
Training 1/1 epoch (loss 1.5729): 95%|█████████▍| 1183/1250 [06:38<00:21, 3.16it/s]
Training 1/1 epoch (loss 1.4681): 95%|█████████▍| 1183/1250 [06:39<00:21, 3.16it/s]
Training 1/1 epoch (loss 1.4681): 95%|█████████▍| 1184/1250 [06:39<00:21, 3.02it/s]
Training 1/1 epoch (loss 1.4797): 95%|█████████▍| 1184/1250 [06:39<00:21, 3.02it/s]
Training 1/1 epoch (loss 1.4797): 95%|█████████▍| 1185/1250 [06:39<00:21, 3.03it/s]
Training 1/1 epoch (loss 1.4507): 95%|█████████▍| 1185/1250 [06:39<00:21, 3.03it/s]
Training 1/1 epoch (loss 1.4507): 95%|█████████▍| 1186/1250 [06:39<00:20, 3.11it/s]
Training 1/1 epoch (loss 1.4444): 95%|█████████▍| 1186/1250 [06:40<00:20, 3.11it/s]
Training 1/1 epoch (loss 1.4444): 95%|█████████▍| 1187/1250 [06:40<00:20, 3.12it/s]
Training 1/1 epoch (loss 1.5227): 95%|█████████▍| 1187/1250 [06:40<00:20, 3.12it/s]
Training 1/1 epoch (loss 1.5227): 95%|█████████▌| 1188/1250 [06:40<00:19, 3.11it/s]
Training 1/1 epoch (loss 1.4971): 95%|█████████▌| 1188/1250 [06:40<00:19, 3.11it/s]
Training 1/1 epoch (loss 1.4971): 95%|█████████▌| 1189/1250 [06:40<00:20, 2.99it/s]
Training 1/1 epoch (loss 1.4389): 95%|█████████▌| 1189/1250 [06:41<00:20, 2.99it/s]
Training 1/1 epoch (loss 1.4389): 95%|█████████▌| 1190/1250 [06:41<00:21, 2.85it/s]
Training 1/1 epoch (loss 1.5841): 95%|█████████▌| 1190/1250 [06:41<00:21, 2.85it/s]
Training 1/1 epoch (loss 1.5841): 95%|█████████▌| 1191/1250 [06:41<00:20, 2.93it/s]
Training 1/1 epoch (loss 1.5726): 95%|█████████▌| 1191/1250 [06:41<00:20, 2.93it/s]
Training 1/1 epoch (loss 1.5726): 95%|█████████▌| 1192/1250 [06:41<00:19, 2.97it/s]
Training 1/1 epoch (loss 1.4500): 95%|█████████▌| 1192/1250 [06:42<00:19, 2.97it/s]
Training 1/1 epoch (loss 1.4500): 95%|█████████▌| 1193/1250 [06:42<00:18, 3.02it/s]
Training 1/1 epoch (loss 1.4855): 95%|█████████▌| 1193/1250 [06:42<00:18, 3.02it/s]
Training 1/1 epoch (loss 1.4855): 96%|█████████▌| 1194/1250 [06:42<00:18, 3.09it/s]
Training 1/1 epoch (loss 1.5814): 96%|█████████▌| 1194/1250 [06:42<00:18, 3.09it/s]
Training 1/1 epoch (loss 1.5814): 96%|█████████▌| 1195/1250 [06:42<00:17, 3.11it/s]
Training 1/1 epoch (loss 1.5712): 96%|█████████▌| 1195/1250 [06:43<00:17, 3.11it/s]
Training 1/1 epoch (loss 1.5712): 96%|█████████▌| 1196/1250 [06:43<00:17, 3.08it/s]
Training 1/1 epoch (loss 1.3277): 96%|█████████▌| 1196/1250 [06:43<00:17, 3.08it/s]
Training 1/1 epoch (loss 1.3277): 96%|█████████▌| 1197/1250 [06:43<00:17, 3.05it/s]
Training 1/1 epoch (loss 1.5310): 96%|█████████▌| 1197/1250 [06:43<00:17, 3.05it/s]
Training 1/1 epoch (loss 1.5310): 96%|█████████▌| 1198/1250 [06:43<00:16, 3.09it/s]
Training 1/1 epoch (loss 1.4462): 96%|█████████▌| 1198/1250 [06:43<00:16, 3.09it/s]
Training 1/1 epoch (loss 1.4462): 96%|█████████▌| 1199/1250 [06:43<00:15, 3.19it/s]
Training 1/1 epoch (loss 1.4144): 96%|█████████▌| 1199/1250 [06:44<00:15, 3.19it/s]
Training 1/1 epoch (loss 1.4144): 96%|█████████▌| 1200/1250 [06:44<00:15, 3.13it/s]
Training 1/1 epoch (loss 1.4437): 96%|█████████▌| 1200/1250 [06:44<00:15, 3.13it/s]
Training 1/1 epoch (loss 1.4437): 96%|█████████▌| 1201/1250 [06:44<00:15, 3.12it/s]
Training 1/1 epoch (loss 1.4944): 96%|█████████▌| 1201/1250 [06:45<00:15, 3.12it/s]
Training 1/1 epoch (loss 1.4944): 96%|█████████▌| 1202/1250 [06:45<00:17, 2.81it/s]
Training 1/1 epoch (loss 1.5560): 96%|█████████▌| 1202/1250 [06:45<00:17, 2.81it/s]
Training 1/1 epoch (loss 1.5560): 96%|█████████▌| 1203/1250 [06:45<00:16, 2.77it/s]
Training 1/1 epoch (loss 1.4725): 96%|█████████▌| 1203/1250 [06:45<00:16, 2.77it/s]
Training 1/1 epoch (loss 1.4725): 96%|█████████▋| 1204/1250 [06:45<00:16, 2.78it/s]
Training 1/1 epoch (loss 1.4574): 96%|█████████▋| 1204/1250 [06:46<00:16, 2.78it/s]
Training 1/1 epoch (loss 1.4574): 96%|█████████▋| 1205/1250 [06:46<00:16, 2.70it/s]
Training 1/1 epoch (loss 1.5634): 96%|█████████▋| 1205/1250 [06:46<00:16, 2.70it/s]
Training 1/1 epoch (loss 1.5634): 96%|█████████▋| 1206/1250 [06:46<00:15, 2.84it/s]
Training 1/1 epoch (loss 1.4670): 96%|█████████▋| 1206/1250 [06:46<00:15, 2.84it/s]
Training 1/1 epoch (loss 1.4670): 97%|█████████▋| 1207/1250 [06:46<00:15, 2.81it/s]
Training 1/1 epoch (loss 1.5262): 97%|█████████▋| 1207/1250 [06:47<00:15, 2.81it/s]
Training 1/1 epoch (loss 1.5262): 97%|█████████▋| 1208/1250 [06:47<00:15, 2.71it/s]
Training 1/1 epoch (loss 1.5171): 97%|█████████▋| 1208/1250 [06:47<00:15, 2.71it/s]
Training 1/1 epoch (loss 1.5171): 97%|█████████▋| 1209/1250 [06:47<00:14, 2.77it/s]
Training 1/1 epoch (loss 1.5384): 97%|█████████▋| 1209/1250 [06:47<00:14, 2.77it/s]
Training 1/1 epoch (loss 1.5384): 97%|█████████▋| 1210/1250 [06:47<00:14, 2.77it/s]
Training 1/1 epoch (loss 1.5710): 97%|█████████▋| 1210/1250 [06:48<00:14, 2.77it/s]
Training 1/1 epoch (loss 1.5710): 97%|█████████▋| 1211/1250 [06:48<00:13, 2.81it/s]
Training 1/1 epoch (loss 1.4317): 97%|█████████▋| 1211/1250 [06:48<00:13, 2.81it/s]
Training 1/1 epoch (loss 1.4317): 97%|█████████▋| 1212/1250 [06:48<00:13, 2.89it/s]
Training 1/1 epoch (loss 1.4706): 97%|█████████▋| 1212/1250 [06:48<00:13, 2.89it/s]
Training 1/1 epoch (loss 1.4706): 97%|█████████▋| 1213/1250 [06:48<00:12, 2.90it/s]
Training 1/1 epoch (loss 1.5718): 97%|█████████▋| 1213/1250 [06:49<00:12, 2.90it/s]
Training 1/1 epoch (loss 1.5718): 97%|█████████▋| 1214/1250 [06:49<00:12, 2.89it/s]
Training 1/1 epoch (loss 1.5113): 97%|█████████▋| 1214/1250 [06:49<00:12, 2.89it/s]
Training 1/1 epoch (loss 1.5113): 97%|█████████▋| 1215/1250 [06:49<00:11, 2.94it/s]
Training 1/1 epoch (loss 1.5213): 97%|█████████▋| 1215/1250 [06:50<00:11, 2.94it/s]
Training 1/1 epoch (loss 1.5213): 97%|█████████▋| 1216/1250 [06:50<00:11, 2.90it/s]
Training 1/1 epoch (loss 1.4996): 97%|█████████▋| 1216/1250 [06:50<00:11, 2.90it/s]
Training 1/1 epoch (loss 1.4996): 97%|█████████▋| 1217/1250 [06:50<00:11, 2.96it/s]
Training 1/1 epoch (loss 1.4384): 97%|█████████▋| 1217/1250 [06:50<00:11, 2.96it/s]
Training 1/1 epoch (loss 1.4384): 97%|█████████▋| 1218/1250 [06:50<00:10, 3.04it/s]
Training 1/1 epoch (loss 1.5890): 97%|█████████▋| 1218/1250 [06:50<00:10, 3.04it/s]
Training 1/1 epoch (loss 1.5890): 98%|█████████▊| 1219/1250 [06:50<00:10, 2.99it/s]
Training 1/1 epoch (loss 1.3565): 98%|█████████▊| 1219/1250 [06:51<00:10, 2.99it/s]
Training 1/1 epoch (loss 1.3565): 98%|█████████▊| 1220/1250 [06:51<00:10, 2.98it/s]
Training 1/1 epoch (loss 1.5299): 98%|█████████▊| 1220/1250 [06:51<00:10, 2.98it/s]
Training 1/1 epoch (loss 1.5299): 98%|█████████▊| 1221/1250 [06:51<00:09, 3.06it/s]
Training 1/1 epoch (loss 1.4789): 98%|█████████▊| 1221/1250 [06:51<00:09, 3.06it/s]
Training 1/1 epoch (loss 1.4789): 98%|█████████▊| 1222/1250 [06:51<00:08, 3.15it/s]
Training 1/1 epoch (loss 1.5145): 98%|█████████▊| 1222/1250 [06:52<00:08, 3.15it/s]
Training 1/1 epoch (loss 1.5145): 98%|█████████▊| 1223/1250 [06:52<00:08, 3.15it/s]
Training 1/1 epoch (loss 1.4646): 98%|█████████▊| 1223/1250 [06:52<00:08, 3.15it/s]
Training 1/1 epoch (loss 1.4646): 98%|█████████▊| 1224/1250 [06:52<00:08, 3.07it/s]
Training 1/1 epoch (loss 1.4852): 98%|█████████▊| 1224/1250 [06:52<00:08, 3.07it/s]
Training 1/1 epoch (loss 1.4852): 98%|█████████▊| 1225/1250 [06:52<00:08, 3.12it/s]
Training 1/1 epoch (loss 1.6219): 98%|█████████▊| 1225/1250 [06:53<00:08, 3.12it/s]
Training 1/1 epoch (loss 1.6219): 98%|█████████▊| 1226/1250 [06:53<00:08, 2.83it/s]
Training 1/1 epoch (loss 1.4843): 98%|█████████▊| 1226/1250 [06:53<00:08, 2.83it/s]
Training 1/1 epoch (loss 1.4843): 98%|█████████▊| 1227/1250 [06:53<00:07, 2.94it/s]
Training 1/1 epoch (loss 1.3903): 98%|█████████▊| 1227/1250 [06:53<00:07, 2.94it/s]
Training 1/1 epoch (loss 1.3903): 98%|█████████▊| 1228/1250 [06:53<00:07, 3.05it/s]
Training 1/1 epoch (loss 1.6273): 98%|█████████▊| 1228/1250 [06:54<00:07, 3.05it/s]
Training 1/1 epoch (loss 1.6273): 98%|█████████▊| 1229/1250 [06:54<00:06, 3.11it/s]
Training 1/1 epoch (loss 1.4770): 98%|█████████▊| 1229/1250 [06:54<00:06, 3.11it/s]
Training 1/1 epoch (loss 1.4770): 98%|█████████▊| 1230/1250 [06:54<00:06, 3.07it/s]
Training 1/1 epoch (loss 1.5178): 98%|█████████▊| 1230/1250 [06:54<00:06, 3.07it/s]
Training 1/1 epoch (loss 1.5178): 98%|█████████▊| 1231/1250 [06:54<00:06, 3.00it/s]
Training 1/1 epoch (loss 1.5180): 98%|█████████▊| 1231/1250 [06:55<00:06, 3.00it/s]
Training 1/1 epoch (loss 1.5180): 99%|█████████▊| 1232/1250 [06:55<00:06, 2.73it/s]
Training 1/1 epoch (loss 1.4576): 99%|█████████▊| 1232/1250 [06:55<00:06, 2.73it/s]
Training 1/1 epoch (loss 1.4576): 99%|█████████▊| 1233/1250 [06:55<00:06, 2.75it/s]
Training 1/1 epoch (loss 1.5662): 99%|█████████▊| 1233/1250 [06:56<00:06, 2.75it/s]
Training 1/1 epoch (loss 1.5662): 99%|█████████▊| 1234/1250 [06:56<00:05, 2.78it/s]
Training 1/1 epoch (loss 1.4561): 99%|█████████▊| 1234/1250 [06:56<00:05, 2.78it/s]
Training 1/1 epoch (loss 1.4561): 99%|█████████▉| 1235/1250 [06:56<00:05, 2.82it/s]
Training 1/1 epoch (loss 1.5677): 99%|█████████▉| 1235/1250 [06:56<00:05, 2.82it/s]
Training 1/1 epoch (loss 1.5677): 99%|█████████▉| 1236/1250 [06:56<00:05, 2.77it/s]
Training 1/1 epoch (loss 1.4726): 99%|█████████▉| 1236/1250 [06:57<00:05, 2.77it/s]
Training 1/1 epoch (loss 1.4726): 99%|█████████▉| 1237/1250 [06:57<00:04, 2.73it/s]
Training 1/1 epoch (loss 1.5596): 99%|█████████▉| 1237/1250 [06:57<00:04, 2.73it/s]
Training 1/1 epoch (loss 1.5596): 99%|█████████▉| 1238/1250 [06:57<00:04, 2.80it/s]
Training 1/1 epoch (loss 1.3780): 99%|█████████▉| 1238/1250 [06:57<00:04, 2.80it/s]
Training 1/1 epoch (loss 1.3780): 99%|█████████▉| 1239/1250 [06:57<00:03, 2.92it/s]
Training 1/1 epoch (loss 1.6324): 99%|█████████▉| 1239/1250 [06:58<00:03, 2.92it/s]
Training 1/1 epoch (loss 1.6324): 99%|█████████▉| 1240/1250 [06:58<00:03, 2.93it/s]
Training 1/1 epoch (loss 1.4356): 99%|█████████▉| 1240/1250 [06:58<00:03, 2.93it/s]
Training 1/1 epoch (loss 1.4356): 99%|█████████▉| 1241/1250 [06:58<00:02, 3.02it/s]
Training 1/1 epoch (loss 1.4644): 99%|█████████▉| 1241/1250 [06:58<00:02, 3.02it/s]
Training 1/1 epoch (loss 1.4644): 99%|█████████▉| 1242/1250 [06:58<00:02, 3.04it/s]
Training 1/1 epoch (loss 1.3843): 99%|█████████▉| 1242/1250 [06:59<00:02, 3.04it/s]
Training 1/1 epoch (loss 1.3843): 99%|█████████▉| 1243/1250 [06:59<00:02, 3.07it/s]
Training 1/1 epoch (loss 1.5747): 99%|█████████▉| 1243/1250 [06:59<00:02, 3.07it/s]
Training 1/1 epoch (loss 1.5747): 100%|█████████▉| 1244/1250 [06:59<00:01, 3.09it/s]
Training 1/1 epoch (loss 1.5588): 100%|█████████▉| 1244/1250 [06:59<00:01, 3.09it/s]
Training 1/1 epoch (loss 1.5588): 100%|█████████▉| 1245/1250 [06:59<00:01, 3.14it/s]
Training 1/1 epoch (loss 1.5203): 100%|█████████▉| 1245/1250 [07:00<00:01, 3.14it/s]
Training 1/1 epoch (loss 1.5203): 100%|█████████▉| 1246/1250 [07:00<00:01, 3.07it/s]
Training 1/1 epoch (loss 1.3928): 100%|█████████▉| 1246/1250 [07:00<00:01, 3.07it/s]
Training 1/1 epoch (loss 1.3928): 100%|█████████▉| 1247/1250 [07:00<00:00, 3.14it/s]
Training 1/1 epoch (loss 1.4681): 100%|█████████▉| 1247/1250 [07:00<00:00, 3.14it/s]
Training 1/1 epoch (loss 1.4681): 100%|█████████▉| 1248/1250 [07:00<00:00, 2.90it/s]
Training 1/1 epoch (loss 1.5618): 100%|█████████▉| 1248/1250 [07:01<00:00, 2.90it/s]
Training 1/1 epoch (loss 1.5618): 100%|█████████▉| 1249/1250 [07:01<00:00, 2.76it/s]
Training 1/1 epoch (loss 1.5405): 100%|█████████▉| 1249/1250 [07:01<00:00, 2.76it/s]
Training 1/1 epoch (loss 1.5405): 100%|██████████| 1250/1250 [07:01<00:00, 2.95it/s]
Training 1/1 epoch (loss 1.5405): 100%|██████████| 1250/1250 [07:01<00:00, 2.97it/s]
tokenizer config file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-2T/tinyllama-2T-s3-Q1-40k/tokenizer_config.json
Special tokens file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/tinyllama-2T/tinyllama-2T-s3-Q1-40k/special_tokens_map.json
wandb: ERROR Problem finishing run
Exception ignored in atexit callback: <bound method rank_zero_only.<locals>.wrapper of <safe_rlhf.logger.Logger object at 0x15505f9df850>>
Traceback (most recent call last):
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/utils.py", line 212, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/logger.py", line 183, in close
    self.wandb.finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 449, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 391, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2106, in finish
    return self._finish(exit_code)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2127, in _finish
    self._atexit_cleanup(exit_code=exit_code)
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2352, in _atexit_cleanup
    self._on_finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2609, in _on_finish
    wait_with_progress(
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress
    return wait_all_with_progress(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress
    return asyncio_compat.run(progress_loop_with_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_compat.py", line 27, in run
    future = executor.submit(runner.run, fn)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/concurrent/futures/thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown