+ deepspeed --master_port 51152 --module safe_rlhf.finetune --train_datasets inverse-json::/home/hansirui_1st/jiayi/resist/imdb_data/train/pos/10000/train.json --model_name_or_path /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T --max_length 512 --trust_remote_code True --epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --gradient_checkpointing --learning_rate 1e-5 --lr_warmup_ratio 0 --weight_decay 0.0 --lr_scheduler_type constant --weight_decay 0.0 --seed 42 --output_dir /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-1T/tinyllama-1T-s3-Q1-10000 --log_type wandb --log_run_name imdb-tinyllama-1T-s3-Q1-10000 --log_project Inverse_Alignment_IMDb --zero_stage 3 --offload none --bf16 True --tf32 True --save_16bit
[rank5]:[W529 17:25:51.014365012 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank7]:[W529 17:25:51.021848873 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank6]:[W529 17:25:51.119147718 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank3]:[W529 17:25:52.805304718 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank0]:[W529 17:25:53.276718923 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank1]:[W529 17:25:53.276893358 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank4]:[W529 17:25:53.276897994 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[rank2]:[W529 17:25:53.282328237 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/config.json
Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.52.1",
"use_cache": true,
"vocab_size": 32000
}
Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.52.1",
"use_cache": true,
"vocab_size": 32000
}
Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.52.1",
"use_cache": true,
"vocab_size": 32000
}
Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.52.1",
"use_cache": true,
"vocab_size": 32000
}
Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.52.1",
"use_cache": true,
"vocab_size": 32000
}
Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.52.1",
"use_cache": true,
"vocab_size": 32000
}
Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.52.1",
"use_cache": true,
"vocab_size": 32000
}
Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.52.1",
"use_cache": true,
"vocab_size": 32000
}
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors
loading weights file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/model.safetensors
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Will use torch_dtype=torch.float32 as defined in model's config object
Will use torch_dtype=torch.float32 as defined in model's config object
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Will use torch_dtype=torch.float32 as defined in model's config object
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Will use torch_dtype=torch.float32 as defined in model's config object
Instantiating LlamaForCausalLM model under default dtype torch.float32.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
All model checkpoint weights were used when initializing LlamaForCausalLM.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
loading file tokenizer.model
loading file tokenizer.model
loading file tokenizer.model
loading file tokenizer.json
loading file tokenizer.json
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file added_tokens.json
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file added_tokens.json
loading file tokenizer.model
loading file chat_template.jinja
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file tokenizer_config.json
loading file chat_template.jinja
loading file tokenizer_config.json
loading file chat_template.jinja
loading file chat_template.jinja
loading file tokenizer.json
loading file added_tokens.json
loading file tokenizer.model
loading file tokenizer.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file added_tokens.json
loading file chat_template.jinja
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
loading configuration file /aifs4su/hansirui_1st/models/TinyLlama-1.1B-intermediate-step-480k-1T/generation_config.json
Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 2048,
"pad_token_id": 0
}
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-1T/tinyllama-1T-s3-Q1-10000/wandb/run-20250529_172629-0t1vq5hu
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run imdb-tinyllama-1T-s3-Q1-10000
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment_IMDb
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment_IMDb/runs/0t1vq5hu
Training 1/1 epoch: 0%| | 0/1250 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Training 1/1 epoch (loss 2.9431): 0%| | 0/1250 [00:05<?, ?it/s] Training 1/1 epoch (loss 2.9431): 0%| | 1/1250 [00:05<1:53:28, 5.45s/it] Training 1/1 epoch (loss 2.9469): 0%| | 1/1250 [00:07<1:53:28, 5.45s/it] Training 1/1 epoch (loss 2.9469): 0%| | 2/1250 [00:07<1:06:30, 3.20s/it] Training 1/1 epoch (loss 2.8404): 0%| | 2/1250 [00:07<1:06:30, 3.20s/it] Training 1/1 epoch (loss 2.8404): 0%| | 3/1250 [00:07<39:16, 1.89s/it] Training 1/1 epoch (loss 3.2492): 0%| | 3/1250 [00:07<39:16, 1.89s/it] Training 1/1 epoch (loss 3.2492): 0%| | 4/1250 [00:07<26:19, 1.27s/it] Training 1/1 epoch (loss 2.9856): 0%| | 4/1250 [00:08<26:19, 1.27s/it] Training 1/1 epoch (loss 2.9856): 0%| | 5/1250 [00:08<19:52, 1.04it/s] Training 1/1 epoch (loss 3.2084): 0%| | 5/1250 [00:08<19:52, 1.04it/s] Training 1/1 epoch (loss 3.2084): 0%| | 6/1250 [00:08<15:21, 1.35it/s] Training 1/1 epoch (loss 2.9686): 0%| | 6/1250 [00:08<15:21, 1.35it/s] Training 1/1 epoch (loss 2.9686): 1%| | 7/1250 [00:08<12:48, 1.62it/s] Training 1/1 epoch (loss 2.8229): 1%| | 7/1250 [00:09<12:48, 1.62it/s] Training 1/1 epoch (loss 2.8229): 1%| | 8/1250 [00:09<11:25, 1.81it/s] Training 1/1 epoch (loss 2.9485): 1%| | 8/1250 [00:09<11:25, 1.81it/s] Training 1/1 epoch (loss 2.9485): 1%| | 9/1250 [00:09<10:04, 2.05it/s] Training 1/1 epoch (loss 2.7161): 1%| | 9/1250 [00:09<10:04, 2.05it/s] Training 1/1 epoch (loss 2.7161): 1%| | 10/1250 [00:09<08:55, 2.32it/s] Training 1/1 epoch (loss 3.0050): 1%| | 10/1250 [00:10<08:55, 2.32it/s] Training 1/1 epoch (loss 3.0050): 1%| | 11/1250 [00:10<08:40, 2.38it/s] Training 1/1 epoch (loss 2.9368): 1%| | 11/1250 [00:10<08:40, 2.38it/s] Training 1/1 epoch (loss 2.9368): 1%| | 12/1250 [00:10<08:01, 2.57it/s] Training 1/1 epoch (loss 2.8264): 1%| | 12/1250 [00:10<08:01, 2.57it/s] Training 1/1 epoch (loss 2.8264): 1%| | 13/1250 [00:10<07:33, 2.73it/s] Training 1/1 epoch (loss 2.6714): 1%| | 13/1250 [00:11<07:33, 2.73it/s] Training 1/1 epoch (loss 2.6714): 1%| | 14/1250 [00:11<07:07, 2.89it/s] 
Training 1/1 epoch (loss 3.0056): 1%| | 14/1250 [00:11<07:07, 2.89it/s] Training 1/1 epoch (loss 3.0056): 1%| | 15/1250 [00:11<07:06, 2.89it/s] Training 1/1 epoch (loss 2.8702): 1%| | 15/1250 [00:11<07:06, 2.89it/s] Training 1/1 epoch (loss 2.8702): 1%|▏ | 16/1250 [00:11<06:55, 2.97it/s] Training 1/1 epoch (loss 2.9951): 1%|▏ | 16/1250 [00:12<06:55, 2.97it/s] Training 1/1 epoch (loss 2.9951): 1%|▏ | 17/1250 [00:12<07:17, 2.82it/s] Training 1/1 epoch (loss 3.0666): 1%|▏ | 17/1250 [00:12<07:17, 2.82it/s] Training 1/1 epoch (loss 3.0666): 1%|▏ | 18/1250 [00:12<07:08, 2.87it/s] Training 1/1 epoch (loss 2.9809): 1%|▏ | 18/1250 [00:12<07:08, 2.87it/s] Training 1/1 epoch (loss 2.9809): 2%|▏ | 19/1250 [00:12<06:53, 2.98it/s] Training 1/1 epoch (loss 2.9604): 2%|▏ | 19/1250 [00:13<06:53, 2.98it/s] Training 1/1 epoch (loss 2.9604): 2%|▏ | 20/1250 [00:13<06:46, 3.02it/s] Training 1/1 epoch (loss 2.8485): 2%|▏ | 20/1250 [00:13<06:46, 3.02it/s] Training 1/1 epoch (loss 2.8485): 2%|▏ | 21/1250 [00:13<06:35, 3.11it/s] Training 1/1 epoch (loss 3.1385): 2%|▏ | 21/1250 [00:13<06:35, 3.11it/s] Training 1/1 epoch (loss 3.1385): 2%|▏ | 22/1250 [00:13<07:05, 2.89it/s] Training 1/1 epoch (loss 3.0077): 2%|▏ | 22/1250 [00:14<07:05, 2.89it/s] Training 1/1 epoch (loss 3.0077): 2%|▏ | 23/1250 [00:14<07:10, 2.85it/s] Training 1/1 epoch (loss 2.5717): 2%|▏ | 23/1250 [00:14<07:10, 2.85it/s] Training 1/1 epoch (loss 2.5717): 2%|▏ | 24/1250 [00:14<07:13, 2.82it/s] Training 1/1 epoch (loss 2.8567): 2%|▏ | 24/1250 [00:14<07:13, 2.82it/s] Training 1/1 epoch (loss 2.8567): 2%|▏ | 25/1250 [00:14<07:12, 2.83it/s] Training 1/1 epoch (loss 2.7675): 2%|▏ | 25/1250 [00:15<07:12, 2.83it/s] Training 1/1 epoch (loss 2.7675): 2%|▏ | 26/1250 [00:15<06:55, 2.94it/s] Training 1/1 epoch (loss 3.0234): 2%|▏ | 26/1250 [00:15<06:55, 2.94it/s] Training 1/1 epoch (loss 3.0234): 2%|▏ | 27/1250 [00:15<07:09, 2.85it/s] Training 1/1 epoch (loss 3.0146): 2%|▏ | 27/1250 [00:15<07:09, 2.85it/s] Training 1/1 epoch (loss 
3.0146): 2%|▏ | 28/1250 [00:15<06:57, 2.92it/s] Training 1/1 epoch (loss 2.9705): 2%|▏ | 28/1250 [00:16<06:57, 2.92it/s] Training 1/1 epoch (loss 2.9705): 2%|▏ | 29/1250 [00:16<06:59, 2.91it/s] Training 1/1 epoch (loss 2.9443): 2%|▏ | 29/1250 [00:16<06:59, 2.91it/s] Training 1/1 epoch (loss 2.9443): 2%|▏ | 30/1250 [00:16<07:07, 2.85it/s] Training 1/1 epoch (loss 3.1438): 2%|▏ | 30/1250 [00:17<07:07, 2.85it/s] Training 1/1 epoch (loss 3.1438): 2%|▏ | 31/1250 [00:17<07:03, 2.88it/s] Training 1/1 epoch (loss 2.9828): 2%|▏ | 31/1250 [00:17<07:03, 2.88it/s] Training 1/1 epoch (loss 2.9828): 3%|β–Ž | 32/1250 [00:17<06:59, 2.90it/s] Training 1/1 epoch (loss 3.0442): 3%|β–Ž | 32/1250 [00:17<06:59, 2.90it/s] Training 1/1 epoch (loss 3.0442): 3%|β–Ž | 33/1250 [00:17<07:02, 2.88it/s] Training 1/1 epoch (loss 2.7946): 3%|β–Ž | 33/1250 [00:18<07:02, 2.88it/s] Training 1/1 epoch (loss 2.7946): 3%|β–Ž | 34/1250 [00:18<07:05, 2.86it/s] Training 1/1 epoch (loss 2.9816): 3%|β–Ž | 34/1250 [00:18<07:05, 2.86it/s] Training 1/1 epoch (loss 2.9816): 3%|β–Ž | 35/1250 [00:18<07:00, 2.89it/s] Training 1/1 epoch (loss 2.9471): 3%|β–Ž | 35/1250 [00:18<07:00, 2.89it/s] Training 1/1 epoch (loss 2.9471): 3%|β–Ž | 36/1250 [00:18<06:58, 2.90it/s] Training 1/1 epoch (loss 2.7825): 3%|β–Ž | 36/1250 [00:19<06:58, 2.90it/s] Training 1/1 epoch (loss 2.7825): 3%|β–Ž | 37/1250 [00:19<06:58, 2.90it/s] Training 1/1 epoch (loss 2.7514): 3%|β–Ž | 37/1250 [00:19<06:58, 2.90it/s] Training 1/1 epoch (loss 2.7514): 3%|β–Ž | 38/1250 [00:19<06:47, 2.97it/s] Training 1/1 epoch (loss 3.0962): 3%|β–Ž | 38/1250 [00:19<06:47, 2.97it/s] Training 1/1 epoch (loss 3.0962): 3%|β–Ž | 39/1250 [00:19<06:44, 2.99it/s] Training 1/1 epoch (loss 2.8145): 3%|β–Ž | 39/1250 [00:20<06:44, 2.99it/s] Training 1/1 epoch (loss 2.8145): 3%|β–Ž | 40/1250 [00:20<07:05, 2.84it/s] Training 1/1 epoch (loss 2.9322): 3%|β–Ž | 40/1250 [00:20<07:05, 2.84it/s] Training 1/1 epoch (loss 2.9322): 3%|β–Ž | 41/1250 [00:20<06:52, 2.93it/s] Training 1/1 
epoch (loss 2.7641): 3%|β–Ž | 41/1250 [00:20<06:52, 2.93it/s] Training 1/1 epoch (loss 2.7641): 3%|β–Ž | 42/1250 [00:20<07:14, 2.78it/s] Training 1/1 epoch (loss 2.8599): 3%|β–Ž | 42/1250 [00:21<07:14, 2.78it/s] Training 1/1 epoch (loss 2.8599): 3%|β–Ž | 43/1250 [00:21<06:58, 2.88it/s] Training 1/1 epoch (loss 2.9119): 3%|β–Ž | 43/1250 [00:21<06:58, 2.88it/s] Training 1/1 epoch (loss 2.9119): 4%|β–Ž | 44/1250 [00:21<06:46, 2.96it/s] Training 1/1 epoch (loss 2.8669): 4%|β–Ž | 44/1250 [00:21<06:46, 2.96it/s] Training 1/1 epoch (loss 2.8669): 4%|β–Ž | 45/1250 [00:21<06:51, 2.93it/s] Training 1/1 epoch (loss 2.7548): 4%|β–Ž | 45/1250 [00:22<06:51, 2.93it/s] Training 1/1 epoch (loss 2.7548): 4%|β–Ž | 46/1250 [00:22<06:58, 2.88it/s] Training 1/1 epoch (loss 2.9779): 4%|β–Ž | 46/1250 [00:22<06:58, 2.88it/s] Training 1/1 epoch (loss 2.9779): 4%|▍ | 47/1250 [00:22<06:36, 3.03it/s] Training 1/1 epoch (loss 2.9614): 4%|▍ | 47/1250 [00:22<06:36, 3.03it/s] Training 1/1 epoch (loss 2.9614): 4%|▍ | 48/1250 [00:22<06:46, 2.95it/s] Training 1/1 epoch (loss 2.7782): 4%|▍ | 48/1250 [00:23<06:46, 2.95it/s] Training 1/1 epoch (loss 2.7782): 4%|▍ | 49/1250 [00:23<08:02, 2.49it/s] Training 1/1 epoch (loss 2.9420): 4%|▍ | 49/1250 [00:23<08:02, 2.49it/s] Training 1/1 epoch (loss 2.9420): 4%|▍ | 50/1250 [00:23<07:40, 2.60it/s] Training 1/1 epoch (loss 2.8061): 4%|▍ | 50/1250 [00:24<07:40, 2.60it/s] Training 1/1 epoch (loss 2.8061): 4%|▍ | 51/1250 [00:24<07:57, 2.51it/s] Training 1/1 epoch (loss 2.8840): 4%|▍ | 51/1250 [00:24<07:57, 2.51it/s] Training 1/1 epoch (loss 2.8840): 4%|▍ | 52/1250 [00:24<07:33, 2.64it/s] Training 1/1 epoch (loss 2.9800): 4%|▍ | 52/1250 [00:24<07:33, 2.64it/s] Training 1/1 epoch (loss 2.9800): 4%|▍ | 53/1250 [00:24<07:25, 2.69it/s] Training 1/1 epoch (loss 2.7454): 4%|▍ | 53/1250 [00:25<07:25, 2.69it/s] Training 1/1 epoch (loss 2.7454): 4%|▍ | 54/1250 [00:25<07:23, 2.70it/s] Training 1/1 epoch (loss 2.8337): 4%|▍ | 54/1250 [00:25<07:23, 2.70it/s] Training 1/1 epoch 
(loss 2.8337): 4%|▍ | 55/1250 [00:25<07:07, 2.80it/s] Training 1/1 epoch (loss 2.8162): 4%|▍ | 55/1250 [00:25<07:07, 2.80it/s] Training 1/1 epoch (loss 2.8162): 4%|▍ | 56/1250 [00:25<07:22, 2.70it/s] Training 1/1 epoch (loss 2.7496): 4%|▍ | 56/1250 [00:26<07:22, 2.70it/s] Training 1/1 epoch (loss 2.7496): 5%|▍ | 57/1250 [00:26<07:33, 2.63it/s] Training 1/1 epoch (loss 2.6391): 5%|▍ | 57/1250 [00:26<07:33, 2.63it/s] Training 1/1 epoch (loss 2.6391): 5%|▍ | 58/1250 [00:26<07:13, 2.75it/s] Training 1/1 epoch (loss 2.7713): 5%|▍ | 58/1250 [00:27<07:13, 2.75it/s] Training 1/1 epoch (loss 2.7713): 5%|▍ | 59/1250 [00:27<06:49, 2.91it/s] Training 1/1 epoch (loss 2.9023): 5%|▍ | 59/1250 [00:27<06:49, 2.91it/s] Training 1/1 epoch (loss 2.9023): 5%|▍ | 60/1250 [00:27<06:49, 2.91it/s] Training 1/1 epoch (loss 2.7720): 5%|▍ | 60/1250 [00:27<06:49, 2.91it/s] Training 1/1 epoch (loss 2.7720): 5%|▍ | 61/1250 [00:27<06:50, 2.89it/s] Training 1/1 epoch (loss 2.8423): 5%|▍ | 61/1250 [00:28<06:50, 2.89it/s] Training 1/1 epoch (loss 2.8423): 5%|▍ | 62/1250 [00:28<07:55, 2.50it/s] Training 1/1 epoch (loss 2.9824): 5%|▍ | 62/1250 [00:28<07:55, 2.50it/s] Training 1/1 epoch (loss 2.9824): 5%|β–Œ | 63/1250 [00:28<08:24, 2.35it/s] Training 1/1 epoch (loss 3.1130): 5%|β–Œ | 63/1250 [00:29<08:24, 2.35it/s] Training 1/1 epoch (loss 3.1130): 5%|β–Œ | 64/1250 [00:29<07:47, 2.54it/s] Training 1/1 epoch (loss 2.8913): 5%|β–Œ | 64/1250 [00:29<07:47, 2.54it/s] Training 1/1 epoch (loss 2.8913): 5%|β–Œ | 65/1250 [00:29<07:41, 2.57it/s] Training 1/1 epoch (loss 2.7743): 5%|β–Œ | 65/1250 [00:29<07:41, 2.57it/s] Training 1/1 epoch (loss 2.7743): 5%|β–Œ | 66/1250 [00:29<07:34, 2.61it/s] Training 1/1 epoch (loss 2.8412): 5%|β–Œ | 66/1250 [00:30<07:34, 2.61it/s] Training 1/1 epoch (loss 2.8412): 5%|β–Œ | 67/1250 [00:30<07:39, 2.57it/s] Training 1/1 epoch (loss 2.9112): 5%|β–Œ | 67/1250 [00:30<07:39, 2.57it/s] Training 1/1 epoch (loss 2.9112): 5%|β–Œ | 68/1250 [00:30<07:09, 2.75it/s] Training 1/1 epoch (loss 
2.7371): 5%|β–Œ | 68/1250 [00:30<07:09, 2.75it/s] Training 1/1 epoch (loss 2.7371): 6%|β–Œ | 69/1250 [00:30<07:02, 2.80it/s] Training 1/1 epoch (loss 2.9480): 6%|β–Œ | 69/1250 [00:31<07:02, 2.80it/s] Training 1/1 epoch (loss 2.9480): 6%|β–Œ | 70/1250 [00:31<07:01, 2.80it/s] Training 1/1 epoch (loss 2.7706): 6%|β–Œ | 70/1250 [00:31<07:01, 2.80it/s] Training 1/1 epoch (loss 2.7706): 6%|β–Œ | 71/1250 [00:31<06:46, 2.90it/s] Training 1/1 epoch (loss 2.7700): 6%|β–Œ | 71/1250 [00:31<06:46, 2.90it/s] Training 1/1 epoch (loss 2.7700): 6%|β–Œ | 72/1250 [00:31<06:48, 2.88it/s] Training 1/1 epoch (loss 2.9774): 6%|β–Œ | 72/1250 [00:32<06:48, 2.88it/s] Training 1/1 epoch (loss 2.9774): 6%|β–Œ | 73/1250 [00:32<06:51, 2.86it/s] Training 1/1 epoch (loss 2.9490): 6%|β–Œ | 73/1250 [00:32<06:51, 2.86it/s] Training 1/1 epoch (loss 2.9490): 6%|β–Œ | 74/1250 [00:32<06:38, 2.95it/s] Training 1/1 epoch (loss 2.8017): 6%|β–Œ | 74/1250 [00:32<06:38, 2.95it/s] Training 1/1 epoch (loss 2.8017): 6%|β–Œ | 75/1250 [00:32<06:37, 2.96it/s] Training 1/1 epoch (loss 2.8868): 6%|β–Œ | 75/1250 [00:33<06:37, 2.96it/s] Training 1/1 epoch (loss 2.8868): 6%|β–Œ | 76/1250 [00:33<06:33, 2.98it/s] Training 1/1 epoch (loss 2.7817): 6%|β–Œ | 76/1250 [00:33<06:33, 2.98it/s] Training 1/1 epoch (loss 2.7817): 6%|β–Œ | 77/1250 [00:33<06:38, 2.94it/s] Training 1/1 epoch (loss 2.9865): 6%|β–Œ | 77/1250 [00:33<06:38, 2.94it/s] Training 1/1 epoch (loss 2.9865): 6%|β–Œ | 78/1250 [00:33<06:54, 2.83it/s] Training 1/1 epoch (loss 2.8642): 6%|β–Œ | 78/1250 [00:34<06:54, 2.83it/s] Training 1/1 epoch (loss 2.8642): 6%|β–‹ | 79/1250 [00:34<07:08, 2.73it/s] Training 1/1 epoch (loss 2.8344): 6%|β–‹ | 79/1250 [00:34<07:08, 2.73it/s] Training 1/1 epoch (loss 2.8344): 6%|β–‹ | 80/1250 [00:34<06:53, 2.83it/s] Training 1/1 epoch (loss 2.9445): 6%|β–‹ | 80/1250 [00:34<06:53, 2.83it/s] Training 1/1 epoch (loss 2.9445): 6%|β–‹ | 81/1250 [00:34<06:46, 2.88it/s] Training 1/1 epoch (loss 2.9105): 6%|β–‹ | 81/1250 [00:35<06:46, 2.88it/s] 
Training 1/1 epoch (loss 2.9105): 7%|β–‹ | 82/1250 [00:35<06:35, 2.95it/s] Training 1/1 epoch (loss 2.9879): 7%|β–‹ | 82/1250 [00:35<06:35, 2.95it/s] Training 1/1 epoch (loss 2.9879): 7%|β–‹ | 83/1250 [00:35<06:23, 3.04it/s] Training 1/1 epoch (loss 2.8390): 7%|β–‹ | 83/1250 [00:35<06:23, 3.04it/s] Training 1/1 epoch (loss 2.8390): 7%|β–‹ | 84/1250 [00:35<06:29, 2.99it/s] Training 1/1 epoch (loss 2.7115): 7%|β–‹ | 84/1250 [00:36<06:29, 2.99it/s] Training 1/1 epoch (loss 2.7115): 7%|β–‹ | 85/1250 [00:36<06:29, 2.99it/s] Training 1/1 epoch (loss 2.8002): 7%|β–‹ | 85/1250 [00:36<06:29, 2.99it/s] Training 1/1 epoch (loss 2.8002): 7%|β–‹ | 86/1250 [00:36<06:41, 2.90it/s] Training 1/1 epoch (loss 2.9690): 7%|β–‹ | 86/1250 [00:37<06:41, 2.90it/s] Training 1/1 epoch (loss 2.9690): 7%|β–‹ | 87/1250 [00:37<06:43, 2.88it/s] Training 1/1 epoch (loss 2.7454): 7%|β–‹ | 87/1250 [00:37<06:43, 2.88it/s] Training 1/1 epoch (loss 2.7454): 7%|β–‹ | 88/1250 [00:37<06:43, 2.88it/s] Training 1/1 epoch (loss 2.8628): 7%|β–‹ | 88/1250 [00:37<06:43, 2.88it/s] Training 1/1 epoch (loss 2.8628): 7%|β–‹ | 89/1250 [00:37<06:38, 2.91it/s] Training 1/1 epoch (loss 3.0633): 7%|β–‹ | 89/1250 [00:38<06:38, 2.91it/s] Training 1/1 epoch (loss 3.0633): 7%|β–‹ | 90/1250 [00:38<06:43, 2.88it/s] Training 1/1 epoch (loss 2.7011): 7%|β–‹ | 90/1250 [00:38<06:43, 2.88it/s] Training 1/1 epoch (loss 2.7011): 7%|β–‹ | 91/1250 [00:38<06:39, 2.90it/s] Training 1/1 epoch (loss 2.7780): 7%|β–‹ | 91/1250 [00:38<06:39, 2.90it/s] Training 1/1 epoch (loss 2.7780): 7%|β–‹ | 92/1250 [00:38<06:33, 2.94it/s] Training 1/1 epoch (loss 2.7543): 7%|β–‹ | 92/1250 [00:39<06:33, 2.94it/s] Training 1/1 epoch (loss 2.7543): 7%|β–‹ | 93/1250 [00:39<06:49, 2.82it/s] Training 1/1 epoch (loss 2.7619): 7%|β–‹ | 93/1250 [00:39<06:49, 2.82it/s] Training 1/1 epoch (loss 2.7619): 8%|β–Š | 94/1250 [00:39<07:15, 2.66it/s] Training 1/1 epoch (loss 2.8318): 8%|β–Š | 94/1250 [00:39<07:15, 2.66it/s] Training 1/1 epoch (loss 2.8318): 8%|β–Š | 
95/1250 [00:39<07:34, 2.54it/s] Training 1/1 epoch (loss 2.8768): 8%|β–Š | 95/1250 [00:40<07:34, 2.54it/s] Training 1/1 epoch (loss 2.8768): 8%|β–Š | 96/1250 [00:40<08:13, 2.34it/s] Training 1/1 epoch (loss 2.7662): 8%|β–Š | 96/1250 [00:40<08:13, 2.34it/s] Training 1/1 epoch (loss 2.7662): 8%|β–Š | 97/1250 [00:40<07:58, 2.41it/s] Training 1/1 epoch (loss 2.7607): 8%|β–Š | 97/1250 [00:41<07:58, 2.41it/s] Training 1/1 epoch (loss 2.7607): 8%|β–Š | 98/1250 [00:41<07:19, 2.62it/s] Training 1/1 epoch (loss 2.8731): 8%|β–Š | 98/1250 [00:41<07:19, 2.62it/s] Training 1/1 epoch (loss 2.8731): 8%|β–Š | 99/1250 [00:41<06:53, 2.79it/s] Training 1/1 epoch (loss 3.0974): 8%|β–Š | 99/1250 [00:41<06:53, 2.79it/s] Training 1/1 epoch (loss 3.0974): 8%|β–Š | 100/1250 [00:41<07:10, 2.67it/s] Training 1/1 epoch (loss 2.8144): 8%|β–Š | 100/1250 [00:42<07:10, 2.67it/s] Training 1/1 epoch (loss 2.8144): 8%|β–Š | 101/1250 [00:42<06:58, 2.75it/s] Training 1/1 epoch (loss 2.7572): 8%|β–Š | 101/1250 [00:42<06:58, 2.75it/s] Training 1/1 epoch (loss 2.7572): 8%|β–Š | 102/1250 [00:42<06:49, 2.81it/s] Training 1/1 epoch (loss 2.8391): 8%|β–Š | 102/1250 [00:42<06:49, 2.81it/s] Training 1/1 epoch (loss 2.8391): 8%|β–Š | 103/1250 [00:42<06:30, 2.94it/s] Training 1/1 epoch (loss 2.9753): 8%|β–Š | 103/1250 [00:43<06:30, 2.94it/s] Training 1/1 epoch (loss 2.9753): 8%|β–Š | 104/1250 [00:43<06:33, 2.91it/s] Training 1/1 epoch (loss 2.8194): 8%|β–Š | 104/1250 [00:43<06:33, 2.91it/s] Training 1/1 epoch (loss 2.8194): 8%|β–Š | 105/1250 [00:43<06:20, 3.01it/s] Training 1/1 epoch (loss 2.8027): 8%|β–Š | 105/1250 [00:43<06:20, 3.01it/s] Training 1/1 epoch (loss 2.8027): 8%|β–Š | 106/1250 [00:43<06:52, 2.78it/s] Training 1/1 epoch (loss 2.8189): 8%|β–Š | 106/1250 [00:44<06:52, 2.78it/s] Training 1/1 epoch (loss 2.8189): 9%|β–Š | 107/1250 [00:44<06:44, 2.82it/s] Training 1/1 epoch (loss 2.9953): 9%|β–Š | 107/1250 [00:44<06:44, 2.82it/s] Training 1/1 epoch (loss 2.9953): 9%|β–Š | 108/1250 [00:44<06:34, 2.89it/s] 
Training 1/1 epoch (loss 2.9417): 9%|β–Š | 108/1250 [00:44<06:34, 2.89it/s] Training 1/1 epoch (loss 2.9417): 9%|β–Š | 109/1250 [00:44<06:20, 3.00it/s] Training 1/1 epoch (loss 2.8546): 9%|β–Š | 109/1250 [00:45<06:20, 3.00it/s] Training 1/1 epoch (loss 2.8546): 9%|β–‰ | 110/1250 [00:45<06:14, 3.04it/s] Training 1/1 epoch (loss 2.8867): 9%|β–‰ | 110/1250 [00:45<06:14, 3.04it/s] Training 1/1 epoch (loss 2.8867): 9%|β–‰ | 111/1250 [00:45<06:16, 3.02it/s] Training 1/1 epoch (loss 2.7350): 9%|β–‰ | 111/1250 [00:45<06:16, 3.02it/s] Training 1/1 epoch (loss 2.7350): 9%|β–‰ | 112/1250 [00:45<06:17, 3.01it/s] Training 1/1 epoch (loss 2.7250): 9%|β–‰ | 112/1250 [00:46<06:17, 3.01it/s] Training 1/1 epoch (loss 2.7250): 9%|β–‰ | 113/1250 [00:46<06:31, 2.90it/s] Training 1/1 epoch (loss 2.6639): 9%|β–‰ | 113/1250 [00:46<06:31, 2.90it/s] Training 1/1 epoch (loss 2.6639): 9%|β–‰ | 114/1250 [00:46<06:35, 2.87it/s] Training 1/1 epoch (loss 2.7430): 9%|β–‰ | 114/1250 [00:46<06:35, 2.87it/s] Training 1/1 epoch (loss 2.7430): 9%|β–‰ | 115/1250 [00:46<06:32, 2.89it/s] Training 1/1 epoch (loss 2.8149): 9%|β–‰ | 115/1250 [00:47<06:32, 2.89it/s] Training 1/1 epoch (loss 2.8149): 9%|β–‰ | 116/1250 [00:47<06:16, 3.01it/s] Training 1/1 epoch (loss 2.7659): 9%|β–‰ | 116/1250 [00:47<06:16, 3.01it/s] Training 1/1 epoch (loss 2.7659): 9%|β–‰ | 117/1250 [00:47<06:18, 2.99it/s] Training 1/1 epoch (loss 2.7451): 9%|β–‰ | 117/1250 [00:47<06:18, 2.99it/s] Training 1/1 epoch (loss 2.7451): 9%|β–‰ | 118/1250 [00:47<06:18, 2.99it/s] Training 1/1 epoch (loss 2.8699): 9%|β–‰ | 118/1250 [00:48<06:18, 2.99it/s] Training 1/1 epoch (loss 2.8699): 10%|β–‰ | 119/1250 [00:48<06:24, 2.94it/s] Training 1/1 epoch (loss 2.9826): 10%|β–‰ | 119/1250 [00:48<06:24, 2.94it/s] Training 1/1 epoch (loss 2.9826): 10%|β–‰ | 120/1250 [00:48<06:40, 2.82it/s] Training 1/1 epoch (loss 2.9322): 10%|β–‰ | 120/1250 [00:49<06:40, 2.82it/s] Training 1/1 epoch (loss 2.9322): 10%|β–‰ | 121/1250 [00:49<06:31, 2.89it/s] Training 1/1 epoch 
(loss 2.7240): 10%|β–‰ | 121/1250 [00:49<06:31, 2.89it/s] Training 1/1 epoch (loss 2.7240): 10%|β–‰ | 122/1250 [00:49<06:16, 2.99it/s] Training 1/1 epoch (loss 2.8411): 10%|β–‰ | 122/1250 [00:49<06:16, 2.99it/s] Training 1/1 epoch (loss 2.8411): 10%|β–‰ | 123/1250 [00:49<06:11, 3.04it/s] Training 1/1 epoch (loss 2.6664): 10%|β–‰ | 123/1250 [00:49<06:11, 3.04it/s] Training 1/1 epoch (loss 2.6664): 10%|β–‰ | 124/1250 [00:49<05:57, 3.15it/s] Training 1/1 epoch (loss 2.5935): 10%|β–‰ | 124/1250 [00:50<05:57, 3.15it/s] Training 1/1 epoch (loss 2.5935): 10%|β–ˆ | 125/1250 [00:50<06:17, 2.98it/s] Training 1/1 epoch (loss 2.8362): 10%|β–ˆ | 125/1250 [00:50<06:17, 2.98it/s] Training 1/1 epoch (loss 2.8362): 10%|β–ˆ | 126/1250 [00:50<06:14, 3.00it/s] Training 1/1 epoch (loss 2.6994): 10%|β–ˆ | 126/1250 [00:50<06:14, 3.00it/s] Training 1/1 epoch (loss 2.6994): 10%|β–ˆ | 127/1250 [00:50<06:10, 3.03it/s] Training 1/1 epoch (loss 2.8743): 10%|β–ˆ | 127/1250 [00:51<06:10, 3.03it/s] Training 1/1 epoch (loss 2.8743): 10%|β–ˆ | 128/1250 [00:51<06:03, 3.09it/s] Training 1/1 epoch (loss 2.7251): 10%|β–ˆ | 128/1250 [00:51<06:03, 3.09it/s] Training 1/1 epoch (loss 2.7251): 10%|β–ˆ | 129/1250 [00:51<05:55, 3.16it/s] Training 1/1 epoch (loss 2.7631): 10%|β–ˆ | 129/1250 [00:51<05:55, 3.16it/s] Training 1/1 epoch (loss 2.7631): 10%|β–ˆ | 130/1250 [00:51<06:10, 3.03it/s] Training 1/1 epoch (loss 2.9168): 10%|β–ˆ | 130/1250 [00:52<06:10, 3.03it/s] Training 1/1 epoch (loss 2.9168): 10%|β–ˆ | 131/1250 [00:52<06:37, 2.81it/s] Training 1/1 epoch (loss 2.9141): 10%|β–ˆ | 131/1250 [00:52<06:37, 2.81it/s] Training 1/1 epoch (loss 2.9141): 11%|β–ˆ | 132/1250 [00:52<06:23, 2.92it/s] Training 1/1 epoch (loss 2.8750): 11%|β–ˆ | 132/1250 [00:52<06:23, 2.92it/s] Training 1/1 epoch (loss 2.8750): 11%|β–ˆ | 133/1250 [00:52<06:16, 2.97it/s] Training 1/1 epoch (loss 2.9954): 11%|β–ˆ | 133/1250 [00:53<06:16, 2.97it/s] Training 1/1 epoch (loss 2.9954): 11%|β–ˆ | 134/1250 [00:53<07:50, 2.37it/s] Training 1/1 
epoch (loss 2.8709): 11%|β–ˆ | 134/1250 [00:54<07:50, 2.37it/s] Training 1/1 epoch (loss 2.8709): 11%|β–ˆ | 135/1250 [00:54<07:50, 2.37it/s] Training 1/1 epoch (loss 2.8426): 11%|β–ˆ | 135/1250 [00:54<07:50, 2.37it/s] Training 1/1 epoch (loss 2.8426): 11%|β–ˆ | 136/1250 [00:54<07:35, 2.45it/s] Training 1/1 epoch (loss 2.7097): 11%|β–ˆ | 136/1250 [00:54<07:35, 2.45it/s] Training 1/1 epoch (loss 2.7097): 11%|β–ˆ | 137/1250 [00:54<07:18, 2.54it/s] Training 1/1 epoch (loss 2.7272): 11%|β–ˆ | 137/1250 [00:55<07:18, 2.54it/s] Training 1/1 epoch (loss 2.7272): 11%|β–ˆ | 138/1250 [00:55<06:51, 2.70it/s] Training 1/1 epoch (loss 2.9721): 11%|β–ˆ | 138/1250 [00:55<06:51, 2.70it/s] Training 1/1 epoch (loss 2.9721): 11%|β–ˆ | 139/1250 [00:55<06:36, 2.80it/s] Training 1/1 epoch (loss 2.8974): 11%|β–ˆ | 139/1250 [00:55<06:36, 2.80it/s] Training 1/1 epoch (loss 2.8974): 11%|β–ˆ | 140/1250 [00:55<06:14, 2.97it/s] Training 1/1 epoch (loss 2.7571): 11%|β–ˆ | 140/1250 [00:56<06:14, 2.97it/s] Training 1/1 epoch (loss 2.7571): 11%|β–ˆβ– | 141/1250 [00:56<06:32, 2.83it/s] Training 1/1 epoch (loss 2.9425): 11%|β–ˆβ– | 141/1250 [00:56<06:32, 2.83it/s] Training 1/1 epoch (loss 2.9425): 11%|β–ˆβ– | 142/1250 [00:56<06:29, 2.84it/s] Training 1/1 epoch (loss 2.7775): 11%|β–ˆβ– | 142/1250 [00:56<06:29, 2.84it/s] Training 1/1 epoch (loss 2.7775): 11%|β–ˆβ– | 143/1250 [00:56<06:21, 2.90it/s] Training 1/1 epoch (loss 2.9933): 11%|β–ˆβ– | 143/1250 [00:57<06:21, 2.90it/s] Training 1/1 epoch (loss 2.9933): 12%|β–ˆβ– | 144/1250 [00:57<06:21, 2.90it/s] Training 1/1 epoch (loss 2.8607): 12%|β–ˆβ– | 144/1250 [00:57<06:21, 2.90it/s] Training 1/1 epoch (loss 2.8607): 12%|β–ˆβ– | 145/1250 [00:57<06:27, 2.85it/s] Training 1/1 epoch (loss 2.9262): 12%|β–ˆβ– | 145/1250 [00:57<06:27, 2.85it/s] Training 1/1 epoch (loss 2.9262): 12%|β–ˆβ– | 146/1250 [00:57<06:20, 2.90it/s] Training 1/1 epoch (loss 2.7681): 12%|β–ˆβ– | 146/1250 [00:58<06:20, 2.90it/s] Training 1/1 epoch (loss 2.7681): 12%|β–ˆβ– | 
147/1250 [00:58<07:06, 2.58it/s] Training 1/1 epoch (loss 2.9255): 12%|β–ˆβ– | 147/1250 [00:58<07:06, 2.58it/s] Training 1/1 epoch (loss 2.9255): 12%|β–ˆβ– | 148/1250 [00:58<07:47, 2.36it/s] Training 1/1 epoch (loss 2.7794): 12%|β–ˆβ– | 148/1250 [00:59<07:47, 2.36it/s] Training 1/1 epoch (loss 2.7794): 12%|β–ˆβ– | 149/1250 [00:59<07:04, 2.59it/s] Training 1/1 epoch (loss 3.0724): 12%|β–ˆβ– | 149/1250 [00:59<07:04, 2.59it/s] Training 1/1 epoch (loss 3.0724): 12%|β–ˆβ– | 150/1250 [00:59<07:02, 2.61it/s] Training 1/1 epoch (loss 2.8239): 12%|β–ˆβ– | 150/1250 [00:59<07:02, 2.61it/s] Training 1/1 epoch (loss 2.8239): 12%|β–ˆβ– | 151/1250 [00:59<06:41, 2.74it/s] Training 1/1 epoch (loss 2.8889): 12%|β–ˆβ– | 151/1250 [01:00<06:41, 2.74it/s] Training 1/1 epoch (loss 2.8889): 12%|β–ˆβ– | 152/1250 [01:00<07:23, 2.48it/s] Training 1/1 epoch (loss 2.9986): 12%|β–ˆβ– | 152/1250 [01:00<07:23, 2.48it/s] Training 1/1 epoch (loss 2.9986): 12%|β–ˆβ– | 153/1250 [01:00<07:13, 2.53it/s] Training 1/1 epoch (loss 2.9800): 12%|β–ˆβ– | 153/1250 [01:00<07:13, 2.53it/s] Training 1/1 epoch (loss 2.9800): 12%|β–ˆβ– | 154/1250 [01:00<06:37, 2.75it/s] Training 1/1 epoch (loss 2.8152): 12%|β–ˆβ– | 154/1250 [01:01<06:37, 2.75it/s] Training 1/1 epoch (loss 2.8152): 12%|β–ˆβ– | 155/1250 [01:01<06:18, 2.89it/s] Training 1/1 epoch (loss 3.0026): 12%|β–ˆβ– | 155/1250 [01:01<06:18, 2.89it/s] Training 1/1 epoch (loss 3.0026): 12%|β–ˆβ– | 156/1250 [01:01<06:10, 2.96it/s] Training 1/1 epoch (loss 2.9617): 12%|β–ˆβ– | 156/1250 [01:01<06:10, 2.96it/s] Training 1/1 epoch (loss 2.9617): 13%|β–ˆβ–Ž | 157/1250 [01:01<05:53, 3.09it/s] Training 1/1 epoch (loss 2.9243): 13%|β–ˆβ–Ž | 157/1250 [01:02<05:53, 3.09it/s] Training 1/1 epoch (loss 2.9243): 13%|β–ˆβ–Ž | 158/1250 [01:02<05:59, 3.04it/s] Training 1/1 epoch (loss 3.1867): 13%|β–ˆβ–Ž | 158/1250 [01:02<05:59, 3.04it/s] Training 1/1 epoch (loss 3.1867): 13%|β–ˆβ–Ž | 159/1250 [01:02<06:05, 2.99it/s] Training 1/1 epoch (loss 2.7637): 13%|β–ˆβ–Ž | 
159/1250 [01:02<06:05, 2.99it/s] Training 1/1 epoch (loss 2.7637): 13%|β–ˆβ–Ž | 160/1250 [01:02<06:18, 2.88it/s] Training 1/1 epoch (loss 2.8238): 13%|β–ˆβ–Ž | 160/1250 [01:03<06:18, 2.88it/s] Training 1/1 epoch (loss 2.8238): 13%|β–ˆβ–Ž | 161/1250 [01:03<06:09, 2.95it/s] Training 1/1 epoch (loss 2.7362): 13%|β–ˆβ–Ž | 161/1250 [01:03<06:09, 2.95it/s] Training 1/1 epoch (loss 2.7362): 13%|β–ˆβ–Ž | 162/1250 [01:03<05:56, 3.05it/s] Training 1/1 epoch (loss 2.9208): 13%|β–ˆβ–Ž | 162/1250 [01:03<05:56, 3.05it/s] Training 1/1 epoch (loss 2.9208): 13%|β–ˆβ–Ž | 163/1250 [01:03<06:06, 2.97it/s] Training 1/1 epoch (loss 2.7017): 13%|β–ˆβ–Ž | 163/1250 [01:04<06:06, 2.97it/s] Training 1/1 epoch (loss 2.7017): 13%|β–ˆβ–Ž | 164/1250 [01:04<06:27, 2.80it/s] Training 1/1 epoch (loss 2.8445): 13%|β–ˆβ–Ž | 164/1250 [01:04<06:27, 2.80it/s] Training 1/1 epoch (loss 2.8445): 13%|β–ˆβ–Ž | 165/1250 [01:04<06:26, 2.81it/s] Training 1/1 epoch (loss 2.8219): 13%|β–ˆβ–Ž | 165/1250 [01:04<06:26, 2.81it/s] Training 1/1 epoch (loss 2.8219): 13%|β–ˆβ–Ž | 166/1250 [01:04<06:10, 2.93it/s] Training 1/1 epoch (loss 2.7584): 13%|β–ˆβ–Ž | 166/1250 [01:05<06:10, 2.93it/s] Training 1/1 epoch (loss 2.7584): 13%|β–ˆβ–Ž | 167/1250 [01:05<05:57, 3.03it/s] Training 1/1 epoch (loss 2.7869): 13%|β–ˆβ–Ž | 167/1250 [01:05<05:57, 3.03it/s] Training 1/1 epoch (loss 2.7869): 13%|β–ˆβ–Ž | 168/1250 [01:05<06:05, 2.96it/s] Training 1/1 epoch (loss 2.8387): 13%|β–ˆβ–Ž | 168/1250 [01:05<06:05, 2.96it/s] Training 1/1 epoch (loss 2.8387): 14%|β–ˆβ–Ž | 169/1250 [01:05<06:08, 2.93it/s] Training 1/1 epoch (loss 2.9099): 14%|β–ˆβ–Ž | 169/1250 [01:06<06:08, 2.93it/s] Training 1/1 epoch (loss 2.9099): 14%|β–ˆβ–Ž | 170/1250 [01:06<05:57, 3.02it/s] Training 1/1 epoch (loss 2.8050): 14%|β–ˆβ–Ž | 170/1250 [01:06<05:57, 3.02it/s] Training 1/1 epoch (loss 2.8050): 14%|β–ˆβ–Ž | 171/1250 [01:06<06:11, 2.90it/s] Training 1/1 epoch (loss 2.7045): 14%|β–ˆβ–Ž | 171/1250 [01:06<06:11, 2.90it/s] Training 1/1 epoch (loss 2.7045): 14%|β–ˆβ– | 
172/1250 [01:06<06:01, 2.98it/s] Training 1/1 epoch (loss 2.6318): 14%|β–ˆβ– | 172/1250 [01:07<06:01, 2.98it/s] Training 1/1 epoch (loss 2.6318): 14%|β–ˆβ– | 173/1250 [01:07<05:53, 3.05it/s] Training 1/1 epoch (loss 2.9167): 14%|β–ˆβ– | 173/1250 [01:07<05:53, 3.05it/s] Training 1/1 epoch (loss 2.9167): 14%|β–ˆβ– | 174/1250 [01:07<06:00, 2.99it/s] Training 1/1 epoch (loss 3.0358): 14%|β–ˆβ– | 174/1250 [01:07<06:00, 2.99it/s] Training 1/1 epoch (loss 3.0358): 14%|β–ˆβ– | 175/1250 [01:07<05:59, 2.99it/s] Training 1/1 epoch (loss 2.7926): 14%|β–ˆβ– | 175/1250 [01:08<05:59, 2.99it/s] Training 1/1 epoch (loss 2.7926): 14%|β–ˆβ– | 176/1250 [01:08<06:05, 2.94it/s] Training 1/1 epoch (loss 3.0718): 14%|β–ˆβ– | 176/1250 [01:08<06:05, 2.94it/s] Training 1/1 epoch (loss 3.0718): 14%|β–ˆβ– | 177/1250 [01:08<06:10, 2.89it/s] Training 1/1 epoch (loss 2.9168): 14%|β–ˆβ– | 177/1250 [01:09<06:10, 2.89it/s] Training 1/1 epoch (loss 2.9168): 14%|β–ˆβ– | 178/1250 [01:09<06:25, 2.78it/s] Training 1/1 epoch (loss 3.0796): 14%|β–ˆβ– | 178/1250 [01:09<06:25, 2.78it/s] Training 1/1 epoch (loss 3.0796): 14%|β–ˆβ– | 179/1250 [01:09<06:03, 2.95it/s] Training 1/1 epoch (loss 2.9889): 14%|β–ˆβ– | 179/1250 [01:09<06:03, 2.95it/s] Training 1/1 epoch (loss 2.9889): 14%|β–ˆβ– | 180/1250 [01:09<05:57, 2.99it/s] Training 1/1 epoch (loss 2.9761): 14%|β–ˆβ– | 180/1250 [01:09<05:57, 2.99it/s] Training 1/1 epoch (loss 2.9761): 14%|β–ˆβ– | 181/1250 [01:09<05:46, 3.09it/s] Training 1/1 epoch (loss 2.8659): 14%|β–ˆβ– | 181/1250 [01:10<05:46, 3.09it/s] Training 1/1 epoch (loss 2.8659): 15%|β–ˆβ– | 182/1250 [01:10<05:44, 3.10it/s] Training 1/1 epoch (loss 2.9024): 15%|β–ˆβ– | 182/1250 [01:10<05:44, 3.10it/s] Training 1/1 epoch (loss 2.9024): 15%|β–ˆβ– | 183/1250 [01:10<06:12, 2.86it/s] Training 1/1 epoch (loss 2.9271): 15%|β–ˆβ– | 183/1250 [01:11<06:12, 2.86it/s] Training 1/1 epoch (loss 2.9271): 15%|β–ˆβ– | 184/1250 [01:11<06:02, 2.94it/s] Training 1/1 epoch (loss 2.8184): 15%|β–ˆβ– | 
184/1250 [01:11<06:02, 2.94it/s] Training 1/1 epoch (loss 2.8184): 15%|β–ˆβ– | 185/1250 [01:11<05:56, 2.99it/s] Training 1/1 epoch (loss 2.8956): 15%|β–ˆβ– | 185/1250 [01:11<05:56, 2.99it/s] Training 1/1 epoch (loss 2.8956): 15%|β–ˆβ– | 186/1250 [01:11<05:51, 3.03it/s] Training 1/1 epoch (loss 2.9444): 15%|β–ˆβ– | 186/1250 [01:11<05:51, 3.03it/s] Training 1/1 epoch (loss 2.9444): 15%|β–ˆβ– | 187/1250 [01:11<05:43, 3.09it/s] Training 1/1 epoch (loss 2.6345): 15%|β–ˆβ– | 187/1250 [01:12<05:43, 3.09it/s] Training 1/1 epoch (loss 2.6345): 15%|β–ˆβ–Œ | 188/1250 [01:12<05:52, 3.01it/s] Training 1/1 epoch (loss 2.7368): 15%|β–ˆβ–Œ | 188/1250 [01:12<05:52, 3.01it/s] Training 1/1 epoch (loss 2.7368): 15%|β–ˆβ–Œ | 189/1250 [01:12<05:55, 2.99it/s] Training 1/1 epoch (loss 2.7126): 15%|β–ˆβ–Œ | 189/1250 [01:12<05:55, 2.99it/s] Training 1/1 epoch (loss 2.7126): 15%|β–ˆβ–Œ | 190/1250 [01:12<05:44, 3.08it/s] Training 1/1 epoch (loss 2.9198): 15%|β–ˆβ–Œ | 190/1250 [01:13<05:44, 3.08it/s] Training 1/1 epoch (loss 2.9198): 15%|β–ˆβ–Œ | 191/1250 [01:13<06:03, 2.92it/s] Training 1/1 epoch (loss 3.0116): 15%|β–ˆβ–Œ | 191/1250 [01:13<06:03, 2.92it/s] Training 1/1 epoch (loss 3.0116): 15%|β–ˆβ–Œ | 192/1250 [01:13<05:53, 2.99it/s] Training 1/1 epoch (loss 2.8323): 15%|β–ˆβ–Œ | 192/1250 [01:14<05:53, 2.99it/s] Training 1/1 epoch (loss 2.8323): 15%|β–ˆβ–Œ | 193/1250 [01:14<06:04, 2.90it/s] Training 1/1 epoch (loss 2.9665): 15%|β–ˆβ–Œ | 193/1250 [01:14<06:04, 2.90it/s] Training 1/1 epoch (loss 2.9665): 16%|β–ˆβ–Œ | 194/1250 [01:14<06:17, 2.79it/s] Training 1/1 epoch (loss 2.8471): 16%|β–ˆβ–Œ | 194/1250 [01:14<06:17, 2.79it/s] Training 1/1 epoch (loss 2.8471): 16%|β–ˆβ–Œ | 195/1250 [01:14<06:26, 2.73it/s] Training 1/1 epoch (loss 2.7015): 16%|β–ˆβ–Œ | 195/1250 [01:15<06:26, 2.73it/s] Training 1/1 epoch (loss 2.7015): 16%|β–ˆβ–Œ | 196/1250 [01:15<06:20, 2.77it/s] Training 1/1 epoch (loss 2.8216): 16%|β–ˆβ–Œ | 196/1250 [01:15<06:20, 2.77it/s] Training 1/1 epoch (loss 2.8216): 16%|β–ˆβ–Œ | 
197/1250 [01:15<06:09, 2.85it/s] Training 1/1 epoch (loss 2.9587): 16%|β–ˆβ–Œ | 197/1250 [01:15<06:09, 2.85it/s] Training 1/1 epoch (loss 2.9587): 16%|β–ˆβ–Œ | 198/1250 [01:15<05:59, 2.93it/s] Training 1/1 epoch (loss 2.9185): 16%|β–ˆβ–Œ | 198/1250 [01:16<05:59, 2.93it/s] Training 1/1 epoch (loss 2.9185): 16%|β–ˆβ–Œ | 199/1250 [01:16<05:46, 3.04it/s] Training 1/1 epoch (loss 2.8770): 16%|β–ˆβ–Œ | 199/1250 [01:16<05:46, 3.04it/s] Training 1/1 epoch (loss 2.8770): 16%|β–ˆβ–Œ | 200/1250 [01:16<06:10, 2.84it/s] Training 1/1 epoch (loss 2.9282): 16%|β–ˆβ–Œ | 200/1250 [01:16<06:10, 2.84it/s] Training 1/1 epoch (loss 2.9282): 16%|β–ˆβ–Œ | 201/1250 [01:16<06:06, 2.87it/s] Training 1/1 epoch (loss 2.7861): 16%|β–ˆβ–Œ | 201/1250 [01:17<06:06, 2.87it/s] Training 1/1 epoch (loss 2.7861): 16%|β–ˆβ–Œ | 202/1250 [01:17<05:57, 2.93it/s] Training 1/1 epoch (loss 2.7526): 16%|β–ˆβ–Œ | 202/1250 [01:17<05:57, 2.93it/s] Training 1/1 epoch (loss 2.7526): 16%|β–ˆβ–Œ | 203/1250 [01:17<05:52, 2.97it/s] Training 1/1 epoch (loss 2.7059): 16%|β–ˆβ–Œ | 203/1250 [01:17<05:52, 2.97it/s] Training 1/1 epoch (loss 2.7059): 16%|β–ˆβ–‹ | 204/1250 [01:17<05:47, 3.01it/s] Training 1/1 epoch (loss 2.7238): 16%|β–ˆβ–‹ | 204/1250 [01:18<05:47, 3.01it/s] Training 1/1 epoch (loss 2.7238): 16%|β–ˆβ–‹ | 205/1250 [01:18<05:42, 3.05it/s] Training 1/1 epoch (loss 2.9659): 16%|β–ˆβ–‹ | 205/1250 [01:18<05:42, 3.05it/s] Training 1/1 epoch (loss 2.9659): 16%|β–ˆβ–‹ | 206/1250 [01:18<06:09, 2.83it/s] Training 1/1 epoch (loss 2.8513): 16%|β–ˆβ–‹ | 206/1250 [01:18<06:09, 2.83it/s] Training 1/1 epoch (loss 2.8513): 17%|β–ˆβ–‹ | 207/1250 [01:18<06:06, 2.84it/s] Training 1/1 epoch (loss 2.8415): 17%|β–ˆβ–‹ | 207/1250 [01:19<06:06, 2.84it/s] Training 1/1 epoch (loss 2.8415): 17%|β–ˆβ–‹ | 208/1250 [01:19<05:56, 2.93it/s] Training 1/1 epoch (loss 2.8569): 17%|β–ˆβ–‹ | 208/1250 [01:19<05:56, 2.93it/s] Training 1/1 epoch (loss 2.8569): 17%|β–ˆβ–‹ | 209/1250 [01:19<06:01, 2.88it/s] Training 1/1 epoch (loss 2.9371): 17%|β–ˆβ–‹ | 
209/1250 [01:19<06:01, 2.88it/s] Training 1/1 epoch (loss 2.9371): 17%|β–ˆβ–‹ | 210/1250 [01:19<05:54, 2.94it/s] Training 1/1 epoch (loss 2.8593): 17%|β–ˆβ–‹ | 210/1250 [01:20<05:54, 2.94it/s] Training 1/1 epoch (loss 2.8593): 17%|β–ˆβ–‹ | 211/1250 [01:20<05:43, 3.02it/s] Training 1/1 epoch (loss 3.0368): 17%|β–ˆβ–‹ | 211/1250 [01:20<05:43, 3.02it/s] Training 1/1 epoch (loss 3.0368): 17%|β–ˆβ–‹ | 212/1250 [01:20<05:43, 3.02it/s] Training 1/1 epoch (loss 2.7631): 17%|β–ˆβ–‹ | 212/1250 [01:21<05:43, 3.02it/s] Training 1/1 epoch (loss 2.7631): 17%|β–ˆβ–‹ | 213/1250 [01:21<06:26, 2.68it/s] Training 1/1 epoch (loss 2.7101): 17%|β–ˆβ–‹ | 213/1250 [01:21<06:26, 2.68it/s] Training 1/1 epoch (loss 2.7101): 17%|β–ˆβ–‹ | 214/1250 [01:21<06:06, 2.82it/s] Training 1/1 epoch (loss 2.6916): 17%|β–ˆβ–‹ | 214/1250 [01:21<06:06, 2.82it/s] Training 1/1 epoch (loss 2.6916): 17%|β–ˆβ–‹ | 215/1250 [01:21<05:51, 2.94it/s] Training 1/1 epoch (loss 2.5994): 17%|β–ˆβ–‹ | 215/1250 [01:21<05:51, 2.94it/s] Training 1/1 epoch (loss 2.5994): 17%|β–ˆβ–‹ | 216/1250 [01:21<05:48, 2.96it/s] Training 1/1 epoch (loss 2.7197): 17%|β–ˆβ–‹ | 216/1250 [01:22<05:48, 2.96it/s] Training 1/1 epoch (loss 2.7197): 17%|β–ˆβ–‹ | 217/1250 [01:22<05:52, 2.93it/s] Training 1/1 epoch (loss 2.9473): 17%|β–ˆβ–‹ | 217/1250 [01:22<05:52, 2.93it/s] Training 1/1 epoch (loss 2.9473): 17%|β–ˆβ–‹ | 218/1250 [01:22<05:54, 2.91it/s] Training 1/1 epoch (loss 2.9564): 17%|β–ˆβ–‹ | 218/1250 [01:23<05:54, 2.91it/s] Training 1/1 epoch (loss 2.9564): 18%|β–ˆβ–Š | 219/1250 [01:23<06:08, 2.80it/s] Training 1/1 epoch (loss 2.6365): 18%|β–ˆβ–Š | 219/1250 [01:23<06:08, 2.80it/s] Training 1/1 epoch (loss 2.6365): 18%|β–ˆβ–Š | 220/1250 [01:23<07:19, 2.34it/s] Training 1/1 epoch (loss 2.7110): 18%|β–ˆβ–Š | 220/1250 [01:24<07:19, 2.34it/s] Training 1/1 epoch (loss 2.7110): 18%|β–ˆβ–Š | 221/1250 [01:24<06:53, 2.49it/s] Training 1/1 epoch (loss 2.9319): 18%|β–ˆβ–Š | 221/1250 [01:24<06:53, 2.49it/s] Training 1/1 epoch (loss 2.9319): 18%|β–ˆβ–Š | 
222/1250 [01:24<06:32, 2.62it/s] Training 1/1 epoch (loss 2.7660): 18%|β–ˆβ–Š | 222/1250 [01:24<06:32, 2.62it/s] Training 1/1 epoch (loss 2.7660): 18%|β–ˆβ–Š | 223/1250 [01:24<06:32, 2.62it/s] Training 1/1 epoch (loss 2.7666): 18%|β–ˆβ–Š | 223/1250 [01:25<06:32, 2.62it/s] Training 1/1 epoch (loss 2.7666): 18%|β–ˆβ–Š | 224/1250 [01:25<06:43, 2.54it/s] Training 1/1 epoch (loss 2.6435): 18%|β–ˆβ–Š | 224/1250 [01:25<06:43, 2.54it/s] Training 1/1 epoch (loss 2.6435): 18%|β–ˆβ–Š | 225/1250 [01:25<06:18, 2.71it/s] Training 1/1 epoch (loss 2.9251): 18%|β–ˆβ–Š | 225/1250 [01:25<06:18, 2.71it/s] Training 1/1 epoch (loss 2.9251): 18%|β–ˆβ–Š | 226/1250 [01:25<06:00, 2.84it/s] Training 1/1 epoch (loss 2.9908): 18%|β–ˆβ–Š | 226/1250 [01:26<06:00, 2.84it/s] Training 1/1 epoch (loss 2.9908): 18%|β–ˆβ–Š | 227/1250 [01:26<06:02, 2.82it/s] Training 1/1 epoch (loss 2.8930): 18%|β–ˆβ–Š | 227/1250 [01:26<06:02, 2.82it/s] Training 1/1 epoch (loss 2.8930): 18%|β–ˆβ–Š | 228/1250 [01:26<06:08, 2.78it/s] Training 1/1 epoch (loss 2.8373): 18%|β–ˆβ–Š | 228/1250 [01:26<06:08, 2.78it/s] Training 1/1 epoch (loss 2.8373): 18%|β–ˆβ–Š | 229/1250 [01:26<06:03, 2.81it/s] Training 1/1 epoch (loss 2.7076): 18%|β–ˆβ–Š | 229/1250 [01:27<06:03, 2.81it/s] Training 1/1 epoch (loss 2.7076): 18%|β–ˆβ–Š | 230/1250 [01:27<06:05, 2.79it/s] Training 1/1 epoch (loss 2.9365): 18%|β–ˆβ–Š | 230/1250 [01:27<06:05, 2.79it/s] Training 1/1 epoch (loss 2.9365): 18%|β–ˆβ–Š | 231/1250 [01:27<05:55, 2.87it/s] Training 1/1 epoch (loss 2.9870): 18%|β–ˆβ–Š | 231/1250 [01:27<05:55, 2.87it/s] Training 1/1 epoch (loss 2.9870): 19%|β–ˆβ–Š | 232/1250 [01:27<06:09, 2.76it/s] Training 1/1 epoch (loss 2.8369): 19%|β–ˆβ–Š | 232/1250 [01:28<06:09, 2.76it/s] Training 1/1 epoch (loss 2.8369): 19%|β–ˆβ–Š | 233/1250 [01:28<06:28, 2.62it/s] Training 1/1 epoch (loss 2.8622): 19%|β–ˆβ–Š | 233/1250 [01:28<06:28, 2.62it/s] Training 1/1 epoch (loss 2.8622): 19%|β–ˆβ–Š | 234/1250 [01:28<07:43, 2.19it/s] Training 1/1 epoch (loss 2.8816): 19%|β–ˆβ–Š | 
234/1250 [01:29<07:43, 2.19it/s] Training 1/1 epoch (loss 2.8816): 19%|β–ˆβ–‰ | 235/1250 [01:29<07:24, 2.28it/s] Training 1/1 epoch (loss 2.9095): 19%|β–ˆβ–‰ | 235/1250 [01:29<07:24, 2.28it/s] Training 1/1 epoch (loss 2.9095): 19%|β–ˆβ–‰ | 236/1250 [01:29<07:22, 2.29it/s] Training 1/1 epoch (loss 2.6702): 19%|β–ˆβ–‰ | 236/1250 [01:30<07:22, 2.29it/s] Training 1/1 epoch (loss 2.6702): 19%|β–ˆβ–‰ | 237/1250 [01:30<07:38, 2.21it/s] Training 1/1 epoch (loss 2.8421): 19%|β–ˆβ–‰ | 237/1250 [01:30<07:38, 2.21it/s] Training 1/1 epoch (loss 2.8421): 19%|β–ˆβ–‰ | 238/1250 [01:30<07:39, 2.20it/s] Training 1/1 epoch (loss 2.7217): 19%|β–ˆβ–‰ | 238/1250 [01:31<07:39, 2.20it/s] Training 1/1 epoch (loss 2.7217): 19%|β–ˆβ–‰ | 239/1250 [01:31<07:31, 2.24it/s] Training 1/1 epoch (loss 2.7442): 19%|β–ˆβ–‰ | 239/1250 [01:31<07:31, 2.24it/s] Training 1/1 epoch (loss 2.7442): 19%|β–ˆβ–‰ | 240/1250 [01:31<07:23, 2.28it/s] Training 1/1 epoch (loss 2.6688): 19%|β–ˆβ–‰ | 240/1250 [01:31<07:23, 2.28it/s] Training 1/1 epoch (loss 2.6688): 19%|β–ˆβ–‰ | 241/1250 [01:31<06:47, 2.48it/s] Training 1/1 epoch (loss 2.7901): 19%|β–ˆβ–‰ | 241/1250 [01:32<06:47, 2.48it/s] Training 1/1 epoch (loss 2.7901): 19%|β–ˆβ–‰ | 242/1250 [01:32<06:16, 2.67it/s] Training 1/1 epoch (loss 2.6722): 19%|β–ˆβ–‰ | 242/1250 [01:32<06:16, 2.67it/s] Training 1/1 epoch (loss 2.6722): 19%|β–ˆβ–‰ | 243/1250 [01:32<06:04, 2.76it/s] Training 1/1 epoch (loss 2.6406): 19%|β–ˆβ–‰ | 243/1250 [01:32<06:04, 2.76it/s] Training 1/1 epoch (loss 2.6406): 20%|β–ˆβ–‰ | 244/1250 [01:32<06:02, 2.77it/s] Training 1/1 epoch (loss 2.7727): 20%|β–ˆβ–‰ | 244/1250 [01:33<06:02, 2.77it/s] Training 1/1 epoch (loss 2.7727): 20%|β–ˆβ–‰ | 245/1250 [01:33<06:04, 2.76it/s] Training 1/1 epoch (loss 2.8293): 20%|β–ˆβ–‰ | 245/1250 [01:33<06:04, 2.76it/s] Training 1/1 epoch (loss 2.8293): 20%|β–ˆβ–‰ | 246/1250 [01:33<05:44, 2.91it/s] Training 1/1 epoch (loss 2.7062): 20%|β–ˆβ–‰ | 246/1250 [01:33<05:44, 2.91it/s] Training 1/1 epoch (loss 2.7062): 20%|β–ˆβ–‰ | 
247/1250 [01:33<05:52, 2.85it/s] Training 1/1 epoch (loss 2.7384): 20%|β–ˆβ–‰ | 247/1250 [01:34<05:52, 2.85it/s] Training 1/1 epoch (loss 2.7384): 20%|β–ˆβ–‰ | 248/1250 [01:34<05:54, 2.82it/s] Training 1/1 epoch (loss 2.9392): 20%|β–ˆβ–‰ | 248/1250 [01:34<05:54, 2.82it/s] Training 1/1 epoch (loss 2.9392): 20%|β–ˆβ–‰ | 249/1250 [01:34<06:01, 2.77it/s] Training 1/1 epoch (loss 2.8576): 20%|β–ˆβ–‰ | 249/1250 [01:35<06:01, 2.77it/s] Training 1/1 epoch (loss 2.8576): 20%|β–ˆβ–ˆ | 250/1250 [01:35<05:48, 2.87it/s] Training 1/1 epoch (loss 2.7553): 20%|β–ˆβ–ˆ | 250/1250 [01:35<05:48, 2.87it/s] Training 1/1 epoch (loss 2.7553): 20%|β–ˆβ–ˆ | 251/1250 [01:35<05:39, 2.94it/s] Training 1/1 epoch (loss 2.8370): 20%|β–ˆβ–ˆ | 251/1250 [01:35<05:39, 2.94it/s] Training 1/1 epoch (loss 2.8370): 20%|β–ˆβ–ˆ | 252/1250 [01:35<05:29, 3.03it/s] Training 1/1 epoch (loss 2.7298): 20%|β–ˆβ–ˆ | 252/1250 [01:35<05:29, 3.03it/s] Training 1/1 epoch (loss 2.7298): 20%|β–ˆβ–ˆ | 253/1250 [01:35<05:29, 3.02it/s] Training 1/1 epoch (loss 2.8798): 20%|β–ˆβ–ˆ | 253/1250 [01:36<05:29, 3.02it/s] Training 1/1 epoch (loss 2.8798): 20%|β–ˆβ–ˆ | 254/1250 [01:36<05:39, 2.93it/s] Training 1/1 epoch (loss 2.7551): 20%|β–ˆβ–ˆ | 254/1250 [01:36<05:39, 2.93it/s] Training 1/1 epoch (loss 2.7551): 20%|β–ˆβ–ˆ | 255/1250 [01:36<05:45, 2.88it/s] Training 1/1 epoch (loss 2.8803): 20%|β–ˆβ–ˆ | 255/1250 [01:37<05:45, 2.88it/s] Training 1/1 epoch (loss 2.8803): 20%|β–ˆβ–ˆ | 256/1250 [01:37<05:54, 2.80it/s] Training 1/1 epoch (loss 2.8360): 20%|β–ˆβ–ˆ | 256/1250 [01:37<05:54, 2.80it/s] Training 1/1 epoch (loss 2.8360): 21%|β–ˆβ–ˆ | 257/1250 [01:37<05:55, 2.79it/s] Training 1/1 epoch (loss 2.8185): 21%|β–ˆβ–ˆ | 257/1250 [01:37<05:55, 2.79it/s] Training 1/1 epoch (loss 2.8185): 21%|β–ˆβ–ˆ | 258/1250 [01:37<05:44, 2.88it/s] Training 1/1 epoch (loss 2.7580): 21%|β–ˆβ–ˆ | 258/1250 [01:38<05:44, 2.88it/s] Training 1/1 epoch (loss 2.7580): 21%|β–ˆβ–ˆ | 259/1250 [01:38<05:36, 2.95it/s] Training 1/1 epoch (loss 2.8601): 21%|β–ˆβ–ˆ | 
259/1250 [01:38<05:36, 2.95it/s] Training 1/1 epoch (loss 2.8601): 21%|β–ˆβ–ˆ | 260/1250 [01:38<05:45, 2.86it/s] Training 1/1 epoch (loss 2.7503): 21%|β–ˆβ–ˆ | 260/1250 [01:38<05:45, 2.86it/s] Training 1/1 epoch (loss 2.7503): 21%|β–ˆβ–ˆ | 261/1250 [01:38<05:52, 2.81it/s] Training 1/1 epoch (loss 2.7581): 21%|β–ˆβ–ˆ | 261/1250 [01:39<05:52, 2.81it/s] Training 1/1 epoch (loss 2.7581): 21%|β–ˆβ–ˆ | 262/1250 [01:39<05:57, 2.76it/s] Training 1/1 epoch (loss 2.7755): 21%|β–ˆβ–ˆ | 262/1250 [01:39<05:57, 2.76it/s] Training 1/1 epoch (loss 2.7755): 21%|β–ˆβ–ˆ | 263/1250 [01:39<05:53, 2.79it/s] Training 1/1 epoch (loss 2.5074): 21%|β–ˆβ–ˆ | 263/1250 [01:39<05:53, 2.79it/s] Training 1/1 epoch (loss 2.5074): 21%|β–ˆβ–ˆ | 264/1250 [01:39<05:48, 2.83it/s] Training 1/1 epoch (loss 2.8189): 21%|β–ˆβ–ˆ | 264/1250 [01:40<05:48, 2.83it/s] Training 1/1 epoch (loss 2.8189): 21%|β–ˆβ–ˆ | 265/1250 [01:40<05:41, 2.88it/s] Training 1/1 epoch (loss 2.8258): 21%|β–ˆβ–ˆ | 265/1250 [01:40<05:41, 2.88it/s] Training 1/1 epoch (loss 2.8258): 21%|β–ˆβ–ˆβ– | 266/1250 [01:40<05:34, 2.94it/s] Training 1/1 epoch (loss 2.9070): 21%|β–ˆβ–ˆβ– | 266/1250 [01:40<05:34, 2.94it/s] Training 1/1 epoch (loss 2.9070): 21%|β–ˆβ–ˆβ– | 267/1250 [01:40<05:32, 2.96it/s] Training 1/1 epoch (loss 2.8604): 21%|β–ˆβ–ˆβ– | 267/1250 [01:41<05:32, 2.96it/s] Training 1/1 epoch (loss 2.8604): 21%|β–ˆβ–ˆβ– | 268/1250 [01:41<05:32, 2.95it/s] Training 1/1 epoch (loss 2.8039): 21%|β–ˆβ–ˆβ– | 268/1250 [01:41<05:32, 2.95it/s] Training 1/1 epoch (loss 2.8039): 22%|β–ˆβ–ˆβ– | 269/1250 [01:41<05:44, 2.85it/s] Training 1/1 epoch (loss 2.7712): 22%|β–ˆβ–ˆβ– | 269/1250 [01:41<05:44, 2.85it/s] Training 1/1 epoch (loss 2.7712): 22%|β–ˆβ–ˆβ– | 270/1250 [01:41<05:31, 2.96it/s] Training 1/1 epoch (loss 2.6228): 22%|β–ˆβ–ˆβ– | 270/1250 [01:42<05:31, 2.96it/s] Training 1/1 epoch (loss 2.6228): 22%|β–ˆβ–ˆβ– | 271/1250 [01:42<05:20, 3.05it/s] Training 1/1 epoch (loss 2.5480): 22%|β–ˆβ–ˆβ– | 271/1250 [01:42<05:20, 3.05it/s] Training 
1/1 epoch (loss 2.5480): 22%|β–ˆβ–ˆβ– | 272/1250 [01:42<05:26, 2.99it/s] Training 1/1 epoch (loss 2.7337): 22%|β–ˆβ–ˆβ– | 272/1250 [01:42<05:26, 2.99it/s] Training 1/1 epoch (loss 2.7337): 22%|β–ˆβ–ˆβ– | 273/1250 [01:42<05:42, 2.85it/s] Training 1/1 epoch (loss 2.8524): 22%|β–ˆβ–ˆβ– | 273/1250 [01:43<05:42, 2.85it/s] Training 1/1 epoch (loss 2.8524): 22%|β–ˆβ–ˆβ– | 274/1250 [01:43<05:35, 2.91it/s] Training 1/1 epoch (loss 2.9081): 22%|β–ˆβ–ˆβ– | 274/1250 [01:43<05:35, 2.91it/s] Training 1/1 epoch (loss 2.9081): 22%|β–ˆβ–ˆβ– | 275/1250 [01:43<05:29, 2.96it/s] Training 1/1 epoch (loss 2.8290): 22%|β–ˆβ–ˆβ– | 275/1250 [01:43<05:29, 2.96it/s] Training 1/1 epoch (loss 2.8290): 22%|β–ˆβ–ˆβ– | 276/1250 [01:43<05:40, 2.86it/s] Training 1/1 epoch (loss 2.7824): 22%|β–ˆβ–ˆβ– | 276/1250 [01:44<05:40, 2.86it/s] Training 1/1 epoch (loss 2.7824): 22%|β–ˆβ–ˆβ– | 277/1250 [01:44<05:52, 2.76it/s] Training 1/1 epoch (loss 2.9044): 22%|β–ˆβ–ˆβ– | 277/1250 [01:44<05:52, 2.76it/s] Training 1/1 epoch (loss 2.9044): 22%|β–ˆβ–ˆβ– | 278/1250 [01:44<05:56, 2.73it/s] Training 1/1 epoch (loss 2.9523): 22%|β–ˆβ–ˆβ– | 278/1250 [01:45<05:56, 2.73it/s] Training 1/1 epoch (loss 2.9523): 22%|β–ˆβ–ˆβ– | 279/1250 [01:45<05:59, 2.70it/s] Training 1/1 epoch (loss 2.7367): 22%|β–ˆβ–ˆβ– | 279/1250 [01:45<05:59, 2.70it/s] Training 1/1 epoch (loss 2.7367): 22%|β–ˆβ–ˆβ– | 280/1250 [01:45<05:48, 2.78it/s] Training 1/1 epoch (loss 2.9114): 22%|β–ˆβ–ˆβ– | 280/1250 [01:45<05:48, 2.78it/s] Training 1/1 epoch (loss 2.9114): 22%|β–ˆβ–ˆβ– | 281/1250 [01:45<05:44, 2.81it/s] Training 1/1 epoch (loss 2.7701): 22%|β–ˆβ–ˆβ– | 281/1250 [01:46<05:44, 2.81it/s] Training 1/1 epoch (loss 2.7701): 23%|β–ˆβ–ˆβ–Ž | 282/1250 [01:46<05:31, 2.92it/s] Training 1/1 epoch (loss 2.8238): 23%|β–ˆβ–ˆβ–Ž | 282/1250 [01:46<05:31, 2.92it/s] Training 1/1 epoch (loss 2.8238): 23%|β–ˆβ–ˆβ–Ž | 283/1250 [01:46<05:32, 2.91it/s] Training 1/1 epoch (loss 2.9327): 23%|β–ˆβ–ˆβ–Ž | 283/1250 [01:46<05:32, 2.91it/s] Training 1/1 
epoch (loss 2.9327): 23%|β–ˆβ–ˆβ–Ž | 284/1250 [01:46<05:31, 2.92it/s] Training 1/1 epoch (loss 2.7839): 23%|β–ˆβ–ˆβ–Ž | 284/1250 [01:47<05:31, 2.92it/s] Training 1/1 epoch (loss 2.7839): 23%|β–ˆβ–ˆβ–Ž | 285/1250 [01:47<05:22, 2.99it/s] Training 1/1 epoch (loss 2.6116): 23%|β–ˆβ–ˆβ–Ž | 285/1250 [01:47<05:22, 2.99it/s] Training 1/1 epoch (loss 2.6116): 23%|β–ˆβ–ˆβ–Ž | 286/1250 [01:47<05:19, 3.02it/s] Training 1/1 epoch (loss 2.7865): 23%|β–ˆβ–ˆβ–Ž | 286/1250 [01:47<05:19, 3.02it/s] Training 1/1 epoch (loss 2.7865): 23%|β–ˆβ–ˆβ–Ž | 287/1250 [01:47<05:16, 3.04it/s] Training 1/1 epoch (loss 2.7880): 23%|β–ˆβ–ˆβ–Ž | 287/1250 [01:48<05:16, 3.04it/s] Training 1/1 epoch (loss 2.7880): 23%|β–ˆβ–ˆβ–Ž | 288/1250 [01:48<05:13, 3.07it/s] Training 1/1 epoch (loss 2.6956): 23%|β–ˆβ–ˆβ–Ž | 288/1250 [01:48<05:13, 3.07it/s] Training 1/1 epoch (loss 2.6956): 23%|β–ˆβ–ˆβ–Ž | 289/1250 [01:48<05:19, 3.01it/s] Training 1/1 epoch (loss 2.6824): 23%|β–ˆβ–ˆβ–Ž | 289/1250 [01:48<05:19, 3.01it/s] Training 1/1 epoch (loss 2.6824): 23%|β–ˆβ–ˆβ–Ž | 290/1250 [01:48<05:34, 2.87it/s] Training 1/1 epoch (loss 2.7482): 23%|β–ˆβ–ˆβ–Ž | 290/1250 [01:49<05:34, 2.87it/s] Training 1/1 epoch (loss 2.7482): 23%|β–ˆβ–ˆβ–Ž | 291/1250 [01:49<05:36, 2.85it/s] Training 1/1 epoch (loss 2.9872): 23%|β–ˆβ–ˆβ–Ž | 291/1250 [01:49<05:36, 2.85it/s] Training 1/1 epoch (loss 2.9872): 23%|β–ˆβ–ˆβ–Ž | 292/1250 [01:49<05:33, 2.87it/s] Training 1/1 epoch (loss 2.9354): 23%|β–ˆβ–ˆβ–Ž | 292/1250 [01:49<05:33, 2.87it/s] Training 1/1 epoch (loss 2.9354): 23%|β–ˆβ–ˆβ–Ž | 293/1250 [01:49<05:44, 2.77it/s] Training 1/1 epoch (loss 2.7712): 23%|β–ˆβ–ˆβ–Ž | 293/1250 [01:50<05:44, 2.77it/s] Training 1/1 epoch (loss 2.7712): 24%|β–ˆβ–ˆβ–Ž | 294/1250 [01:50<05:29, 2.90it/s] Training 1/1 epoch (loss 2.7200): 24%|β–ˆβ–ˆβ–Ž | 294/1250 [01:50<05:29, 2.90it/s] Training 1/1 epoch (loss 2.7200): 24%|β–ˆβ–ˆβ–Ž | 295/1250 [01:50<05:24, 2.95it/s] Training 1/1 epoch (loss 2.8250): 24%|β–ˆβ–ˆβ–Ž | 295/1250 [01:51<05:24, 2.95it/s] Training 1/1 epoch 
(loss 2.8250): 24%|β–ˆβ–ˆβ–Ž | 296/1250 [01:51<05:54, 2.69it/s] Training 1/1 epoch (loss 2.7569): 24%|β–ˆβ–ˆβ–Ž | 296/1250 [01:51<05:54, 2.69it/s] Training 1/1 epoch (loss 2.7569): 24%|β–ˆβ–ˆβ– | 297/1250 [01:51<05:47, 2.74it/s] Training 1/1 epoch (loss 2.8134): 24%|β–ˆβ–ˆβ– | 297/1250 [01:51<05:47, 2.74it/s] Training 1/1 epoch (loss 2.8134): 24%|β–ˆβ–ˆβ– | 298/1250 [01:51<05:46, 2.75it/s] Training 1/1 epoch (loss 2.6956): 24%|β–ˆβ–ˆβ– | 298/1250 [01:52<05:46, 2.75it/s] Training 1/1 epoch (loss 2.6956): 24%|β–ˆβ–ˆβ– | 299/1250 [01:52<05:31, 2.87it/s] Training 1/1 epoch (loss 2.8514): 24%|β–ˆβ–ˆβ– | 299/1250 [01:52<05:31, 2.87it/s] Training 1/1 epoch (loss 2.8514): 24%|β–ˆβ–ˆβ– | 300/1250 [01:52<05:23, 2.94it/s] Training 1/1 epoch (loss 2.7176): 24%|β–ˆβ–ˆβ– | 300/1250 [01:52<05:23, 2.94it/s] Training 1/1 epoch (loss 2.7176): 24%|β–ˆβ–ˆβ– | 301/1250 [01:52<05:16, 3.00it/s] Training 1/1 epoch (loss 2.6137): 24%|β–ˆβ–ˆβ– | 301/1250 [01:53<05:16, 3.00it/s] Training 1/1 epoch (loss 2.6137): 24%|β–ˆβ–ˆβ– | 302/1250 [01:53<05:39, 2.79it/s] Training 1/1 epoch (loss 2.4917): 24%|β–ˆβ–ˆβ– | 302/1250 [01:53<05:39, 2.79it/s] Training 1/1 epoch (loss 2.4917): 24%|β–ˆβ–ˆβ– | 303/1250 [01:53<06:32, 2.41it/s] Training 1/1 epoch (loss 2.9178): 24%|β–ˆβ–ˆβ– | 303/1250 [01:54<06:32, 2.41it/s] Training 1/1 epoch (loss 2.9178): 24%|β–ˆβ–ˆβ– | 304/1250 [01:54<06:23, 2.47it/s] Training 1/1 epoch (loss 3.0055): 24%|β–ˆβ–ˆβ– | 304/1250 [01:54<06:23, 2.47it/s] Training 1/1 epoch (loss 3.0055): 24%|β–ˆβ–ˆβ– | 305/1250 [01:54<06:11, 2.54it/s] Training 1/1 epoch (loss 2.9180): 24%|β–ˆβ–ˆβ– | 305/1250 [01:54<06:11, 2.54it/s] Training 1/1 epoch (loss 2.9180): 24%|β–ˆβ–ˆβ– | 306/1250 [01:54<05:56, 2.65it/s] Training 1/1 epoch (loss 2.8254): 24%|β–ˆβ–ˆβ– | 306/1250 [01:55<05:56, 2.65it/s] Training 1/1 epoch (loss 2.8254): 25%|β–ˆβ–ˆβ– | 307/1250 [01:55<05:46, 2.72it/s] Training 1/1 epoch (loss 2.7462): 25%|β–ˆβ–ˆβ– | 307/1250 [01:55<05:46, 2.72it/s] Training 1/1 epoch (loss 
2.7462): 25%|β–ˆβ–ˆβ– | 308/1250 [01:55<05:27, 2.88it/s] Training 1/1 epoch (loss 2.6662): 25%|β–ˆβ–ˆβ– | 308/1250 [01:55<05:27, 2.88it/s] Training 1/1 epoch (loss 2.6662): 25%|β–ˆβ–ˆβ– | 309/1250 [01:55<05:15, 2.98it/s] Training 1/1 epoch (loss 2.6979): 25%|β–ˆβ–ˆβ– | 309/1250 [01:55<05:15, 2.98it/s] Training 1/1 epoch (loss 2.6979): 25%|β–ˆβ–ˆβ– | 310/1250 [01:55<05:12, 3.01it/s] Training 1/1 epoch (loss 2.8578): 25%|β–ˆβ–ˆβ– | 310/1250 [01:56<05:12, 3.01it/s] Training 1/1 epoch (loss 2.8578): 25%|β–ˆβ–ˆβ– | 311/1250 [01:56<05:15, 2.98it/s] Training 1/1 epoch (loss 2.7132): 25%|β–ˆβ–ˆβ– | 311/1250 [01:56<05:15, 2.98it/s] Training 1/1 epoch (loss 2.7132): 25%|β–ˆβ–ˆβ– | 312/1250 [01:56<05:26, 2.88it/s] Training 1/1 epoch (loss 2.5502): 25%|β–ˆβ–ˆβ– | 312/1250 [01:57<05:26, 2.88it/s] Training 1/1 epoch (loss 2.5502): 25%|β–ˆβ–ˆβ–Œ | 313/1250 [01:57<05:39, 2.76it/s] Training 1/1 epoch (loss 2.6810): 25%|β–ˆβ–ˆβ–Œ | 313/1250 [01:57<05:39, 2.76it/s] Training 1/1 epoch (loss 2.6810): 25%|β–ˆβ–ˆβ–Œ | 314/1250 [01:57<05:31, 2.82it/s] Training 1/1 epoch (loss 2.8947): 25%|β–ˆβ–ˆβ–Œ | 314/1250 [01:57<05:31, 2.82it/s] Training 1/1 epoch (loss 2.8947): 25%|β–ˆβ–ˆβ–Œ | 315/1250 [01:57<05:23, 2.89it/s] Training 1/1 epoch (loss 2.7441): 25%|β–ˆβ–ˆβ–Œ | 315/1250 [01:58<05:23, 2.89it/s] Training 1/1 epoch (loss 2.7441): 25%|β–ˆβ–ˆβ–Œ | 316/1250 [01:58<05:49, 2.67it/s] Training 1/1 epoch (loss 2.8273): 25%|β–ˆβ–ˆβ–Œ | 316/1250 [01:58<05:49, 2.67it/s] Training 1/1 epoch (loss 2.8273): 25%|β–ˆβ–ˆβ–Œ | 317/1250 [01:58<06:10, 2.52it/s] Training 1/1 epoch (loss 2.9711): 25%|β–ˆβ–ˆβ–Œ | 317/1250 [01:59<06:10, 2.52it/s] Training 1/1 epoch (loss 2.9711): 25%|β–ˆβ–ˆβ–Œ | 318/1250 [01:59<05:57, 2.60it/s] Training 1/1 epoch (loss 2.7190): 25%|β–ˆβ–ˆβ–Œ | 318/1250 [01:59<05:57, 2.60it/s] Training 1/1 epoch (loss 2.7190): 26%|β–ˆβ–ˆβ–Œ | 319/1250 [01:59<05:32, 2.80it/s] Training 1/1 epoch (loss 2.8411): 26%|β–ˆβ–ˆβ–Œ | 319/1250 [01:59<05:32, 2.80it/s] Training 1/1 epoch (loss 
2.8411): 26%|β–ˆβ–ˆβ–Œ | 320/1250 [01:59<05:31, 2.80it/s] Training 1/1 epoch (loss 2.7386): 26%|β–ˆβ–ˆβ–Œ | 320/1250 [02:00<05:31, 2.80it/s] Training 1/1 epoch (loss 2.7386): 26%|β–ˆβ–ˆβ–Œ | 321/1250 [02:00<05:38, 2.74it/s] Training 1/1 epoch (loss 2.8214): 26%|β–ˆβ–ˆβ–Œ | 321/1250 [02:00<05:38, 2.74it/s] Training 1/1 epoch (loss 2.8214): 26%|β–ˆβ–ˆβ–Œ | 322/1250 [02:00<05:39, 2.73it/s] Training 1/1 epoch (loss 2.6663): 26%|β–ˆβ–ˆβ–Œ | 322/1250 [02:00<05:39, 2.73it/s] Training 1/1 epoch (loss 2.6663): 26%|β–ˆβ–ˆβ–Œ | 323/1250 [02:00<05:31, 2.80it/s] Training 1/1 epoch (loss 2.9963): 26%|β–ˆβ–ˆβ–Œ | 323/1250 [02:01<05:31, 2.80it/s] Training 1/1 epoch (loss 2.9963): 26%|β–ˆβ–ˆβ–Œ | 324/1250 [02:01<05:16, 2.92it/s] Training 1/1 epoch (loss 2.5491): 26%|β–ˆβ–ˆβ–Œ | 324/1250 [02:01<05:16, 2.92it/s] Training 1/1 epoch (loss 2.5491): 26%|β–ˆβ–ˆβ–Œ | 325/1250 [02:01<05:09, 2.99it/s] Training 1/1 epoch (loss 2.8459): 26%|β–ˆβ–ˆβ–Œ | 325/1250 [02:01<05:09, 2.99it/s] Training 1/1 epoch (loss 2.8459): 26%|β–ˆβ–ˆβ–Œ | 326/1250 [02:01<05:08, 2.99it/s] Training 1/1 epoch (loss 2.6885): 26%|β–ˆβ–ˆβ–Œ | 326/1250 [02:02<05:08, 2.99it/s] Training 1/1 epoch (loss 2.6885): 26%|β–ˆβ–ˆβ–Œ | 327/1250 [02:02<05:13, 2.94it/s] Training 1/1 epoch (loss 2.4939): 26%|β–ˆβ–ˆβ–Œ | 327/1250 [02:02<05:13, 2.94it/s] Training 1/1 epoch (loss 2.4939): 26%|β–ˆβ–ˆβ–Œ | 328/1250 [02:02<05:06, 3.01it/s] Training 1/1 epoch (loss 3.0906): 26%|β–ˆβ–ˆβ–Œ | 328/1250 [02:02<05:06, 3.01it/s] Training 1/1 epoch (loss 3.0906): 26%|β–ˆβ–ˆβ–‹ | 329/1250 [02:02<05:08, 2.98it/s] Training 1/1 epoch (loss 2.6919): 26%|β–ˆβ–ˆβ–‹ | 329/1250 [02:03<05:08, 2.98it/s] Training 1/1 epoch (loss 2.6919): 26%|β–ˆβ–ˆβ–‹ | 330/1250 [02:03<05:10, 2.96it/s] Training 1/1 epoch (loss 2.7513): 26%|β–ˆβ–ˆβ–‹ | 330/1250 [02:03<05:10, 2.96it/s] Training 1/1 epoch (loss 2.7513): 26%|β–ˆβ–ˆβ–‹ | 331/1250 [02:03<05:08, 2.98it/s] Training 1/1 epoch (loss 2.9258): 26%|β–ˆβ–ˆβ–‹ | 331/1250 [02:03<05:08, 2.98it/s] Training 1/1 epoch (loss 
2.9258): 27%|β–ˆβ–ˆβ–‹ | 332/1250 [02:03<05:06, 3.00it/s] Training 1/1 epoch (loss 2.6962): 27%|β–ˆβ–ˆβ–‹ | 332/1250 [02:04<05:06, 3.00it/s] Training 1/1 epoch (loss 2.6962): 27%|β–ˆβ–ˆβ–‹ | 333/1250 [02:04<05:04, 3.01it/s] Training 1/1 epoch (loss 2.8803): 27%|β–ˆβ–ˆβ–‹ | 333/1250 [02:04<05:04, 3.01it/s] Training 1/1 epoch (loss 2.8803): 27%|β–ˆβ–ˆβ–‹ | 334/1250 [02:04<05:04, 3.01it/s] Training 1/1 epoch (loss 2.7907): 27%|β–ˆβ–ˆβ–‹ | 334/1250 [02:04<05:04, 3.01it/s] Training 1/1 epoch (loss 2.7907): 27%|β–ˆβ–ˆβ–‹ | 335/1250 [02:04<05:00, 3.04it/s] Training 1/1 epoch (loss 2.8659): 27%|β–ˆβ–ˆβ–‹ | 335/1250 [02:05<05:00, 3.04it/s] Training 1/1 epoch (loss 2.8659): 27%|β–ˆβ–ˆβ–‹ | 336/1250 [02:05<05:14, 2.91it/s] Training 1/1 epoch (loss 3.0631): 27%|β–ˆβ–ˆβ–‹ | 336/1250 [02:05<05:14, 2.91it/s] Training 1/1 epoch (loss 3.0631): 27%|β–ˆβ–ˆβ–‹ | 337/1250 [02:05<05:12, 2.92it/s] Training 1/1 epoch (loss 2.8838): 27%|β–ˆβ–ˆβ–‹ | 337/1250 [02:05<05:12, 2.92it/s] Training 1/1 epoch (loss 2.8838): 27%|β–ˆβ–ˆβ–‹ | 338/1250 [02:05<05:23, 2.82it/s] Training 1/1 epoch (loss 2.8658): 27%|β–ˆβ–ˆβ–‹ | 338/1250 [02:06<05:23, 2.82it/s] Training 1/1 epoch (loss 2.8658): 27%|β–ˆβ–ˆβ–‹ | 339/1250 [02:06<05:25, 2.80it/s] Training 1/1 epoch (loss 2.8144): 27%|β–ˆβ–ˆβ–‹ | 339/1250 [02:06<05:25, 2.80it/s] Training 1/1 epoch (loss 2.8144): 27%|β–ˆβ–ˆβ–‹ | 340/1250 [02:06<05:21, 2.83it/s] Training 1/1 epoch (loss 2.6536): 27%|β–ˆβ–ˆβ–‹ | 340/1250 [02:06<05:21, 2.83it/s] Training 1/1 epoch (loss 2.6536): 27%|β–ˆβ–ˆβ–‹ | 341/1250 [02:06<05:15, 2.88it/s] Training 1/1 epoch (loss 2.7537): 27%|β–ˆβ–ˆβ–‹ | 341/1250 [02:07<05:15, 2.88it/s] Training 1/1 epoch (loss 2.7537): 27%|β–ˆβ–ˆβ–‹ | 342/1250 [02:07<05:16, 2.87it/s] Training 1/1 epoch (loss 2.6642): 27%|β–ˆβ–ˆβ–‹ | 342/1250 [02:07<05:16, 2.87it/s] Training 1/1 epoch (loss 2.6642): 27%|β–ˆβ–ˆβ–‹ | 343/1250 [02:07<05:03, 2.99it/s] Training 1/1 epoch (loss 2.8087): 27%|β–ˆβ–ˆβ–‹ | 343/1250 [02:07<05:03, 2.99it/s] Training 1/1 epoch (loss 
2.8087): 28%|β–ˆβ–ˆβ–Š | 344/1250 [02:07<05:07, 2.94it/s] Training 1/1 epoch (loss 2.7556): 28%|β–ˆβ–ˆβ–Š | 344/1250 [02:08<05:07, 2.94it/s] Training 1/1 epoch (loss 2.7556): 28%|β–ˆβ–ˆβ–Š | 345/1250 [02:08<05:10, 2.92it/s] Training 1/1 epoch (loss 2.7583): 28%|β–ˆβ–ˆβ–Š | 345/1250 [02:08<05:10, 2.92it/s] Training 1/1 epoch (loss 2.7583): 28%|β–ˆβ–ˆβ–Š | 346/1250 [02:08<05:06, 2.95it/s] Training 1/1 epoch (loss 2.8910): 28%|β–ˆβ–ˆβ–Š | 346/1250 [02:08<05:06, 2.95it/s] Training 1/1 epoch (loss 2.8910): 28%|β–ˆβ–ˆβ–Š | 347/1250 [02:08<05:03, 2.97it/s] Training 1/1 epoch (loss 2.9117): 28%|β–ˆβ–ˆβ–Š | 347/1250 [02:09<05:03, 2.97it/s] Training 1/1 epoch (loss 2.9117): 28%|β–ˆβ–ˆβ–Š | 348/1250 [02:09<05:04, 2.96it/s] Training 1/1 epoch (loss 2.7459): 28%|β–ˆβ–ˆβ–Š | 348/1250 [02:09<05:04, 2.96it/s] Training 1/1 epoch (loss 2.7459): 28%|β–ˆβ–ˆβ–Š | 349/1250 [02:09<04:57, 3.03it/s] Training 1/1 epoch (loss 2.8986): 28%|β–ˆβ–ˆβ–Š | 349/1250 [02:09<04:57, 3.03it/s] Training 1/1 epoch (loss 2.8986): 28%|β–ˆβ–ˆβ–Š | 350/1250 [02:09<04:50, 3.10it/s] Training 1/1 epoch (loss 2.7822): 28%|β–ˆβ–ˆβ–Š | 350/1250 [02:10<04:50, 3.10it/s] Training 1/1 epoch (loss 2.7822): 28%|β–ˆβ–ˆβ–Š | 351/1250 [02:10<04:46, 3.13it/s] Training 1/1 epoch (loss 2.9602): 28%|β–ˆβ–ˆβ–Š | 351/1250 [02:10<04:46, 3.13it/s] Training 1/1 epoch (loss 2.9602): 28%|β–ˆβ–ˆβ–Š | 352/1250 [02:10<05:07, 2.92it/s] Training 1/1 epoch (loss 2.9708): 28%|β–ˆβ–ˆβ–Š | 352/1250 [02:10<05:07, 2.92it/s] Training 1/1 epoch (loss 2.9708): 28%|β–ˆβ–ˆβ–Š | 353/1250 [02:10<05:13, 2.86it/s] Training 1/1 epoch (loss 2.7881): 28%|β–ˆβ–ˆβ–Š | 353/1250 [02:11<05:13, 2.86it/s] Training 1/1 epoch (loss 2.7881): 28%|β–ˆβ–ˆβ–Š | 354/1250 [02:11<05:11, 2.88it/s] Training 1/1 epoch (loss 2.6643): 28%|β–ˆβ–ˆβ–Š | 354/1250 [02:11<05:11, 2.88it/s] Training 1/1 epoch (loss 2.6643): 28%|β–ˆβ–ˆβ–Š | 355/1250 [02:11<05:05, 2.93it/s] Training 1/1 epoch (loss 2.7512): 28%|β–ˆβ–ˆβ–Š | 355/1250 [02:11<05:05, 2.93it/s] Training 1/1 epoch (loss 
2.7512): 28%|β–ˆβ–ˆβ–Š | 356/1250 [02:11<05:02, 2.95it/s] Training 1/1 epoch (loss 2.9928): 28%|β–ˆβ–ˆβ–Š | 356/1250 [02:12<05:02, 2.95it/s] Training 1/1 epoch (loss 2.9928): 29%|β–ˆβ–ˆβ–Š | 357/1250 [02:12<05:14, 2.84it/s] Training 1/1 epoch (loss 2.9059): 29%|β–ˆβ–ˆβ–Š | 357/1250 [02:12<05:14, 2.84it/s] Training 1/1 epoch (loss 2.9059): 29%|β–ˆβ–ˆβ–Š | 358/1250 [02:12<05:16, 2.81it/s] Training 1/1 epoch (loss 2.7712): 29%|β–ˆβ–ˆβ–Š | 358/1250 [02:12<05:16, 2.81it/s] Training 1/1 epoch (loss 2.7712): 29%|β–ˆβ–ˆβ–Š | 359/1250 [02:12<05:05, 2.92it/s] Training 1/1 epoch (loss 2.6026): 29%|β–ˆβ–ˆβ–Š | 359/1250 [02:13<05:05, 2.92it/s] Training 1/1 epoch (loss 2.6026): 29%|β–ˆβ–ˆβ–‰ | 360/1250 [02:13<05:10, 2.87it/s] Training 1/1 epoch (loss 2.6747): 29%|β–ˆβ–ˆβ–‰ | 360/1250 [02:13<05:10, 2.87it/s] Training 1/1 epoch (loss 2.6747): 29%|β–ˆβ–ˆβ–‰ | 361/1250 [02:13<05:08, 2.88it/s] Training 1/1 epoch (loss 2.7518): 29%|β–ˆβ–ˆβ–‰ | 361/1250 [02:14<05:08, 2.88it/s] Training 1/1 epoch (loss 2.7518): 29%|β–ˆβ–ˆβ–‰ | 362/1250 [02:14<05:09, 2.87it/s] Training 1/1 epoch (loss 2.8559): 29%|β–ˆβ–ˆβ–‰ | 362/1250 [02:14<05:09, 2.87it/s] Training 1/1 epoch (loss 2.8559): 29%|β–ˆβ–ˆβ–‰ | 363/1250 [02:14<04:57, 2.98it/s] Training 1/1 epoch (loss 2.9245): 29%|β–ˆβ–ˆβ–‰ | 363/1250 [02:14<04:57, 2.98it/s] Training 1/1 epoch (loss 2.9245): 29%|β–ˆβ–ˆβ–‰ | 364/1250 [02:14<04:57, 2.98it/s] Training 1/1 epoch (loss 2.6623): 29%|β–ˆβ–ˆβ–‰ | 364/1250 [02:15<04:57, 2.98it/s] Training 1/1 epoch (loss 2.6623): 29%|β–ˆβ–ˆβ–‰ | 365/1250 [02:15<05:18, 2.78it/s] Training 1/1 epoch (loss 2.5515): 29%|β–ˆβ–ˆβ–‰ | 365/1250 [02:15<05:18, 2.78it/s] Training 1/1 epoch (loss 2.5515): 29%|β–ˆβ–ˆβ–‰ | 366/1250 [02:15<05:12, 2.83it/s] Training 1/1 epoch (loss 2.6564): 29%|β–ˆβ–ˆβ–‰ | 366/1250 [02:15<05:12, 2.83it/s] Training 1/1 epoch (loss 2.6564): 29%|β–ˆβ–ˆβ–‰ | 367/1250 [02:15<05:03, 2.91it/s] Training 1/1 epoch (loss 2.7153): 29%|β–ˆβ–ˆβ–‰ | 367/1250 [02:16<05:03, 2.91it/s] Training 1/1 epoch (loss 
2.7153): 29%|β–ˆβ–ˆβ–‰ | 368/1250 [02:16<05:03, 2.91it/s] Training 1/1 epoch (loss 2.8273): 29%|β–ˆβ–ˆβ–‰ | 368/1250 [02:16<05:03, 2.91it/s] Training 1/1 epoch (loss 2.8273): 30%|β–ˆβ–ˆβ–‰ | 369/1250 [02:16<05:03, 2.91it/s] Training 1/1 epoch (loss 2.8851): 30%|β–ˆβ–ˆβ–‰ | 369/1250 [02:16<05:03, 2.91it/s] Training 1/1 epoch (loss 2.8851): 30%|β–ˆβ–ˆβ–‰ | 370/1250 [02:16<05:02, 2.91it/s] Training 1/1 epoch (loss 2.6380): 30%|β–ˆβ–ˆβ–‰ | 370/1250 [02:17<05:02, 2.91it/s] Training 1/1 epoch (loss 2.6380): 30%|β–ˆβ–ˆβ–‰ | 371/1250 [02:17<04:59, 2.94it/s] Training 1/1 epoch (loss 2.7528): 30%|β–ˆβ–ˆβ–‰ | 371/1250 [02:17<04:59, 2.94it/s] Training 1/1 epoch (loss 2.7528): 30%|β–ˆβ–ˆβ–‰ | 372/1250 [02:17<04:56, 2.96it/s] Training 1/1 epoch (loss 2.6994): 30%|β–ˆβ–ˆβ–‰ | 372/1250 [02:17<04:56, 2.96it/s] Training 1/1 epoch (loss 2.6994): 30%|β–ˆβ–ˆβ–‰ | 373/1250 [02:17<04:48, 3.04it/s] Training 1/1 epoch (loss 2.5205): 30%|β–ˆβ–ˆβ–‰ | 373/1250 [02:18<04:48, 3.04it/s] Training 1/1 epoch (loss 2.5205): 30%|β–ˆβ–ˆβ–‰ | 374/1250 [02:18<04:40, 3.12it/s] Training 1/1 epoch (loss 2.7465): 30%|β–ˆβ–ˆβ–‰ | 374/1250 [02:18<04:40, 3.12it/s] Training 1/1 epoch (loss 2.7465): 30%|β–ˆβ–ˆβ–ˆ | 375/1250 [02:18<04:40, 3.12it/s] Training 1/1 epoch (loss 2.9521): 30%|β–ˆβ–ˆβ–ˆ | 375/1250 [02:18<04:40, 3.12it/s] Training 1/1 epoch (loss 2.9521): 30%|β–ˆβ–ˆβ–ˆ | 376/1250 [02:18<04:54, 2.97it/s] Training 1/1 epoch (loss 2.9065): 30%|β–ˆβ–ˆβ–ˆ | 376/1250 [02:19<04:54, 2.97it/s] Training 1/1 epoch (loss 2.9065): 30%|β–ˆβ–ˆβ–ˆ | 377/1250 [02:19<05:09, 2.82it/s] Training 1/1 epoch (loss 2.7587): 30%|β–ˆβ–ˆβ–ˆ | 377/1250 [02:19<05:09, 2.82it/s] Training 1/1 epoch (loss 2.7587): 30%|β–ˆβ–ˆβ–ˆ | 378/1250 [02:19<05:01, 2.89it/s] Training 1/1 epoch (loss 2.6394): 30%|β–ˆβ–ˆβ–ˆ | 378/1250 [02:19<05:01, 2.89it/s] Training 1/1 epoch (loss 2.6394): 30%|β–ˆβ–ˆβ–ˆ | 379/1250 [02:19<04:50, 3.00it/s] Training 1/1 epoch (loss 2.8081): 30%|β–ˆβ–ˆβ–ˆ | 379/1250 [02:20<04:50, 3.00it/s] Training 1/1 epoch (loss 
2.8081): 30%|β–ˆβ–ˆβ–ˆ | 380/1250 [02:20<04:44, 3.06it/s] Training 1/1 epoch (loss 2.8622): 30%|β–ˆβ–ˆβ–ˆ | 380/1250 [02:20<04:44, 3.06it/s] Training 1/1 epoch (loss 2.8622): 30%|β–ˆβ–ˆβ–ˆ | 381/1250 [02:20<04:43, 3.06it/s] Training 1/1 epoch (loss 2.9108): 30%|β–ˆβ–ˆβ–ˆ | 381/1250 [02:20<04:43, 3.06it/s] Training 1/1 epoch (loss 2.9108): 31%|β–ˆβ–ˆβ–ˆ | 382/1250 [02:20<04:50, 2.99it/s] Training 1/1 epoch (loss 2.7704): 31%|β–ˆβ–ˆβ–ˆ | 382/1250 [02:21<04:50, 2.99it/s] Training 1/1 epoch (loss 2.7704): 31%|β–ˆβ–ˆβ–ˆ | 383/1250 [02:21<04:47, 3.02it/s] Training 1/1 epoch (loss 2.6240): 31%|β–ˆβ–ˆβ–ˆ | 383/1250 [02:21<04:47, 3.02it/s] Training 1/1 epoch (loss 2.6240): 31%|β–ˆβ–ˆβ–ˆ | 384/1250 [02:21<05:05, 2.83it/s] Training 1/1 epoch (loss 2.7724): 31%|β–ˆβ–ˆβ–ˆ | 384/1250 [02:21<05:05, 2.83it/s] Training 1/1 epoch (loss 2.7724): 31%|β–ˆβ–ˆβ–ˆ | 385/1250 [02:21<05:03, 2.85it/s] Training 1/1 epoch (loss 2.5598): 31%|β–ˆβ–ˆβ–ˆ | 385/1250 [02:22<05:03, 2.85it/s] Training 1/1 epoch (loss 2.5598): 31%|β–ˆβ–ˆβ–ˆ | 386/1250 [02:22<04:47, 3.01it/s] Training 1/1 epoch (loss 2.6407): 31%|β–ˆβ–ˆβ–ˆ | 386/1250 [02:22<04:47, 3.01it/s] Training 1/1 epoch (loss 2.6407): 31%|β–ˆβ–ˆβ–ˆ | 387/1250 [02:22<05:09, 2.79it/s] Training 1/1 epoch (loss 2.7247): 31%|β–ˆβ–ˆβ–ˆ | 387/1250 [02:22<05:09, 2.79it/s] Training 1/1 epoch (loss 2.7247): 31%|β–ˆβ–ˆβ–ˆ | 388/1250 [02:22<05:02, 2.85it/s] Training 1/1 epoch (loss 2.7953): 31%|β–ˆβ–ˆβ–ˆ | 388/1250 [02:23<05:02, 2.85it/s] Training 1/1 epoch (loss 2.7953): 31%|β–ˆβ–ˆβ–ˆ | 389/1250 [02:23<06:21, 2.26it/s] Training 1/1 epoch (loss 2.7305): 31%|β–ˆβ–ˆβ–ˆ | 389/1250 [02:23<06:21, 2.26it/s] Training 1/1 epoch (loss 2.7305): 31%|β–ˆβ–ˆβ–ˆ | 390/1250 [02:23<05:47, 2.48it/s] Training 1/1 epoch (loss 2.9599): 31%|β–ˆβ–ˆβ–ˆ | 390/1250 [02:24<05:47, 2.48it/s] Training 1/1 epoch (loss 2.9599): 31%|β–ˆβ–ˆβ–ˆβ– | 391/1250 [02:24<05:37, 2.55it/s] Training 1/1 epoch (loss 2.7857): 31%|β–ˆβ–ˆβ–ˆβ– | 391/1250 [02:24<05:37, 2.55it/s] Training 1/1 epoch (loss 
2.7857): 31%|β–ˆβ–ˆβ–ˆβ– | 392/1250 [02:24<05:29, 2.61it/s] Training 1/1 epoch (loss 2.8636): 31%|β–ˆβ–ˆβ–ˆβ– | 392/1250 [02:24<05:29, 2.61it/s] Training 1/1 epoch (loss 2.8636): 31%|β–ˆβ–ˆβ–ˆβ– | 393/1250 [02:24<05:32, 2.58it/s] Training 1/1 epoch (loss 2.6808): 31%|β–ˆβ–ˆβ–ˆβ– | 393/1250 [02:25<05:32, 2.58it/s] Training 1/1 epoch (loss 2.6808): 32%|β–ˆβ–ˆβ–ˆβ– | 394/1250 [02:25<05:25, 2.63it/s] Training 1/1 epoch (loss 2.4952): 32%|β–ˆβ–ˆβ–ˆβ– | 394/1250 [02:25<05:25, 2.63it/s] Training 1/1 epoch (loss 2.4952): 32%|β–ˆβ–ˆβ–ˆβ– | 395/1250 [02:25<05:27, 2.61it/s] Training 1/1 epoch (loss 2.7016): 32%|β–ˆβ–ˆβ–ˆβ– | 395/1250 [02:26<05:27, 2.61it/s] Training 1/1 epoch (loss 2.7016): 32%|β–ˆβ–ˆβ–ˆβ– | 396/1250 [02:26<05:12, 2.74it/s] Training 1/1 epoch (loss 2.5014): 32%|β–ˆβ–ˆβ–ˆβ– | 396/1250 [02:26<05:12, 2.74it/s] Training 1/1 epoch (loss 2.5014): 32%|β–ˆβ–ˆβ–ˆβ– | 397/1250 [02:26<05:01, 2.83it/s] Training 1/1 epoch (loss 2.6134): 32%|β–ˆβ–ˆβ–ˆβ– | 397/1250 [02:26<05:01, 2.83it/s] Training 1/1 epoch (loss 2.6134): 32%|β–ˆβ–ˆβ–ˆβ– | 398/1250 [02:26<05:19, 2.67it/s] Training 1/1 epoch (loss 2.8076): 32%|β–ˆβ–ˆβ–ˆβ– | 398/1250 [02:27<05:19, 2.67it/s] Training 1/1 epoch (loss 2.8076): 32%|β–ˆβ–ˆβ–ˆβ– | 399/1250 [02:27<05:11, 2.73it/s] Training 1/1 epoch (loss 2.8822): 32%|β–ˆβ–ˆβ–ˆβ– | 399/1250 [02:27<05:11, 2.73it/s] Training 1/1 epoch (loss 2.8822): 32%|β–ˆβ–ˆβ–ˆβ– | 400/1250 [02:27<05:13, 2.71it/s] Training 1/1 epoch (loss 2.6062): 32%|β–ˆβ–ˆβ–ˆβ– | 400/1250 [02:27<05:13, 2.71it/s] Training 1/1 epoch (loss 2.6062): 32%|β–ˆβ–ˆβ–ˆβ– | 401/1250 [02:27<05:22, 2.63it/s] Training 1/1 epoch (loss 2.9317): 32%|β–ˆβ–ˆβ–ˆβ– | 401/1250 [02:28<05:22, 2.63it/s] Training 1/1 epoch (loss 2.9317): 32%|β–ˆβ–ˆβ–ˆβ– | 402/1250 [02:28<05:21, 2.64it/s] Training 1/1 epoch (loss 2.8921): 32%|β–ˆβ–ˆβ–ˆβ– | 402/1250 [02:28<05:21, 2.64it/s] Training 1/1 epoch (loss 2.8921): 32%|β–ˆβ–ˆβ–ˆβ– | 403/1250 [02:28<05:47, 2.43it/s] Training 1/1 epoch (loss 2.6160): 
32%|β–ˆβ–ˆβ–ˆβ– | 403/1250 [02:29<05:47, 2.43it/s] Training 1/1 epoch (loss 2.6160): 32%|β–ˆβ–ˆβ–ˆβ– | 404/1250 [02:29<05:39, 2.49it/s] Training 1/1 epoch (loss 2.7883): 32%|β–ˆβ–ˆβ–ˆβ– | 404/1250 [02:29<05:39, 2.49it/s] Training 1/1 epoch (loss 2.7883): 32%|β–ˆβ–ˆβ–ˆβ– | 405/1250 [02:29<05:09, 2.73it/s] Training 1/1 epoch (loss 2.7512): 32%|β–ˆβ–ˆβ–ˆβ– | 405/1250 [02:29<05:09, 2.73it/s] Training 1/1 epoch (loss 2.7512): 32%|β–ˆβ–ˆβ–ˆβ– | 406/1250 [02:29<04:55, 2.85it/s] Training 1/1 epoch (loss 2.6987): 32%|β–ˆβ–ˆβ–ˆβ– | 406/1250 [02:30<04:55, 2.85it/s] Training 1/1 epoch (loss 2.6987): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 407/1250 [02:30<04:50, 2.90it/s] Training 1/1 epoch (loss 2.7689): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 407/1250 [02:30<04:50, 2.90it/s] Training 1/1 epoch (loss 2.7689): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 408/1250 [02:30<04:47, 2.93it/s] Training 1/1 epoch (loss 2.9144): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 408/1250 [02:30<04:47, 2.93it/s] Training 1/1 epoch (loss 2.9144): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 409/1250 [02:30<04:50, 2.89it/s] Training 1/1 epoch (loss 2.9180): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 409/1250 [02:31<04:50, 2.89it/s] Training 1/1 epoch (loss 2.9180): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 410/1250 [02:31<04:59, 2.81it/s] Training 1/1 epoch (loss 2.6599): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 410/1250 [02:31<04:59, 2.81it/s] Training 1/1 epoch (loss 2.6599): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 411/1250 [02:31<04:45, 2.94it/s] Training 1/1 epoch (loss 2.5595): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 411/1250 [02:31<04:45, 2.94it/s] Training 1/1 epoch (loss 2.5595): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 412/1250 [02:31<04:35, 3.04it/s] Training 1/1 epoch (loss 2.6950): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 412/1250 [02:32<04:35, 3.04it/s] Training 1/1 epoch (loss 2.6950): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 413/1250 [02:32<04:36, 3.03it/s] Training 1/1 epoch (loss 2.8636): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 413/1250 [02:32<04:36, 3.03it/s] Training 1/1 epoch (loss 2.8636): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 414/1250 [02:32<04:32, 3.06it/s] Training 1/1 epoch (loss 2.7607): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 414/1250 [02:32<04:32, 3.06it/s] Training 1/1 epoch (loss 2.7607): 33%|β–ˆβ–ˆβ–ˆβ–Ž | 
415/1250 [02:32<04:29, 3.09it/s]
Training 1/1 epoch (loss 2.8561): 33%|███▎ | 416/1250 [02:33<04:43, 2.94it/s]
Training 1/1 epoch (loss 2.8872): 33%|███▎ | 417/1250 [02:33<04:45, 2.92it/s]
Training 1/1 epoch (loss 2.7830): 33%|███▎ | 418/1250 [02:33<04:47, 2.89it/s]
Training 1/1 epoch (loss 2.8993): 34%|███▎ | 419/1250 [02:34<04:59, 2.78it/s]
Training 1/1 epoch (loss 2.7858): 34%|███▎ | 420/1250 [02:34<04:56, 2.80it/s]
Training 1/1 epoch (loss 2.7142): 34%|███▎ | 421/1250 [02:34<04:57, 2.79it/s]
Training 1/1 epoch (loss 2.7387): 34%|███▍ | 422/1250 [02:35<05:13, 2.64it/s]
Training 1/1 epoch (loss 2.8256): 34%|███▍ | 423/1250 [02:35<05:04, 2.71it/s]
Training 1/1 epoch (loss 2.7409): 34%|███▍ | 424/1250 [02:36<05:04, 2.71it/s]
Training 1/1 epoch (loss 2.8812): 34%|███▍ | 425/1250 [02:36<05:06, 2.69it/s]
Training 1/1 epoch (loss 3.0137): 34%|███▍ | 426/1250 [02:36<04:59, 2.75it/s]
Training 1/1 epoch (loss 2.7769): 34%|███▍ | 427/1250 [02:37<04:50, 2.83it/s]
Training 1/1 epoch (loss 2.6991): 34%|███▍ | 428/1250 [02:37<04:59, 2.74it/s]
Training 1/1 epoch (loss 2.7067): 34%|███▍ | 429/1250 [02:37<04:45, 2.88it/s]
Training 1/1 epoch (loss 2.5678): 34%|███▍ | 430/1250 [02:38<04:35, 2.98it/s]
Training 1/1 epoch (loss 2.7060): 34%|███▍ | 431/1250 [02:38<04:31, 3.02it/s]
Training 1/1 epoch (loss 2.8099): 35%|███▍ | 432/1250 [02:38<04:28, 3.05it/s]
Training 1/1 epoch (loss 2.6792): 35%|███▍ | 433/1250 [02:39<04:38, 2.93it/s]
Training 1/1 epoch (loss 2.8604): 35%|███▍ | 434/1250 [02:39<04:31, 3.00it/s]
Training 1/1 epoch (loss 2.8567): 35%|███▍ | 435/1250 [02:39<04:27, 3.04it/s]
Training 1/1 epoch (loss 2.8899): 35%|███▍ | 436/1250 [02:40<04:18, 3.14it/s]
Training 1/1 epoch (loss 2.7012): 35%|███▍ | 437/1250 [02:40<04:17, 3.16it/s]
Training 1/1 epoch (loss 2.6956): 35%|███▌ | 438/1250 [02:40<04:19, 3.13it/s]
Training 1/1 epoch (loss 2.8203): 35%|███▌ | 439/1250 [02:41<04:23, 3.07it/s]
Training 1/1 epoch (loss 3.1007): 35%|███▌ | 440/1250 [02:41<04:37, 2.92it/s]
Training 1/1 epoch (loss 2.6527): 35%|███▌ | 441/1250 [02:41<04:37, 2.92it/s]
Training 1/1 epoch (loss 2.7218): 35%|███▌ | 442/1250 [02:42<04:24, 3.06it/s]
Training 1/1 epoch (loss 2.7354): 35%|███▌ | 443/1250 [02:42<04:19, 3.11it/s]
Training 1/1 epoch (loss 2.9662): 36%|███▌ | 444/1250 [02:42<04:38, 2.90it/s]
Training 1/1 epoch (loss 2.6802): 36%|███▌ | 445/1250 [02:43<04:32, 2.95it/s]
Training 1/1 epoch (loss 2.6602): 36%|███▌ | 446/1250 [02:43<04:34, 2.93it/s]
Training 1/1 epoch (loss 2.7585): 36%|███▌ | 447/1250 [02:43<04:28, 2.99it/s]
Training 1/1 epoch (loss 2.5878): 36%|███▌ | 448/1250 [02:44<04:36, 2.91it/s]
Training 1/1 epoch (loss 2.8178): 36%|███▌ | 449/1250 [02:44<04:31, 2.95it/s]
Training 1/1 epoch (loss 2.7410): 36%|███▌ | 450/1250 [02:44<04:29, 2.97it/s]
Training 1/1 epoch (loss 2.6744): 36%|███▌ | 451/1250 [02:45<04:42, 2.83it/s]
Training 1/1 epoch (loss 2.6418): 36%|███▌ | 452/1250 [02:45<04:32, 2.93it/s]
Training 1/1 epoch (loss 2.8310): 36%|███▌ | 453/1250 [02:45<04:30, 2.95it/s]
Training 1/1 epoch (loss 2.7421): 36%|███▋ | 454/1250 [02:46<04:25, 3.00it/s]
Training 1/1 epoch (loss 2.7024): 36%|███▋ | 455/1250 [02:46<04:31, 2.93it/s]
Training 1/1 epoch (loss 2.8726): 36%|███▋ | 456/1250 [02:46<04:47, 2.76it/s]
Training 1/1 epoch (loss 2.9177): 37%|███▋ | 457/1250 [02:47<04:48, 2.75it/s]
Training 1/1 epoch (loss 2.7573): 37%|███▋ | 458/1250 [02:47<04:36, 2.86it/s]
Training 1/1 epoch (loss 2.9776): 37%|███▋ | 459/1250 [02:47<04:38, 2.84it/s]
Training 1/1 epoch (loss 2.8129): 37%|███▋ | 460/1250 [02:48<04:31, 2.91it/s]
Training 1/1 epoch (loss 2.6111): 37%|███▋ | 461/1250 [02:48<04:19, 3.04it/s]
Training 1/1 epoch (loss 2.6947): 37%|███▋ | 462/1250 [02:48<04:19, 3.04it/s]
Training 1/1 epoch (loss 2.7731): 37%|███▋ | 463/1250 [02:49<04:24, 2.97it/s]
Training 1/1 epoch (loss 2.6580): 37%|███▋ | 464/1250 [02:49<04:29, 2.92it/s]
Training 1/1 epoch (loss 2.4423): 37%|███▋ | 465/1250 [02:49<04:25, 2.95it/s]
Training 1/1 epoch (loss 2.5912): 37%|███▋ | 466/1250 [02:50<04:15, 3.07it/s]
Training 1/1 epoch (loss 2.6140): 37%|███▋ | 467/1250 [02:50<04:13, 3.09it/s]
Training 1/1 epoch (loss 2.7745): 37%|███▋ | 468/1250 [02:50<04:20, 3.00it/s]
Training 1/1 epoch (loss 2.8650): 38%|███▊ | 469/1250 [02:51<04:21, 2.98it/s]
Training 1/1 epoch (loss 2.8666): 38%|███▊ | 470/1250 [02:51<04:18, 3.01it/s]
Training 1/1 epoch (loss 2.8657): 38%|███▊ | 471/1250 [02:51<04:21, 2.98it/s]
Training 1/1 epoch (loss 2.6816): 38%|███▊ | 472/1250 [02:52<04:28, 2.90it/s]
Training 1/1 epoch (loss 2.6764): 38%|███▊ | 473/1250 [02:52<04:24, 2.94it/s]
Training 1/1 epoch (loss 2.6320): 38%|███▊ | 474/1250 [02:52<04:32, 2.84it/s]
Training 1/1 epoch (loss 2.7111): 38%|███▊ | 475/1250 [02:53<06:06, 2.11it/s]
Training 1/1 epoch (loss 2.8139): 38%|███▊ | 476/1250 [02:54<05:56, 2.17it/s]
Training 1/1 epoch (loss 2.7732): 38%|███▊ | 477/1250 [02:54<05:48, 2.22it/s]
Training 1/1 epoch (loss 2.9237): 38%|███▊ | 478/1250 [02:54<05:34, 2.31it/s]
Training 1/1 epoch (loss 2.6960): 38%|███▊ | 479/1250 [02:55<05:26, 2.36it/s]
Training 1/1 epoch (loss 2.7336): 38%|███▊ | 480/1250 [02:56<06:11, 2.07it/s]
Training 1/1 epoch (loss 2.6960): 38%|███▊ | 481/1250 [02:56<06:17, 2.04it/s]
Training 1/1 epoch (loss 2.9209): 39%|███▊ | 482/1250 [02:57<06:52, 1.86it/s]
Training 1/1 epoch (loss 2.6867): 39%|███▊ | 483/1250 [02:57<07:35, 1.68it/s]
Training 1/1 epoch (loss 2.6091): 39%|███▊ | 484/1250 [02:58<08:41, 1.47it/s]
Training 1/1 epoch (loss 2.6082): 39%|███▉ | 485/1250 [02:59<08:49, 1.44it/s]
Training 1/1 epoch (loss 2.7619): 39%|███▉ | 486/1250 [03:00<08:28, 1.50it/s]
Training 1/1 epoch (loss 2.7679): 39%|███▉ | 487/1250 [03:00<07:06, 1.79it/s]
Training 1/1 epoch (loss 2.6222): 39%|███▉ | 488/1250 [03:00<06:15, 2.03it/s]
Training 1/1 epoch (loss 2.7442): 39%|███▉ | 489/1250 [03:01<05:52, 2.16it/s]
Training 1/1 epoch (loss 2.9563): 39%|███▉ | 490/1250 [03:01<05:30, 2.30it/s]
Training 1/1 epoch (loss 2.7571): 39%|███▉ | 491/1250 [03:01<05:01, 2.52it/s]
Training 1/1 epoch (loss 2.6694): 39%|███▉ | 492/1250 [03:02<04:54, 2.57it/s]
Training 1/1 epoch (loss 2.7813): 39%|███▉ | 493/1250 [03:02<04:34, 2.76it/s]
Training 1/1 epoch (loss 2.5468): 40%|███▉ | 494/1250 [03:02<04:30, 2.80it/s]
Training 1/1 epoch (loss 2.8784): 40%|███▉ | 495/1250 [03:03<04:32, 2.77it/s]
Training 1/1 epoch (loss 2.8191): 40%|███▉ | 496/1250 [03:03<04:41, 2.68it/s]
Training 1/1 epoch (loss 2.6459): 40%|███▉ | 497/1250 [03:03<04:44, 2.65it/s]
Training 1/1 epoch (loss 2.8724): 40%|███▉ | 498/1250 [03:04<04:51, 2.58it/s]
Training 1/1 epoch (loss 2.4903): 40%|███▉ | 499/1250 [03:04<04:50, 2.58it/s]
Training 1/1 epoch (loss 2.8719): 40%|████ | 500/1250 [03:05<04:35, 2.72it/s]
Training 1/1 epoch (loss 2.7786): 40%|████ | 501/1250 [03:05<04:32, 2.75it/s]
Training 1/1 epoch (loss 2.9849): 40%|████ | 502/1250 [03:05<04:32, 2.74it/s]
Training 1/1 epoch (loss 3.0052): 40%|████ | 503/1250 [03:06<04:31, 2.76it/s]
Training 1/1 epoch (loss 2.5883): 40%|████ | 504/1250 [03:06<04:30, 2.76it/s]
Training 1/1 epoch (loss 2.9588): 40%|████ | 505/1250 [03:06<04:28, 2.78it/s]
Training 1/1 epoch (loss 2.9290): 40%|████ | 506/1250 [03:07<04:28, 2.77it/s]
Training 1/1 epoch (loss 2.7336): 41%|████ | 507/1250 [03:07<04:40, 2.65it/s]
Training 1/1 epoch (loss 2.7791): 41%|████ | 508/1250 [03:08<04:44, 2.60it/s]
Training 1/1 epoch (loss 2.8120): 41%|████ | 509/1250 [03:08<04:41, 2.63it/s]
Training 1/1 epoch (loss 2.6461): 41%|████ | 510/1250 [03:08<04:38, 2.66it/s]
Training 1/1 epoch (loss 2.7195): 41%|████ | 511/1250 [03:09<04:30, 2.74it/s]
Training 1/1 epoch (loss 2.7153): 41%|████ | 512/1250 [03:09<04:42, 2.61it/s]
Training 1/1 epoch (loss 2.7367): 41%|████ | 513/1250 [03:10<04:53, 2.51it/s]
Training 1/1 epoch (loss 2.6231): 41%|████ | 514/1250 [03:10<04:40, 2.62it/s]
Training 1/1 epoch (loss 2.6504): 41%|████ | 515/1250 [03:10<04:28, 2.74it/s]
Training 1/1 epoch (loss 2.7719): 41%|████▏ | 516/1250 [03:11<04:17, 2.86it/s]
Training 1/1 epoch (loss 2.8487): 41%|████▏ | 517/1250 [03:11<04:06, 2.97it/s]
Training 1/1 epoch (loss 2.7260): 41%|████▏ | 518/1250 [03:11<04:09, 2.93it/s]
Training 1/1 epoch (loss 2.6551): 42%|████▏ | 519/1250 [03:12<04:12, 2.90it/s]
Training 1/1 epoch (loss 2.5670): 42%|████▏ | 520/1250 [03:12<04:06, 2.96it/s]
Training 1/1 epoch (loss 2.8664): 42%|████▏ | 521/1250 [03:12<04:04, 2.98it/s]
Training 1/1 epoch (loss 2.8194): 42%|████▏ | 522/1250 [03:13<04:03, 2.99it/s]
Training 1/1 epoch (loss 2.9336): 42%|████▏ | 523/1250 [03:13<04:02, 3.00it/s]
Training 1/1 epoch (loss 2.8083): 42%|████▏ | 524/1250 [03:13<04:00, 3.01it/s]
Training 1/1 epoch (loss 2.7809): 42%|████▏ | 525/1250 [03:14<04:17, 2.81it/s]
Training 1/1 epoch (loss 2.6987): 42%|████▏ | 526/1250 [03:14<04:17, 2.82it/s]
Training 1/1 epoch (loss 2.6630): 42%|████▏ | 527/1250 [03:14<04:18, 2.80it/s]
Training 1/1 epoch (loss 2.7548): 42%|████▏ | 528/1250 [03:15<04:51, 2.48it/s]
Training 1/1 epoch (loss 2.6078): 42%|████▏ | 529/1250 [03:15<04:43, 2.54it/s]
Training 1/1 epoch (loss 2.7915): 42%|████▏ | 530/1250 [03:15<04:24, 2.72it/s]
Training 1/1 epoch (loss 2.5355): 42%|████▏ | 531/1250 [03:16<04:20, 2.76it/s]
Training 1/1 epoch (loss 2.7242): 43%|████▎ | 532/1250 [03:16<04:13, 2.83it/s]
Training 1/1 epoch (loss 2.8145): 43%|████▎ | 533/1250 [03:16<04:01, 2.97it/s]
Training 1/1 epoch (loss 2.8687): 43%|████▎ | 534/1250 [03:17<04:06, 2.91it/s]
Training 1/1 epoch (loss 2.9662): 43%|████▎ | 535/1250 [03:17<04:03, 2.93it/s]
Training 1/1 epoch (loss 2.8878): 43%|████▎ | 536/1250 [03:17<03:59, 2.98it/s]
Training 1/1 epoch (loss 2.9633): 43%|████▎ | 537/1250 [03:18<04:08, 2.87it/s]
Training 1/1 epoch (loss 2.6094): 43%|████▎ | 538/1250 [03:18<03:58, 2.99it/s]
Training 1/1 epoch (loss 2.8166): 43%|████▎ | 539/1250 [03:18<03:55, 3.02it/s]
Training 1/1 epoch (loss 2.8963): 43%|████▎ | 540/1250 [03:19<03:52, 3.05it/s]
Training 1/1 epoch (loss 2.7442): 43%|████▎ | 541/1250 [03:19<03:54, 3.02it/s]
Training 1/1 epoch (loss 2.7688): 43%|████▎ | 542/1250 [03:19<03:47, 3.11it/s]
Training 1/1 epoch (loss 2.7116): 43%|████▎ | 543/1250 [03:20<03:54, 3.01it/s]
Training 1/1 epoch (loss 2.6899): 44%|████▎ | 544/1250 [03:20<04:03, 2.90it/s]
Training 1/1 epoch (loss 2.7599): 44%|████▎ | 545/1250 [03:20<03:59, 2.95it/s]
Training 1/1 epoch (loss 2.8811): 44%|████▎ | 546/1250 [03:21<03:54, 3.01it/s]
Training 1/1 epoch (loss 2.7996): 44%|████▍ | 547/1250 [03:21<04:05, 2.86it/s]
Training 1/1 epoch (loss 2.9047): 44%|████▍ | 548/1250 [03:22<03:59, 2.93it/s]
Training 1/1 epoch (loss 2.9886): 44%|████▍ | 549/1250 [03:22<04:04, 2.87it/s]
Training 1/1 epoch (loss 2.8122): 44%|████▍ | 550/1250 [03:22<04:00, 2.91it/s]
Training 1/1 epoch (loss 2.6779): 44%|████▍ | 551/1250 [03:23<03:48, 3.05it/s]
Training 1/1 epoch (loss 2.5751): 44%|████▍ | 552/1250 [03:23<04:49, 2.41it/s]
Training 1/1 epoch (loss 2.8549): 44%|████▍ | 553/1250 [03:23<04:37, 2.51it/s]
Training 1/1 epoch (loss 2.7576): 44%|████▍ | 554/1250 [03:24<04:51, 2.39it/s]
Training 1/1 epoch (loss 2.5462): 44%|████▍ | 555/1250 [03:24<04:31, 2.56it/s]
Training 1/1 epoch (loss 2.7674): 44%|████▍ | 556/1250 [03:25<04:11, 2.76it/s]
Training 1/1 epoch (loss 2.9608): 45%|████▍ | 557/1250 [03:25<04:06, 2.81it/s]
Training 1/1 epoch (loss 2.7740): 45%|████▍ | 558/1250 [03:25<04:20, 2.66it/s]
Training 1/1 epoch (loss 2.8548): 45%|████▍ | 559/1250 [03:26<04:07, 2.79it/s]
Training 1/1 epoch (loss 2.8711): 45%|████▍ | 560/1250 [03:26<04:03, 2.83it/s]
Training 1/1 epoch (loss 2.6154): 45%|████▍ | 561/1250 [03:26<04:04, 2.82it/s]
Training 1/1 epoch (loss 2.7962): 45%|████▍ | 562/1250 [03:27<04:00, 2.87it/s]
Training 1/1 epoch (loss 2.8797): 45%|████▌ | 563/1250 [03:27<03:49, 2.99it/s]
Training 1/1 epoch (loss 2.8930): 45%|████▌ | 564/1250 [03:27<03:56, 2.90it/s]
Training 1/1 epoch (loss 2.8607): 45%|████▌ | 565/1250 [03:28<04:16, 2.67it/s]
Training 1/1 epoch (loss 2.8291): 45%|████▌ | 566/1250 [03:28<04:03, 2.81it/s]
Training 1/1 epoch (loss 2.7973): 45%|████▌ | 567/1250 [03:29<04:17, 2.65it/s]
Training 1/1 epoch (loss 2.7298): 45%|████▌ | 568/1250 [03:29<04:24, 2.58it/s]
Training 1/1 epoch (loss 2.8594): 46%|████▌ | 569/1250 [03:29<04:44, 2.39it/s]
Training 1/1 epoch (loss 2.7135): 46%|████▌ | 570/1250 [03:30<04:43, 2.40it/s]
Training 1/1 epoch (loss 2.7890): 46%|████▌ | 571/1250 [03:30<04:20, 2.61it/s]
Training 1/1 epoch (loss 2.8807): 46%|████▌ | 572/1250 [03:30<04:07, 2.73it/s]
Training 1/1 epoch (loss 2.7958): 46%|████▌ | 573/1250 [03:31<04:18, 2.62it/s]
Training 1/1 epoch (loss 2.8241): 46%|████▌ | 574/1250 [03:31<04:22, 2.58it/s]
Training 1/1 epoch (loss 2.8871): 46%|████▌ | 575/1250 [03:32<04:16, 2.63it/s]
Training 1/1 epoch (loss 2.6851): 46%|████▌ | 576/1250 [03:32<04:09, 2.71it/s]
Training 1/1 epoch (loss 2.6616): 46%|████▌ | 577/1250 [03:32<04:00, 2.80it/s]
Training 1/1 epoch (loss 2.8728): 46%|████▌ | 578/1250 [03:33<04:00, 2.80it/s]
Training 1/1 epoch (loss 2.9902): 46%|████▋ | 579/1250 [03:33<03:51, 2.90it/s]
Training 1/1 epoch (loss 2.9352): 46%|████▋ | 580/1250 [03:33<03:57, 2.82it/s]
Training 1/1 epoch (loss 2.7541): 46%|████▋ | 581/1250 [03:34<04:01, 2.77it/s]
Training 1/1 epoch (loss 2.6526): 47%|████▋ | 582/1250 [03:34<04:03, 2.75it/s]
Training 1/1 epoch (loss 2.6189): 47%|████▋ | 583/1250 [03:34<03:54, 2.85it/s]
Training 1/1 epoch (loss 2.6658): 47%|████▋ | 584/1250 [03:35<04:00, 2.76it/s]
Training 1/1 epoch (loss 2.8424): 47%|████▋ | 585/1250 [03:35<04:02, 2.74it/s]
Training 1/1 epoch (loss 2.6215): 47%|████▋ | 586/1250 [03:36<03:53, 2.84it/s]
Training 1/1 epoch (loss 2.7304): 47%|████▋ | 587/1250 [03:36<04:00, 2.76it/s]
Training 1/1 epoch (loss 2.5255): 47%|████▋ | 588/1250 [03:36<03:52, 2.85it/s]
Training 1/1 epoch (loss 2.6795): 47%|████▋ | 589/1250 [03:37<03:46, 2.91it/s]
Training 1/1 epoch (loss 2.7579): 47%|████▋ | 590/1250 [03:37<03:46, 2.91it/s]
Training 1/1 epoch (loss 2.6447): 47%|████▋ | 591/1250 [03:37<03:50, 2.86it/s]
Training 1/1 epoch (loss 2.7672): 47%|████▋ | 592/1250 [03:38<03:48, 2.88it/s]
Training 1/1 epoch (loss 2.8807): 47%|████▋ | 593/1250 [03:38<03:49, 2.86it/s]
Training 1/1 epoch (loss 2.6761): 48%|████▊ | 594/1250 [03:38<03:49, 2.86it/s]
Training 1/1 epoch (loss 2.6565): 48%|████▊ | 595/1250 [03:39<03:46, 2.89it/s]
Training 1/1 epoch (loss 2.7000): 48%|████▊ | 596/1250 [03:39<03:35, 3.03it/s]
Training 1/1 epoch (loss 2.7839): 48%|████▊ | 597/1250 [03:39<03:47, 2.87it/s]
Training 1/1 epoch (loss 2.7255): 48%|████▊ | 598/1250 [03:40<03:41, 2.94it/s]
Training 1/1 epoch (loss 2.7522): 48%|████▊ | 599/1250 [03:40<03:43, 2.92it/s]
Training 1/1 epoch (loss 2.7732): 48%|████▊ | 600/1250 [03:40<03:49, 2.83it/s]
Training 1/1 epoch (loss 2.9322): 48%|████▊ | 601/1250 [03:41<03:50, 2.82it/s]
Training 1/1 epoch (loss 2.6647): 48%|████▊ | 602/1250 [03:41<03:44, 2.89it/s]
Training 1/1 epoch (loss 2.6633): 48%|████▊ | 603/1250 [03:41<03:51, 2.80it/s]
Training 1/1 epoch (loss 2.8071): 48%|████▊ | 604/1250 [03:42<03:39, 2.95it/s]
Training 1/1 epoch (loss 2.7215): 48%|████▊ | 605/1250 [03:42<03:34, 3.01it/s]
Training 1/1 epoch (loss 2.8361): 48%|████▊ | 606/1250 [03:42<03:37, 2.96it/s]
Training 1/1 epoch (loss 2.5946): 49%|████▊ | 607/1250 [03:43<03:32, 3.02it/s]
Training 1/1 epoch (loss 2.9467): 49%|████▊ | 608/1250 [03:43<03:36, 2.96it/s]
Training 1/1 epoch (loss 2.6877): 49%|████▊ | 609/1250 [03:43<03:44, 2.86it/s]
Training 1/1 epoch (loss 2.8047): 49%|████▉ | 610/1250 [03:44<03:49, 2.79it/s]
Training 1/1 epoch (loss 2.8422): 49%|████▉ | 611/1250 [03:44<03:43, 2.85it/s]
Training 1/1 epoch (loss 2.6157): 49%|████▉ | 612/1250 [03:45<03:49, 2.78it/s]
Training 1/1 epoch (loss 2.8673): 49%|████▉ | 613/1250 [03:45<03:44, 2.84it/s]
Training 1/1 epoch (loss 2.9131): 49%|████▉ | 614/1250 [03:45<03:39, 2.89it/s]
Training 1/1 epoch (loss 2.7717): 49%|████▉ | 615/1250 [03:46<03:36, 2.93it/s]
Training 1/1 epoch (loss 2.8407): 49%|████▉ | 616/1250 [03:46<03:43, 2.84it/s]
Training 1/1 epoch (loss 2.3852): 49%|████▉ | 617/1250 [03:46<03:50, 2.74it/s]
Training 1/1 epoch (loss 2.7459): 49%|████▉ | 618/1250 [03:47<03:46, 2.79it/s]
Training 1/1 epoch (loss 2.6517): 50%|████▉ | 619/1250 [03:47<03:44, 2.80it/s]
Training 1/1 epoch (loss 2.6508): 50%|████▉ | 620/1250 [03:47<03:39, 2.86it/s]
Training 1/1 epoch (loss 2.5915): 50%|████▉ | 621/1250 [03:48<03:32, 2.96it/s]
Training 1/1 epoch (loss 2.9703): 50%|████▉ | 622/1250 [03:48<03:32, 2.95it/s]
Training 1/1 epoch (loss 2.7742): 50%|████▉ | 623/1250 [03:48<03:29, 3.00it/s]
Training 1/1 epoch (loss 2.7100): 50%|████▉ | 624/1250 [03:49<03:28, 3.01it/s]
Training 1/1 epoch (loss 2.7804): 50%|█████ | 625/1250 [03:49<03:42, 2.81it/s]
Training 1/1 epoch (loss 2.8126): 50%|█████ | 626/1250 [03:49<03:44, 2.78it/s]
Training 1/1 epoch (loss 2.5222): 50%|█████ | 627/1250 [03:50<03:35, 2.88it/s]
Training 1/1 epoch (loss 2.7319): 50%|█████ | 628/1250 [03:50<03:30, 2.96it/s]
Training 1/1 epoch (loss 2.8925): 50%|█████ | 629/1250 [03:50<03:26, 3.01it/s]
Training 1/1 epoch (loss 2.4747): 50%|█████ | 630/1250 [03:51<03:26, 3.00it/s]
Training 1/1 epoch (loss 2.7060): 50%|█████ | 631/1250 [03:51<03:21, 3.07it/s]
Training 1/1 epoch (loss 2.8996): 51%|█████ | 632/1250 [03:51<03:38, 2.83it/s]
Training 1/1 epoch (loss 2.7837): 51%|█████ | 633/1250 [03:52<03:38, 2.83it/s]
Training 1/1 epoch (loss 2.7224): 51%|█████ | 634/1250 [03:52<03:30, 2.93it/s]
Training 1/1 epoch (loss 2.6303): 51%|█████ | 635/1250 [03:52<03:31, 2.91it/s]
Training 1/1 epoch (loss 2.5313): 51%|█████ | 636/1250 [03:53<04:03, 2.52it/s]
Training 1/1 epoch (loss 2.6430): 51%|█████ | 637/1250 [03:53<03:53, 2.62it/s]
Training 1/1 epoch (loss 2.5807): 51%|█████ | 638/1250 [03:54<03:48, 2.67it/s]
Training 1/1 epoch (loss 2.9059): 51%|█████ | 639/1250 [03:54<03:44, 2.73it/s]
Training 1/1 epoch (loss 2.7133): 51%|█████ | 640/1250 [03:54<03:44, 2.71it/s]
Training 1/1 epoch (loss 2.6685): 51%|█████▏ | 641/1250 [03:55<03:43, 2.72it/s]
Training 1/1 epoch (loss 2.6587): 51%|█████▏ | 642/1250 [03:55<03:32, 2.86it/s]
Training 1/1 epoch (loss 2.7331): 51%|█████▏ | 643/1250 [03:56<03:45, 2.69it/s]
Training 1/1 epoch (loss 2.7027): 52%|█████▏ | 644/1250 [03:56<03:40, 2.75it/s]
Training 1/1 epoch (loss 2.6384): 52%|█████▏ | 645/1250 [03:56<03:33, 2.83it/s]
Training 1/1 epoch (loss 2.3503): 52%|█████▏ | 646/1250 [03:57<03:38, 2.76it/s]
Training 1/1 epoch (loss 2.6597): 52%|█████▏ | 647/1250 [03:57<03:38, 2.76it/s]
Training 1/1 epoch (loss 2.9339): 52%|█████▏ | 648/1250 [03:57<03:42, 2.71it/s]
Training 1/1 epoch (loss 2.5646): 52%|█████▏ | 649/1250 [03:58<04:15, 2.35it/s]
Training 1/1 epoch (loss 2.8481): 52%|█████▏ | 650/1250 [03:58<04:09, 2.41it/s]
Training 1/1 epoch (loss 2.7800): 52%|█████▏ | 651/1250 [03:59<03:57, 2.53it/s]
Training 1/1 epoch (loss 2.6242): 52%|█████▏ | 652/1250 [03:59<03:42, 2.69it/s]
Training 1/1 epoch (loss 2.6171): 52%|█████▏ | 653/1250 [03:59<03:44, 2.66it/s]
Training 1/1 epoch (loss 2.5592): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 653/1250 [04:00<03:44, 2.66it/s] Training 1/1 epoch (loss 2.5592): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 654/1250 [04:00<03:39, 2.72it/s] Training 1/1 epoch (loss 2.7348): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 654/1250 [04:00<03:39, 2.72it/s] Training 1/1 epoch (loss 2.7348): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 655/1250 [04:00<03:32, 2.80it/s] Training 1/1 epoch (loss 2.9552): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 655/1250 [04:00<03:32, 2.80it/s] Training 1/1 epoch (loss 2.9552): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 656/1250 [04:00<03:26, 2.87it/s] Training 1/1 epoch (loss 2.8662): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 656/1250 [04:01<03:26, 2.87it/s] Training 1/1 epoch (loss 2.8662): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 657/1250 [04:01<03:26, 2.87it/s] Training 1/1 epoch (loss 2.8978): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 657/1250 [04:01<03:26, 2.87it/s] Training 1/1 epoch (loss 2.8978): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 658/1250 [04:01<03:20, 2.96it/s] Training 1/1 epoch (loss 2.6213): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 658/1250 [04:01<03:20, 2.96it/s] Training 1/1 epoch (loss 2.6213): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 659/1250 [04:01<03:21, 2.93it/s] Training 1/1 epoch (loss 2.4187): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 659/1250 [04:02<03:21, 2.93it/s] Training 1/1 epoch (loss 2.4187): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 660/1250 [04:02<03:23, 2.90it/s] Training 1/1 epoch (loss 2.5700): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 660/1250 [04:02<03:23, 2.90it/s] Training 1/1 epoch (loss 2.5700): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 661/1250 [04:02<03:28, 2.83it/s] Training 1/1 epoch (loss 2.6825): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 661/1250 [04:02<03:28, 2.83it/s] Training 1/1 epoch (loss 2.6825): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 662/1250 [04:02<03:24, 2.88it/s] Training 1/1 epoch (loss 2.7438): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 662/1250 [04:03<03:24, 2.88it/s] Training 1/1 epoch (loss 2.7438): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 663/1250 [04:03<03:18, 2.96it/s] Training 1/1 epoch (loss 2.7994): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 663/1250 [04:03<03:18, 2.96it/s] Training 1/1 epoch (loss 2.7994): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 664/1250 
[04:03<03:20, 2.93it/s] Training 1/1 epoch (loss 2.7399): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 664/1250 [04:03<03:20, 2.93it/s] Training 1/1 epoch (loss 2.7399): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 665/1250 [04:03<03:24, 2.86it/s] Training 1/1 epoch (loss 2.7107): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 665/1250 [04:04<03:24, 2.86it/s] Training 1/1 epoch (loss 2.7107): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 666/1250 [04:04<03:29, 2.78it/s] Training 1/1 epoch (loss 2.9298): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 666/1250 [04:04<03:29, 2.78it/s] Training 1/1 epoch (loss 2.9298): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 667/1250 [04:04<03:28, 2.80it/s] Training 1/1 epoch (loss 2.7903): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 667/1250 [04:05<03:28, 2.80it/s] Training 1/1 epoch (loss 2.7903): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 668/1250 [04:05<03:27, 2.81it/s] Training 1/1 epoch (loss 2.7540): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 668/1250 [04:05<03:27, 2.81it/s] Training 1/1 epoch (loss 2.7540): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 669/1250 [04:05<03:22, 2.86it/s] Training 1/1 epoch (loss 3.0198): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 669/1250 [04:05<03:22, 2.86it/s] Training 1/1 epoch (loss 3.0198): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 670/1250 [04:05<03:15, 2.97it/s] Training 1/1 epoch (loss 3.0074): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 670/1250 [04:06<03:15, 2.97it/s] Training 1/1 epoch (loss 3.0074): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 671/1250 [04:06<03:18, 2.92it/s] Training 1/1 epoch (loss 2.9435): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 671/1250 [04:06<03:18, 2.92it/s] Training 1/1 epoch (loss 2.9435): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 672/1250 [04:06<03:15, 2.96it/s] Training 1/1 epoch (loss 2.6271): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 672/1250 [04:06<03:15, 2.96it/s] Training 1/1 epoch (loss 2.6271): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 673/1250 [04:06<03:17, 2.92it/s] Training 1/1 epoch (loss 2.8534): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 673/1250 [04:07<03:17, 2.92it/s] Training 1/1 epoch (loss 2.8534): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 674/1250 [04:07<03:09, 3.04it/s] Training 1/1 epoch (loss 2.5239): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 674/1250 [04:07<03:09, 3.04it/s] Training 1/1 epoch (loss 2.5239): 
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 675/1250 [04:07<03:11, 3.00it/s] Training 1/1 epoch (loss 2.7671): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 675/1250 [04:07<03:11, 3.00it/s] Training 1/1 epoch (loss 2.7671): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 676/1250 [04:07<03:15, 2.94it/s] Training 1/1 epoch (loss 2.5753): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 676/1250 [04:08<03:15, 2.94it/s] Training 1/1 epoch (loss 2.5753): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 677/1250 [04:08<03:15, 2.93it/s] Training 1/1 epoch (loss 2.8298): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 677/1250 [04:08<03:15, 2.93it/s] Training 1/1 epoch (loss 2.8298): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 678/1250 [04:08<03:09, 3.01it/s] Training 1/1 epoch (loss 2.8314): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 678/1250 [04:08<03:09, 3.01it/s] Training 1/1 epoch (loss 2.8314): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 679/1250 [04:08<03:08, 3.03it/s] Training 1/1 epoch (loss 2.7359): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 679/1250 [04:09<03:08, 3.03it/s] Training 1/1 epoch (loss 2.7359): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 680/1250 [04:09<03:11, 2.98it/s] Training 1/1 epoch (loss 2.6415): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 680/1250 [04:09<03:11, 2.98it/s] Training 1/1 epoch (loss 2.6415): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 681/1250 [04:09<03:11, 2.97it/s] Training 1/1 epoch (loss 2.8631): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 681/1250 [04:09<03:11, 2.97it/s] Training 1/1 epoch (loss 2.8631): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 682/1250 [04:09<03:09, 3.00it/s] Training 1/1 epoch (loss 2.8550): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 682/1250 [04:10<03:09, 3.00it/s] Training 1/1 epoch (loss 2.8550): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 683/1250 [04:10<03:11, 2.96it/s] Training 1/1 epoch (loss 2.8308): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 683/1250 [04:10<03:11, 2.96it/s] Training 1/1 epoch (loss 2.8308): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 684/1250 [04:10<03:07, 3.02it/s] Training 1/1 epoch (loss 2.5500): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 684/1250 [04:10<03:07, 3.02it/s] Training 1/1 epoch (loss 2.5500): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 685/1250 [04:10<03:06, 3.03it/s] Training 1/1 epoch (loss 2.8118): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 685/1250 [04:11<03:06, 3.03it/s] Training 
1/1 epoch (loss 2.8118): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 686/1250 [04:11<03:08, 3.00it/s] Training 1/1 epoch (loss 2.6204): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 686/1250 [04:11<03:08, 3.00it/s] Training 1/1 epoch (loss 2.6204): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 687/1250 [04:11<02:59, 3.13it/s] Training 1/1 epoch (loss 2.7879): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 687/1250 [04:11<02:59, 3.13it/s] Training 1/1 epoch (loss 2.7879): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 688/1250 [04:11<03:08, 2.99it/s] Training 1/1 epoch (loss 2.8381): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 688/1250 [04:12<03:08, 2.99it/s] Training 1/1 epoch (loss 2.8381): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 689/1250 [04:12<03:06, 3.00it/s] Training 1/1 epoch (loss 2.8529): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 689/1250 [04:12<03:06, 3.00it/s] Training 1/1 epoch (loss 2.8529): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 690/1250 [04:12<03:04, 3.03it/s] Training 1/1 epoch (loss 2.6622): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 690/1250 [04:12<03:04, 3.03it/s] Training 1/1 epoch (loss 2.6622): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 691/1250 [04:12<03:01, 3.08it/s] Training 1/1 epoch (loss 2.7844): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 691/1250 [04:13<03:01, 3.08it/s] Training 1/1 epoch (loss 2.7844): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 692/1250 [04:13<03:08, 2.96it/s] Training 1/1 epoch (loss 2.7658): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 692/1250 [04:13<03:08, 2.96it/s] Training 1/1 epoch (loss 2.7658): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 693/1250 [04:13<03:04, 3.02it/s] Training 1/1 epoch (loss 2.9522): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 693/1250 [04:13<03:04, 3.02it/s] Training 1/1 epoch (loss 2.9522): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 694/1250 [04:13<02:59, 3.10it/s] Training 1/1 epoch (loss 2.7292): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 694/1250 [04:14<02:59, 3.10it/s] Training 1/1 epoch (loss 2.7292): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 695/1250 [04:14<03:13, 2.87it/s] Training 1/1 epoch (loss 2.9572): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 695/1250 [04:14<03:13, 2.87it/s] Training 1/1 epoch (loss 2.9572): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 696/1250 [04:14<03:20, 2.76it/s] Training 1/1 epoch (loss 2.7399): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 696/1250 
[04:14<03:20, 2.76it/s] Training 1/1 epoch (loss 2.7399): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 697/1250 [04:14<03:25, 2.69it/s] Training 1/1 epoch (loss 2.6775): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 697/1250 [04:15<03:25, 2.69it/s] Training 1/1 epoch (loss 2.6775): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 698/1250 [04:15<03:24, 2.70it/s] Training 1/1 epoch (loss 2.8843): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 698/1250 [04:15<03:24, 2.70it/s] Training 1/1 epoch (loss 2.8843): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 699/1250 [04:15<03:22, 2.72it/s] Training 1/1 epoch (loss 3.0212): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 699/1250 [04:15<03:22, 2.72it/s] Training 1/1 epoch (loss 3.0212): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 700/1250 [04:15<03:14, 2.83it/s] Training 1/1 epoch (loss 2.9040): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 700/1250 [04:16<03:14, 2.83it/s] Training 1/1 epoch (loss 2.9040): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 701/1250 [04:16<03:14, 2.83it/s] Training 1/1 epoch (loss 2.7188): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 701/1250 [04:16<03:14, 2.83it/s] Training 1/1 epoch (loss 2.7188): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 702/1250 [04:16<03:07, 2.93it/s] Training 1/1 epoch (loss 2.7964): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 702/1250 [04:16<03:07, 2.93it/s] Training 1/1 epoch (loss 2.7964): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 703/1250 [04:16<03:05, 2.95it/s] Training 1/1 epoch (loss 2.5746): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 703/1250 [04:17<03:05, 2.95it/s] Training 1/1 epoch (loss 2.5746): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 704/1250 [04:17<03:10, 2.86it/s] Training 1/1 epoch (loss 2.7986): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 704/1250 [04:17<03:10, 2.86it/s] Training 1/1 epoch (loss 2.7986): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 705/1250 [04:17<03:06, 2.93it/s] Training 1/1 epoch (loss 2.8114): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 705/1250 [04:17<03:06, 2.93it/s] Training 1/1 epoch (loss 2.8114): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 706/1250 [04:17<03:07, 2.90it/s] Training 1/1 epoch (loss 2.7998): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 706/1250 [04:18<03:07, 2.90it/s] Training 1/1 epoch (loss 2.7998): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 707/1250 [04:18<03:19, 2.72it/s] Training 1/1 epoch (loss 2.8607): 
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 707/1250 [04:18<03:19, 2.72it/s] Training 1/1 epoch (loss 2.8607): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 708/1250 [04:18<03:12, 2.81it/s] Training 1/1 epoch (loss 2.8848): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 708/1250 [04:18<03:12, 2.81it/s] Training 1/1 epoch (loss 2.8848): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 709/1250 [04:18<03:02, 2.97it/s] Training 1/1 epoch (loss 2.7264): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 709/1250 [04:19<03:02, 2.97it/s] Training 1/1 epoch (loss 2.7264): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 710/1250 [04:19<03:00, 2.99it/s] Training 1/1 epoch (loss 2.8575): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 710/1250 [04:19<03:00, 2.99it/s] Training 1/1 epoch (loss 2.8575): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 711/1250 [04:19<02:57, 3.04it/s] Training 1/1 epoch (loss 2.8407): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 711/1250 [04:19<02:57, 3.04it/s] Training 1/1 epoch (loss 2.8407): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 712/1250 [04:19<02:59, 3.00it/s] Training 1/1 epoch (loss 2.6309): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 712/1250 [04:20<02:59, 3.00it/s] Training 1/1 epoch (loss 2.6309): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 713/1250 [04:20<03:01, 2.95it/s] Training 1/1 epoch (loss 2.9797): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 713/1250 [04:20<03:01, 2.95it/s] Training 1/1 epoch (loss 2.9797): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 714/1250 [04:20<03:01, 2.95it/s] Training 1/1 epoch (loss 2.8169): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 714/1250 [04:20<03:01, 2.95it/s] Training 1/1 epoch (loss 2.8169): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 715/1250 [04:20<02:57, 3.02it/s] Training 1/1 epoch (loss 2.4872): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 715/1250 [04:21<02:57, 3.02it/s] Training 1/1 epoch (loss 2.4872): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 716/1250 [04:21<03:02, 2.92it/s] Training 1/1 epoch (loss 2.8422): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 716/1250 [04:21<03:02, 2.92it/s] Training 1/1 epoch (loss 2.8422): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 717/1250 [04:21<02:57, 3.00it/s] Training 1/1 epoch (loss 2.8244): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 717/1250 [04:21<02:57, 3.00it/s] Training 1/1 epoch (loss 2.8244): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 718/1250 [04:21<02:56, 3.02it/s] Training 
1/1 epoch (loss 2.8057): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 718/1250 [04:22<02:56, 3.02it/s] Training 1/1 epoch (loss 2.8057): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 719/1250 [04:22<02:59, 2.96it/s] Training 1/1 epoch (loss 2.9178): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 719/1250 [04:22<02:59, 2.96it/s] Training 1/1 epoch (loss 2.9178): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 720/1250 [04:22<03:02, 2.90it/s] Training 1/1 epoch (loss 2.6634): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 720/1250 [04:23<03:02, 2.90it/s] Training 1/1 epoch (loss 2.6634): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 721/1250 [04:23<03:03, 2.88it/s] Training 1/1 epoch (loss 2.6638): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 721/1250 [04:23<03:03, 2.88it/s] Training 1/1 epoch (loss 2.6638): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 722/1250 [04:23<03:52, 2.27it/s] Training 1/1 epoch (loss 2.5081): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 722/1250 [04:24<03:52, 2.27it/s] Training 1/1 epoch (loss 2.5081): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 723/1250 [04:24<03:41, 2.38it/s] Training 1/1 epoch (loss 2.6696): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 723/1250 [04:24<03:41, 2.38it/s] Training 1/1 epoch (loss 2.6696): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 724/1250 [04:24<03:27, 2.53it/s] Training 1/1 epoch (loss 3.0889): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 724/1250 [04:24<03:27, 2.53it/s] Training 1/1 epoch (loss 3.0889): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 725/1250 [04:24<03:22, 2.60it/s] Training 1/1 epoch (loss 2.6679): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 725/1250 [04:25<03:22, 2.60it/s] Training 1/1 epoch (loss 2.6679): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 726/1250 [04:25<03:14, 2.70it/s] Training 1/1 epoch (loss 2.6324): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 726/1250 [04:25<03:14, 2.70it/s] Training 1/1 epoch (loss 2.6324): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 727/1250 [04:25<03:06, 2.80it/s] Training 1/1 epoch (loss 2.6400): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 727/1250 [04:25<03:06, 2.80it/s] Training 1/1 epoch (loss 2.6400): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 728/1250 [04:25<03:14, 2.69it/s] Training 1/1 epoch (loss 2.8310): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 728/1250 [04:26<03:14, 2.69it/s] Training 1/1 epoch (loss 2.8310): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 729/1250 
[04:26<03:12, 2.70it/s] Training 1/1 epoch (loss 2.8372): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 729/1250 [04:26<03:12, 2.70it/s] Training 1/1 epoch (loss 2.8372): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 730/1250 [04:26<03:07, 2.77it/s] Training 1/1 epoch (loss 2.7676): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 730/1250 [04:26<03:07, 2.77it/s] Training 1/1 epoch (loss 2.7676): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 731/1250 [04:26<03:06, 2.78it/s] Training 1/1 epoch (loss 2.4912): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 731/1250 [04:27<03:06, 2.78it/s] Training 1/1 epoch (loss 2.4912): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 732/1250 [04:27<02:56, 2.93it/s] Training 1/1 epoch (loss 2.6205): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 732/1250 [04:27<02:56, 2.93it/s] Training 1/1 epoch (loss 2.6205): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 733/1250 [04:27<02:55, 2.94it/s] Training 1/1 epoch (loss 2.7251): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 733/1250 [04:27<02:55, 2.94it/s] Training 1/1 epoch (loss 2.7251): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 734/1250 [04:27<02:55, 2.94it/s] Training 1/1 epoch (loss 2.6939): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 734/1250 [04:28<02:55, 2.94it/s] Training 1/1 epoch (loss 2.6939): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 735/1250 [04:28<03:11, 2.69it/s] Training 1/1 epoch (loss 2.9324): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 735/1250 [04:28<03:11, 2.69it/s] Training 1/1 epoch (loss 2.9324): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 736/1250 [04:28<03:26, 2.49it/s] Training 1/1 epoch (loss 2.5819): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 736/1250 [04:29<03:26, 2.49it/s] Training 1/1 epoch (loss 2.5819): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 737/1250 [04:29<03:30, 2.44it/s] Training 1/1 epoch (loss 2.7011): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 737/1250 [04:29<03:30, 2.44it/s] Training 1/1 epoch (loss 2.7011): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 738/1250 [04:29<03:21, 2.55it/s] Training 1/1 epoch (loss 2.7576): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 738/1250 [04:29<03:21, 2.55it/s] Training 1/1 epoch (loss 2.7576): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 739/1250 [04:29<03:12, 2.66it/s] Training 1/1 epoch (loss 2.6903): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 739/1250 [04:30<03:12, 2.66it/s] Training 1/1 epoch (loss 2.6903): 
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 740/1250 [04:30<03:05, 2.75it/s] Training 1/1 epoch (loss 2.6679): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 740/1250 [04:30<03:05, 2.75it/s] Training 1/1 epoch (loss 2.6679): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 741/1250 [04:30<02:59, 2.83it/s] Training 1/1 epoch (loss 2.8213): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 741/1250 [04:30<02:59, 2.83it/s] Training 1/1 epoch (loss 2.8213): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 742/1250 [04:30<02:54, 2.91it/s] Training 1/1 epoch (loss 2.7489): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 742/1250 [04:31<02:54, 2.91it/s] Training 1/1 epoch (loss 2.7489): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 743/1250 [04:31<02:54, 2.90it/s] Training 1/1 epoch (loss 2.8107): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 743/1250 [04:31<02:54, 2.90it/s] Training 1/1 epoch (loss 2.8107): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 744/1250 [04:31<02:56, 2.87it/s] Training 1/1 epoch (loss 2.7591): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 744/1250 [04:31<02:56, 2.87it/s] Training 1/1 epoch (loss 2.7591): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 745/1250 [04:31<02:59, 2.81it/s] Training 1/1 epoch (loss 2.8344): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 745/1250 [04:32<02:59, 2.81it/s] Training 1/1 epoch (loss 2.8344): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 746/1250 [04:32<03:01, 2.78it/s] Training 1/1 epoch (loss 2.5994): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 746/1250 [04:32<03:01, 2.78it/s] Training 1/1 epoch (loss 2.5994): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 747/1250 [04:32<02:59, 2.80it/s] Training 1/1 epoch (loss 2.7995): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 747/1250 [04:33<02:59, 2.80it/s] Training 1/1 epoch (loss 2.7995): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 748/1250 [04:33<02:57, 2.83it/s] Training 1/1 epoch (loss 2.5950): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 748/1250 [04:33<02:57, 2.83it/s] Training 1/1 epoch (loss 2.5950): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 749/1250 [04:33<02:52, 2.91it/s] Training 1/1 epoch (loss 2.7940): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 749/1250 [04:33<02:52, 2.91it/s] Training 1/1 epoch (loss 2.7940): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 750/1250 [04:33<02:49, 2.95it/s] Training 1/1 epoch (loss 2.5636): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 750/1250 [04:34<02:49, 2.95it/s] Training 
1/1 epoch (loss 2.5636): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 751/1250 [04:34<02:48, 2.96it/s] Training 1/1 epoch (loss 2.8083): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 751/1250 [04:34<02:48, 2.96it/s] Training 1/1 epoch (loss 2.8083): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 752/1250 [04:34<03:07, 2.65it/s] Training 1/1 epoch (loss 2.9822): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 752/1250 [04:34<03:07, 2.65it/s] Training 1/1 epoch (loss 2.9822): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 753/1250 [04:34<03:04, 2.69it/s] Training 1/1 epoch (loss 2.9166): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 753/1250 [04:35<03:04, 2.69it/s] Training 1/1 epoch (loss 2.9166): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 754/1250 [04:35<02:58, 2.79it/s] Training 1/1 epoch (loss 2.7191): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 754/1250 [04:35<02:58, 2.79it/s] Training 1/1 epoch (loss 2.7191): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 755/1250 [04:35<02:53, 2.85it/s] Training 1/1 epoch (loss 2.9729): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 755/1250 [04:35<02:53, 2.85it/s] Training 1/1 epoch (loss 2.9729): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 756/1250 [04:35<02:50, 2.89it/s] Training 1/1 epoch (loss 2.7689): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 756/1250 [04:36<02:50, 2.89it/s] Training 1/1 epoch (loss 2.7689): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 757/1250 [04:36<02:51, 2.87it/s] Training 1/1 epoch (loss 2.6778): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 757/1250 [04:36<02:51, 2.87it/s] Training 1/1 epoch (loss 2.6778): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 758/1250 [04:36<02:48, 2.91it/s] Training 1/1 epoch (loss 2.7340): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 758/1250 [04:36<02:48, 2.91it/s] Training 1/1 epoch (loss 2.7340): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 759/1250 [04:36<02:48, 2.92it/s] Training 1/1 epoch (loss 2.7932): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 759/1250 [04:37<02:48, 2.92it/s] Training 1/1 epoch (loss 2.7932): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 760/1250 [04:37<02:46, 2.95it/s] Training 1/1 epoch (loss 2.8299): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 760/1250 [04:37<02:46, 2.95it/s] Training 1/1 epoch (loss 2.8299): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 761/1250 [04:37<02:45, 2.95it/s] Training 1/1 epoch (loss 2.6074): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 761/1250 
[04:37<02:45, 2.95it/s] Training 1/1 epoch (loss 2.6074): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 762/1250 [04:37<02:46, 2.93it/s] Training 1/1 epoch (loss 2.6011): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 762/1250 [04:38<02:46, 2.93it/s] Training 1/1 epoch (loss 2.6011): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 763/1250 [04:38<02:45, 2.94it/s] Training 1/1 epoch (loss 2.7645): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 763/1250 [04:38<02:45, 2.94it/s] Training 1/1 epoch (loss 2.7645): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 764/1250 [04:38<02:39, 3.05it/s] Training 1/1 epoch (loss 2.7985): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 764/1250 [04:38<02:39, 3.05it/s] Training 1/1 epoch (loss 2.7985): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 765/1250 [04:38<02:44, 2.94it/s] Training 1/1 epoch (loss 2.5274): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 765/1250 [04:39<02:44, 2.94it/s] Training 1/1 epoch (loss 2.5274): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 766/1250 [04:39<02:43, 2.97it/s] Training 1/1 epoch (loss 2.5422): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 766/1250 [04:39<02:43, 2.97it/s] Training 1/1 epoch (loss 2.5422): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 767/1250 [04:39<02:44, 2.93it/s] Training 1/1 epoch (loss 2.7259): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 767/1250 [04:39<02:44, 2.93it/s] Training 1/1 epoch (loss 2.7259): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 768/1250 [04:39<02:49, 2.84it/s] Training 1/1 epoch (loss 2.9017): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 768/1250 [04:40<02:49, 2.84it/s] Training 1/1 epoch (loss 2.9017): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 769/1250 [04:40<02:46, 2.89it/s] Training 1/1 epoch (loss 2.7449): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 769/1250 [04:40<02:46, 2.89it/s] Training 1/1 epoch (loss 2.7449): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 770/1250 [04:40<02:48, 2.85it/s] Training 1/1 epoch (loss 2.8049): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 770/1250 [04:41<02:48, 2.85it/s] Training 1/1 epoch (loss 2.8049): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 771/1250 [04:41<02:48, 2.85it/s] Training 1/1 epoch (loss 2.8361): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 771/1250 [04:41<02:48, 2.85it/s] Training 1/1 epoch (loss 2.8361): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 772/1250 [04:41<02:42, 2.94it/s] 
Training 1/1 epoch (loss 2.6996): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 772/1250 [04:41<02:42, 2.94it/s] Training 1/1 epoch (loss 2.6996): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 773/1250 [04:41<02:39, 2.99it/s] Training 1/1 epoch (loss 2.6951): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 773/1250 [04:41<02:39, 2.99it/s] Training 1/1 epoch (loss 2.6951): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 774/1250 [04:41<02:38, 3.01it/s] Training 1/1 epoch (loss 2.7918): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 774/1250 [04:42<02:38, 3.01it/s] Training 1/1 epoch (loss 2.7918): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 775/1250 [04:42<02:38, 3.00it/s] Training 1/1 epoch (loss 2.6249): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 775/1250 [04:42<02:38, 3.00it/s] Training 1/1 epoch (loss 2.6249): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 776/1250 [04:42<02:40, 2.95it/s] Training 1/1 epoch (loss 2.7974): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 776/1250 [04:43<02:40, 2.95it/s] Training 1/1 epoch (loss 2.7974): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 777/1250 [04:43<02:44, 2.88it/s] Training 1/1 epoch (loss 2.8290): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 777/1250 [04:43<02:44, 2.88it/s] Training 1/1 epoch (loss 2.8290): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 778/1250 [04:43<02:37, 3.00it/s] Training 1/1 epoch (loss 2.6965): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 778/1250 [04:43<02:37, 3.00it/s] Training 1/1 epoch (loss 2.6965): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 779/1250 [04:43<02:36, 3.01it/s] Training 1/1 epoch (loss 2.9995): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 779/1250 [04:43<02:36, 3.01it/s] Training 1/1 epoch (loss 2.9995): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 780/1250 [04:43<02:33, 3.06it/s] Training 1/1 epoch (loss 2.5981): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 780/1250 [04:44<02:33, 3.06it/s] Training 1/1 epoch (loss 2.5981): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 781/1250 [04:44<02:40, 2.93it/s] Training 1/1 epoch (loss 2.8203): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 781/1250 [04:44<02:40, 2.93it/s] Training 1/1 epoch (loss 2.8203): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 782/1250 [04:44<02:49, 2.76it/s] Training 1/1 epoch (loss 2.7977): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 782/1250 [04:45<02:49, 2.76it/s] 
Training 1/1 epoch (loss 2.7977): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 783/1250 [04:45<02:52, 2.71it/s] Training 1/1 epoch (loss 2.5840): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 783/1250 [04:45<02:52, 2.71it/s] Training 1/1 epoch (loss 2.5840): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 784/1250 [04:45<02:46, 2.80it/s] Training 1/1 epoch (loss 2.9107): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 784/1250 [04:45<02:46, 2.80it/s] Training 1/1 epoch (loss 2.9107): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 785/1250 [04:45<02:43, 2.84it/s] Training 1/1 epoch (loss 2.8426): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 785/1250 [04:46<02:43, 2.84it/s] Training 1/1 epoch (loss 2.8426): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 786/1250 [04:46<02:44, 2.82it/s] Training 1/1 epoch (loss 2.8196): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 786/1250 [04:46<02:44, 2.82it/s] Training 1/1 epoch (loss 2.8196): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 787/1250 [04:46<03:43, 2.07it/s] Training 1/1 epoch (loss 2.5697): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 787/1250 [04:47<03:43, 2.07it/s] Training 1/1 epoch (loss 2.5697): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 788/1250 [04:47<03:25, 2.24it/s] Training 1/1 epoch (loss 2.5427): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 788/1250 [04:47<03:25, 2.24it/s] Training 1/1 epoch (loss 2.5427): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 789/1250 [04:47<03:04, 2.49it/s] Training 1/1 epoch (loss 2.6282): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 789/1250 [04:47<03:04, 2.49it/s] Training 1/1 epoch (loss 2.6282): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 790/1250 [04:47<02:56, 2.60it/s] Training 1/1 epoch (loss 2.7595): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 790/1250 [04:48<02:56, 2.60it/s] Training 1/1 epoch (loss 2.7595): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 791/1250 [04:48<02:51, 2.68it/s] Training 1/1 epoch (loss 2.7224): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 791/1250 [04:48<02:51, 2.68it/s] Training 1/1 epoch (loss 2.7224): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 792/1250 [04:48<02:45, 2.77it/s] Training 1/1 epoch (loss 2.5571): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 792/1250 [04:48<02:45, 2.77it/s] Training 1/1 epoch (loss 2.5571): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 793/1250 [04:48<02:44, 2.78it/s] 
Training 1/1 epoch (loss 2.7249): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 793/1250 [04:49<02:44, 2.78it/s] Training 1/1 epoch (loss 2.7249): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 794/1250 [04:49<02:44, 2.78it/s] Training 1/1 epoch (loss 2.8927): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 794/1250 [04:49<02:44, 2.78it/s] Training 1/1 epoch (loss 2.8927): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 795/1250 [04:49<02:36, 2.91it/s] Training 1/1 epoch (loss 2.6904): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 795/1250 [04:49<02:36, 2.91it/s] Training 1/1 epoch (loss 2.6904): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 796/1250 [04:49<02:30, 3.01it/s] Training 1/1 epoch (loss 2.6662): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 796/1250 [04:50<02:30, 3.01it/s] Training 1/1 epoch (loss 2.6662): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 797/1250 [04:50<02:33, 2.95it/s] Training 1/1 epoch (loss 2.8839): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 797/1250 [04:50<02:33, 2.95it/s] Training 1/1 epoch (loss 2.8839): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 798/1250 [04:50<03:00, 2.51it/s] Training 1/1 epoch (loss 2.8420): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 798/1250 [04:51<03:00, 2.51it/s] Training 1/1 epoch (loss 2.8420): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 799/1250 [04:51<02:51, 2.63it/s] Training 1/1 epoch (loss 2.7810): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 799/1250 [04:51<02:51, 2.63it/s] Training 1/1 epoch (loss 2.7810): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 800/1250 [04:51<02:48, 2.67it/s] Training 1/1 epoch (loss 2.7131): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 800/1250 [04:51<02:48, 2.67it/s] Training 1/1 epoch (loss 2.7131): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 801/1250 [04:51<02:42, 2.76it/s] Training 1/1 epoch (loss 2.8420): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 801/1250 [04:52<02:42, 2.76it/s] Training 1/1 epoch (loss 2.8420): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 802/1250 [04:52<02:39, 2.81it/s] Training 1/1 epoch (loss 2.7272): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 802/1250 [04:52<02:39, 2.81it/s] Training 1/1 epoch (loss 2.7272): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 803/1250 [04:52<02:34, 2.90it/s] Training 1/1 epoch (loss 2.6088): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 803/1250 [04:52<02:34, 2.90it/s] 
Training 1/1 epoch (loss 2.6088): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 804/1250 [04:52<02:31, 2.95it/s] Training 1/1 epoch (loss 2.7235): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 804/1250 [04:53<02:31, 2.95it/s] Training 1/1 epoch (loss 2.7235): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 805/1250 [04:53<02:30, 2.95it/s] Training 1/1 epoch (loss 2.7137): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 805/1250 [04:53<02:30, 2.95it/s] Training 1/1 epoch (loss 2.7137): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 806/1250 [04:53<03:05, 2.40it/s] Training 1/1 epoch (loss 2.7162): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 806/1250 [04:54<03:05, 2.40it/s] Training 1/1 epoch (loss 2.7162): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 807/1250 [04:54<02:57, 2.49it/s] Training 1/1 epoch (loss 2.6596): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 807/1250 [04:54<02:57, 2.49it/s] Training 1/1 epoch (loss 2.6596): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 808/1250 [04:54<02:54, 2.53it/s] Training 1/1 epoch (loss 2.6618): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 808/1250 [04:54<02:54, 2.53it/s] Training 1/1 epoch (loss 2.6618): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 809/1250 [04:54<02:56, 2.49it/s] Training 1/1 epoch (loss 2.8156): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 809/1250 [04:55<02:56, 2.49it/s] Training 1/1 epoch (loss 2.8156): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 810/1250 [04:55<02:49, 2.59it/s] Training 1/1 epoch (loss 2.7768): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 810/1250 [04:55<02:49, 2.59it/s] Training 1/1 epoch (loss 2.7768): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 811/1250 [04:55<02:42, 2.71it/s] Training 1/1 epoch (loss 2.7296): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 811/1250 [04:55<02:42, 2.71it/s] Training 1/1 epoch (loss 2.7296): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 812/1250 [04:55<02:38, 2.76it/s] Training 1/1 epoch (loss 2.6777): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 812/1250 [04:56<02:38, 2.76it/s] Training 1/1 epoch (loss 2.6777): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 813/1250 [04:56<02:32, 2.87it/s] Training 1/1 epoch (loss 2.7758): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 813/1250 [04:56<02:32, 2.87it/s] Training 1/1 epoch (loss 2.7758): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 814/1250 [04:56<02:43, 2.66it/s] 
Training 1/1 epoch (loss 2.8336): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 814/1250 [04:57<02:43, 2.66it/s] Training 1/1 epoch (loss 2.8336): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 815/1250 [04:57<02:40, 2.71it/s] Training 1/1 epoch (loss 2.8850): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 815/1250 [04:57<02:40, 2.71it/s] Training 1/1 epoch (loss 2.8850): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 816/1250 [04:57<02:34, 2.80it/s] Training 1/1 epoch (loss 2.8073): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 816/1250 [04:57<02:34, 2.80it/s] Training 1/1 epoch (loss 2.8073): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 817/1250 [04:57<02:31, 2.86it/s] Training 1/1 epoch (loss 2.6364): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 817/1250 [04:58<02:31, 2.86it/s] Training 1/1 epoch (loss 2.6364): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 818/1250 [04:58<02:52, 2.50it/s] Training 1/1 epoch (loss 2.9309): 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 818/1250 [04:58<02:52, 2.50it/s] Training 1/1 epoch (loss 2.9309): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 819/1250 [04:58<02:54, 2.47it/s] Training 1/1 epoch (loss 2.6973): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 819/1250 [04:59<02:54, 2.47it/s] Training 1/1 epoch (loss 2.6973): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 820/1250 [04:59<03:02, 2.36it/s] Training 1/1 epoch (loss 2.6018): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 820/1250 [04:59<03:02, 2.36it/s] Training 1/1 epoch (loss 2.6018): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 821/1250 [04:59<02:56, 2.43it/s] Training 1/1 epoch (loss 2.5811): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 821/1250 [04:59<02:56, 2.43it/s] Training 1/1 epoch (loss 2.5811): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 822/1250 [04:59<02:48, 2.54it/s] Training 1/1 epoch (loss 2.6145): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 822/1250 [05:00<02:48, 2.54it/s] Training 1/1 epoch (loss 2.6145): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 823/1250 [05:00<02:44, 2.59it/s] Training 1/1 epoch (loss 2.6305): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 823/1250 [05:00<02:44, 2.59it/s] Training 1/1 epoch (loss 2.6305): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 824/1250 [05:00<02:44, 2.58it/s] Training 1/1 epoch (loss 2.7132): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 824/1250 [05:01<02:44, 2.58it/s] 
Training 1/1 epoch (loss 2.7132): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 825/1250 [05:01<02:38, 2.68it/s] Training 1/1 epoch (loss 2.8757): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 825/1250 [05:01<02:38, 2.68it/s] Training 1/1 epoch (loss 2.8757): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 826/1250 [05:01<02:31, 2.80it/s] Training 1/1 epoch (loss 2.7950): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 826/1250 [05:01<02:31, 2.80it/s] Training 1/1 epoch (loss 2.7950): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 827/1250 [05:01<02:32, 2.78it/s] Training 1/1 epoch (loss 2.6937): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 827/1250 [05:02<02:32, 2.78it/s] Training 1/1 epoch (loss 2.6937): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 828/1250 [05:02<02:27, 2.87it/s] Training 1/1 epoch (loss 2.7467): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 828/1250 [05:02<02:27, 2.87it/s] Training 1/1 epoch (loss 2.7467): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 829/1250 [05:02<02:26, 2.87it/s] Training 1/1 epoch (loss 2.6299): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 829/1250 [05:02<02:26, 2.87it/s] Training 1/1 epoch (loss 2.6299): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 830/1250 [05:02<02:29, 2.80it/s] Training 1/1 epoch (loss 2.7127): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 830/1250 [05:03<02:29, 2.80it/s] Training 1/1 epoch (loss 2.7127): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 831/1250 [05:03<02:24, 2.90it/s] Training 1/1 epoch (loss 2.8268): 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 831/1250 [05:03<02:24, 2.90it/s] Training 1/1 epoch (loss 2.8268): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 832/1250 [05:03<02:30, 2.77it/s] Training 1/1 epoch (loss 2.6800): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 832/1250 [05:03<02:30, 2.77it/s] Training 1/1 epoch (loss 2.6800): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 833/1250 [05:03<02:25, 2.86it/s] Training 1/1 epoch (loss 2.8238): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 833/1250 [05:04<02:25, 2.86it/s] Training 1/1 epoch (loss 2.8238): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 834/1250 [05:04<02:27, 2.82it/s] Training 1/1 epoch (loss 2.9829): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 834/1250 [05:04<02:27, 2.82it/s] Training 1/1 epoch (loss 2.9829): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 835/1250 [05:04<02:33, 2.71it/s] 
Training 1/1 epoch (loss 2.7706): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 835/1250 [05:04<02:33, 2.71it/s] Training 1/1 epoch (loss 2.7706): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 836/1250 [05:04<02:27, 2.81it/s] Training 1/1 epoch (loss 2.7210): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 836/1250 [05:05<02:27, 2.81it/s] Training 1/1 epoch (loss 2.7210): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 837/1250 [05:05<02:21, 2.93it/s] Training 1/1 epoch (loss 2.8198): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 837/1250 [05:05<02:21, 2.93it/s] Training 1/1 epoch (loss 2.8198): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 838/1250 [05:05<02:19, 2.95it/s] Training 1/1 epoch (loss 2.8230): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 838/1250 [05:05<02:19, 2.95it/s] Training 1/1 epoch (loss 2.8230): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 839/1250 [05:05<02:25, 2.83it/s] Training 1/1 epoch (loss 2.5288): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 839/1250 [05:06<02:25, 2.83it/s] Training 1/1 epoch (loss 2.5288): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 840/1250 [05:06<02:21, 2.90it/s] Training 1/1 epoch (loss 2.7285): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 840/1250 [05:06<02:21, 2.90it/s] Training 1/1 epoch (loss 2.7285): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 841/1250 [05:06<02:26, 2.79it/s] Training 1/1 epoch (loss 2.8483): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 841/1250 [05:06<02:26, 2.79it/s] Training 1/1 epoch (loss 2.8483): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 842/1250 [05:06<02:28, 2.75it/s] Training 1/1 epoch (loss 2.6609): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 842/1250 [05:07<02:28, 2.75it/s] Training 1/1 epoch (loss 2.6609): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 843/1250 [05:07<02:20, 2.89it/s] Training 1/1 epoch (loss 2.8179): 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 843/1250 [05:07<02:20, 2.89it/s] Training 1/1 epoch (loss 2.8179): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 844/1250 [05:07<02:23, 2.83it/s] Training 1/1 epoch (loss 2.7405): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 844/1250 [05:08<02:23, 2.83it/s] Training 1/1 epoch (loss 2.7405): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 845/1250 [05:08<02:21, 2.86it/s] Training 1/1 epoch (loss 2.8293): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 845/1250 [05:08<02:21, 2.86it/s] 
Training 1/1 epoch (loss 2.8293): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 846/1250 [05:08<02:16, 2.96it/s] Training 1/1 epoch (loss 2.6338): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 846/1250 [05:08<02:16, 2.96it/s] Training 1/1 epoch (loss 2.6338): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 847/1250 [05:08<02:13, 3.02it/s] Training 1/1 epoch (loss 2.7195): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 847/1250 [05:08<02:13, 3.02it/s] Training 1/1 epoch (loss 2.7195): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 848/1250 [05:08<02:13, 3.01it/s] Training 1/1 epoch (loss 2.7580): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 848/1250 [05:09<02:13, 3.01it/s] Training 1/1 epoch (loss 2.7580): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 849/1250 [05:09<02:16, 2.93it/s] Training 1/1 epoch (loss 2.8976): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 849/1250 [05:09<02:16, 2.93it/s] Training 1/1 epoch (loss 2.8976): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 850/1250 [05:09<02:15, 2.95it/s] Training 1/1 epoch (loss 2.6515): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 850/1250 [05:09<02:15, 2.95it/s] Training 1/1 epoch (loss 2.6515): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 851/1250 [05:09<02:14, 2.96it/s] Training 1/1 epoch (loss 2.5806): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 851/1250 [05:10<02:14, 2.96it/s] Training 1/1 epoch (loss 2.5806): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 852/1250 [05:10<02:11, 3.02it/s] Training 1/1 epoch (loss 2.9495): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 852/1250 [05:10<02:11, 3.02it/s] Training 1/1 epoch (loss 2.9495): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 853/1250 [05:10<02:09, 3.07it/s] Training 1/1 epoch (loss 2.6416): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 853/1250 [05:10<02:09, 3.07it/s] Training 1/1 epoch (loss 2.6416): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 854/1250 [05:10<02:14, 2.94it/s] Training 1/1 epoch (loss 2.8636): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 854/1250 [05:11<02:14, 2.94it/s] Training 1/1 epoch (loss 2.8636): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 855/1250 [05:11<02:10, 3.03it/s] Training 1/1 epoch (loss 2.4451): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 855/1250 [05:11<02:10, 3.03it/s] Training 1/1 epoch (loss 2.4451): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 856/1250 [05:11<02:13, 2.95it/s] 
Training 1/1 epoch (loss 2.7285): 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 856/1250 [05:12<02:13, 2.95it/s] Training 1/1 epoch (loss 2.7285): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 857/1250 [05:12<02:18, 2.83it/s] Training 1/1 epoch (loss 2.6827): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 857/1250 [05:12<02:18, 2.83it/s] Training 1/1 epoch (loss 2.6827): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 858/1250 [05:12<02:13, 2.95it/s] Training 1/1 epoch (loss 2.6798): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 858/1250 [05:12<02:13, 2.95it/s] Training 1/1 epoch (loss 2.6798): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 859/1250 [05:12<02:14, 2.90it/s] Training 1/1 epoch (loss 2.7341): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 859/1250 [05:13<02:14, 2.90it/s] Training 1/1 epoch (loss 2.7341): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 860/1250 [05:13<02:12, 2.93it/s] Training 1/1 epoch (loss 2.8041): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 860/1250 [05:13<02:12, 2.93it/s] Training 1/1 epoch (loss 2.8041): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 861/1250 [05:13<02:10, 2.97it/s] Training 1/1 epoch (loss 2.7210): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 861/1250 [05:13<02:10, 2.97it/s] Training 1/1 epoch (loss 2.7210): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 862/1250 [05:13<02:10, 2.96it/s] Training 1/1 epoch (loss 2.7602): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 862/1250 [05:14<02:10, 2.96it/s] Training 1/1 epoch (loss 2.7602): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 863/1250 [05:14<02:12, 2.92it/s] Training 1/1 epoch (loss 2.7064): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 863/1250 [05:14<02:12, 2.92it/s] Training 1/1 epoch (loss 2.7064): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 864/1250 [05:14<02:14, 2.87it/s] Training 1/1 epoch (loss 2.6887): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 864/1250 [05:14<02:14, 2.87it/s] Training 1/1 epoch (loss 2.6887): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 865/1250 [05:14<02:18, 2.78it/s] Training 1/1 epoch (loss 2.6178): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 865/1250 [05:15<02:18, 2.78it/s] Training 1/1 epoch (loss 2.6178): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 866/1250 [05:15<02:20, 2.74it/s] Training 1/1 epoch (loss 2.7494): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 866/1250 [05:15<02:20, 2.74it/s] 
Training 1/1 epoch (loss 2.7494): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 867/1250 [05:15<02:14, 2.85it/s] Training 1/1 epoch (loss 2.9886): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 867/1250 [05:15<02:14, 2.85it/s] Training 1/1 epoch (loss 2.9886): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 868/1250 [05:15<02:08, 2.98it/s] Training 1/1 epoch (loss 2.5916): 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 868/1250 [05:16<02:08, 2.98it/s] Training 1/1 epoch (loss 2.5916): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 869/1250 [05:16<02:11, 2.89it/s] Training 1/1 epoch (loss 2.5591): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 869/1250 [05:16<02:11, 2.89it/s] Training 1/1 epoch (loss 2.5591): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 870/1250 [05:16<02:08, 2.95it/s] Training 1/1 epoch (loss 2.6214): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 870/1250 [05:16<02:08, 2.95it/s] Training 1/1 epoch (loss 2.6214): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 871/1250 [05:16<02:04, 3.05it/s] Training 1/1 epoch (loss 2.7169): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 871/1250 [05:17<02:04, 3.05it/s] Training 1/1 epoch (loss 2.7169): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 872/1250 [05:17<02:13, 2.83it/s] Training 1/1 epoch (loss 2.9122): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 872/1250 [05:17<02:13, 2.83it/s] Training 1/1 epoch (loss 2.9122): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 873/1250 [05:17<02:15, 2.79it/s] Training 1/1 epoch (loss 2.6920): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 873/1250 [05:17<02:15, 2.79it/s] Training 1/1 epoch (loss 2.6920): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 874/1250 [05:17<02:11, 2.85it/s] Training 1/1 epoch (loss 2.7603): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 874/1250 [05:18<02:11, 2.85it/s] Training 1/1 epoch (loss 2.7603): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 875/1250 [05:18<02:21, 2.66it/s] Training 1/1 epoch (loss 2.8047): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 875/1250 [05:18<02:21, 2.66it/s] Training 1/1 epoch (loss 2.8047): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 876/1250 [05:18<02:16, 2.73it/s] Training 1/1 epoch (loss 2.8655): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 876/1250 [05:19<02:16, 2.73it/s] Training 1/1 epoch (loss 2.8655): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 877/1250 [05:19<02:16, 2.74it/s] 
Training 1/1 epoch (loss 2.9794): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 877/1250 [05:19<02:16, 2.74it/s] Training 1/1 epoch (loss 2.9794): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 878/1250 [05:19<02:08, 2.90it/s] Training 1/1 epoch (loss 2.7757): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 878/1250 [05:19<02:08, 2.90it/s] Training 1/1 epoch (loss 2.7757): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 879/1250 [05:19<02:10, 2.84it/s] Training 1/1 epoch (loss 2.8580): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 879/1250 [05:20<02:10, 2.84it/s] Training 1/1 epoch (loss 2.8580): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 880/1250 [05:20<02:13, 2.78it/s] Training 1/1 epoch (loss 2.6340): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 880/1250 [05:20<02:13, 2.78it/s] Training 1/1 epoch (loss 2.6340): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 881/1250 [05:20<02:17, 2.68it/s] Training 1/1 epoch (loss 2.7345): 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 881/1250 [05:20<02:17, 2.68it/s] Training 1/1 epoch (loss 2.7345): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 882/1250 [05:20<02:14, 2.73it/s] Training 1/1 epoch (loss 2.8713): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 882/1250 [05:21<02:14, 2.73it/s] Training 1/1 epoch (loss 2.8713): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 883/1250 [05:21<02:12, 2.78it/s] Training 1/1 epoch (loss 2.8173): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 883/1250 [05:21<02:12, 2.78it/s] Training 1/1 epoch (loss 2.8173): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 884/1250 [05:21<02:08, 2.86it/s] Training 1/1 epoch (loss 2.6738): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 884/1250 [05:21<02:08, 2.86it/s] Training 1/1 epoch (loss 2.6738): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 885/1250 [05:21<02:04, 2.92it/s] Training 1/1 epoch (loss 2.7872): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 885/1250 [05:22<02:04, 2.92it/s] Training 1/1 epoch (loss 2.7872): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 886/1250 [05:22<02:02, 2.96it/s] Training 1/1 epoch (loss 2.7190): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 886/1250 [05:22<02:02, 2.96it/s] Training 1/1 epoch (loss 2.7190): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 887/1250 [05:22<02:00, 3.01it/s] Training 1/1 epoch (loss 2.6603): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 887/1250 [05:22<02:00, 3.01it/s] 
Training 1/1 epoch (loss 2.6603): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 888/1250 [05:22<02:03, 2.92it/s] Training 1/1 epoch (loss 2.7479): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 888/1250 [05:23<02:03, 2.92it/s] Training 1/1 epoch (loss 2.7479): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 889/1250 [05:23<02:02, 2.94it/s] Training 1/1 epoch (loss 2.7355): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 889/1250 [05:23<02:02, 2.94it/s] Training 1/1 epoch (loss 2.7355): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 890/1250 [05:23<02:11, 2.73it/s] Training 1/1 epoch (loss 2.8962): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 890/1250 [05:23<02:11, 2.73it/s] Training 1/1 epoch (loss 2.8962): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 891/1250 [05:23<02:09, 2.76it/s] Training 1/1 epoch (loss 2.7229): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 891/1250 [05:24<02:09, 2.76it/s] Training 1/1 epoch (loss 2.7229): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 892/1250 [05:24<02:13, 2.68it/s] Training 1/1 epoch (loss 2.5737): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 892/1250 [05:24<02:13, 2.68it/s] Training 1/1 epoch (loss 2.5737): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 893/1250 [05:24<02:14, 2.66it/s] Training 1/1 epoch (loss 2.5473): 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 893/1250 [05:25<02:14, 2.66it/s] Training 1/1 epoch (loss 2.5473): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 894/1250 [05:25<02:13, 2.66it/s] Training 1/1 epoch (loss 3.0634): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 894/1250 [05:25<02:13, 2.66it/s] Training 1/1 epoch (loss 3.0634): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 895/1250 [05:25<02:08, 2.76it/s] Training 1/1 epoch (loss 2.4401): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 895/1250 [05:25<02:08, 2.76it/s] Training 1/1 epoch (loss 2.4401): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 896/1250 [05:25<02:03, 2.87it/s] Training 1/1 epoch (loss 2.6763): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 896/1250 [05:26<02:03, 2.87it/s] Training 1/1 epoch (loss 2.6763): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 897/1250 [05:26<02:04, 2.84it/s] Training 1/1 epoch (loss 2.8189): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 897/1250 [05:26<02:04, 2.84it/s] Training 1/1 epoch (loss 2.8189): 
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 898/1250 [05:26<02:05, 2.82it/s] Training 1/1 epoch (loss 2.7895): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 898/1250 [05:26<02:05, 2.82it/s] Training 1/1 epoch (loss 2.7895): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 899/1250 [05:26<02:07, 2.74it/s] Training 1/1 epoch (loss 2.9235): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 899/1250 [05:27<02:07, 2.74it/s] Training 1/1 epoch (loss 2.9235): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 900/1250 [05:27<02:10, 2.67it/s] Training 1/1 epoch (loss 2.8328): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 900/1250 [05:27<02:10, 2.67it/s] Training 1/1 epoch (loss 2.8328): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 901/1250 [05:27<02:05, 2.77it/s] Training 1/1 epoch (loss 2.7728): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 901/1250 [05:27<02:05, 2.77it/s] Training 1/1 epoch (loss 2.7728): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 902/1250 [05:27<01:58, 2.93it/s] Training 1/1 epoch (loss 2.7290): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 902/1250 [05:28<01:58, 2.93it/s] Training 1/1 epoch (loss 2.7290): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 903/1250 [05:28<02:13, 2.60it/s] Training 1/1 epoch (loss 2.7892): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 903/1250 [05:28<02:13, 2.60it/s] Training 1/1 epoch (loss 2.7892): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 904/1250 [05:28<02:28, 2.33it/s] Training 1/1 epoch (loss 2.6401): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 904/1250 [05:29<02:28, 2.33it/s] Training 1/1 epoch (loss 2.6401): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 905/1250 [05:29<02:19, 2.48it/s] Training 1/1 epoch (loss 2.6162): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 905/1250 [05:29<02:19, 2.48it/s] Training 1/1 epoch (loss 2.6162): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 906/1250 [05:29<02:11, 2.62it/s] Training 1/1 epoch (loss 2.5979): 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 906/1250 [05:29<02:11, 2.62it/s] Training 1/1 epoch (loss 2.5979): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 907/1250 [05:29<02:09, 2.64it/s] Training 1/1 epoch (loss 2.5807): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 907/1250 [05:30<02:09, 2.64it/s] Training 1/1 epoch (loss 2.5807): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 908/1250 
[05:30<02:06, 2.69it/s] Training 1/1 epoch (loss 2.8714): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 908/1250 [05:30<02:06, 2.69it/s] Training 1/1 epoch (loss 2.8714): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 909/1250 [05:30<02:03, 2.76it/s] Training 1/1 epoch (loss 2.6463): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 909/1250 [05:30<02:03, 2.76it/s] Training 1/1 epoch (loss 2.6463): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 910/1250 [05:30<01:58, 2.87it/s] Training 1/1 epoch (loss 2.5270): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 910/1250 [05:31<01:58, 2.87it/s] Training 1/1 epoch (loss 2.5270): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 911/1250 [05:31<01:58, 2.85it/s] Training 1/1 epoch (loss 2.7916): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 911/1250 [05:31<01:58, 2.85it/s] Training 1/1 epoch (loss 2.7916): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 912/1250 [05:31<01:58, 2.86it/s] Training 1/1 epoch (loss 2.5609): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 912/1250 [05:32<01:58, 2.86it/s] Training 1/1 epoch (loss 2.5609): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 913/1250 [05:32<02:04, 2.70it/s] Training 1/1 epoch (loss 2.6735): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 913/1250 [05:32<02:04, 2.70it/s] Training 1/1 epoch (loss 2.6735): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 914/1250 [05:32<02:01, 2.77it/s] Training 1/1 epoch (loss 2.8230): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 914/1250 [05:32<02:01, 2.77it/s] Training 1/1 epoch (loss 2.8230): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 915/1250 [05:32<02:03, 2.71it/s] Training 1/1 epoch (loss 2.7885): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 915/1250 [05:33<02:03, 2.71it/s] Training 1/1 epoch (loss 2.7885): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 916/1250 [05:33<02:05, 2.66it/s] Training 1/1 epoch (loss 2.5354): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 916/1250 [05:33<02:05, 2.66it/s] Training 1/1 epoch (loss 2.5354): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 917/1250 [05:33<02:02, 2.71it/s] Training 1/1 epoch (loss 2.9062): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 917/1250 [05:33<02:02, 2.71it/s] Training 1/1 epoch (loss 2.9062): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 918/1250 [05:33<01:56, 2.85it/s] Training 1/1 
epoch (loss 2.6448): 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 918/1250 [05:34<01:56, 2.85it/s] Training 1/1 epoch (loss 2.6448): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 919/1250 [05:34<01:58, 2.79it/s] Training 1/1 epoch (loss 2.8303): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 919/1250 [05:34<01:58, 2.79it/s] Training 1/1 epoch (loss 2.8303): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 920/1250 [05:34<02:04, 2.65it/s] Training 1/1 epoch (loss 2.7136): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 920/1250 [05:35<02:04, 2.65it/s] Training 1/1 epoch (loss 2.7136): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 921/1250 [05:35<02:05, 2.62it/s] Training 1/1 epoch (loss 2.6686): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 921/1250 [05:35<02:05, 2.62it/s] Training 1/1 epoch (loss 2.6686): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 922/1250 [05:35<01:59, 2.75it/s] Training 1/1 epoch (loss 2.6389): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 922/1250 [05:35<01:59, 2.75it/s] Training 1/1 epoch (loss 2.6389): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 923/1250 [05:35<01:58, 2.76it/s] Training 1/1 epoch (loss 2.9130): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 923/1250 [05:36<01:58, 2.76it/s] Training 1/1 epoch (loss 2.9130): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 924/1250 [05:36<01:53, 2.88it/s] Training 1/1 epoch (loss 2.7648): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 924/1250 [05:36<01:53, 2.88it/s] Training 1/1 epoch (loss 2.7648): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 925/1250 [05:36<01:50, 2.93it/s] Training 1/1 epoch (loss 2.7511): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 925/1250 [05:36<01:50, 2.93it/s] Training 1/1 epoch (loss 2.7511): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 926/1250 [05:36<01:52, 2.87it/s] Training 1/1 epoch (loss 2.6432): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 926/1250 [05:37<01:52, 2.87it/s] Training 1/1 epoch (loss 2.6432): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 927/1250 [05:37<01:51, 2.90it/s] Training 1/1 epoch (loss 2.9608): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 927/1250 [05:37<01:51, 2.90it/s] Training 1/1 epoch (loss 2.9608): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 928/1250 [05:37<01:59, 2.69it/s] Training 1/1 epoch (loss 2.8336): 
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 928/1250 [05:37<01:59, 2.69it/s] Training 1/1 epoch (loss 2.8336): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 929/1250 [05:37<02:01, 2.65it/s] Training 1/1 epoch (loss 2.9421): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 929/1250 [05:38<02:01, 2.65it/s] Training 1/1 epoch (loss 2.9421): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 930/1250 [05:38<01:55, 2.76it/s] Training 1/1 epoch (loss 2.8252): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 930/1250 [05:38<01:55, 2.76it/s] Training 1/1 epoch (loss 2.8252): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 931/1250 [05:38<01:52, 2.83it/s] Training 1/1 epoch (loss 2.8008): 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 931/1250 [05:38<01:52, 2.83it/s] Training 1/1 epoch (loss 2.8008): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 932/1250 [05:38<01:51, 2.86it/s] Training 1/1 epoch (loss 2.7679): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 932/1250 [05:39<01:51, 2.86it/s] Training 1/1 epoch (loss 2.7679): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 933/1250 [05:39<01:51, 2.84it/s] Training 1/1 epoch (loss 2.6153): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 933/1250 [05:39<01:51, 2.84it/s] Training 1/1 epoch (loss 2.6153): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 934/1250 [05:39<01:49, 2.89it/s] Training 1/1 epoch (loss 2.6405): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 934/1250 [05:39<01:49, 2.89it/s] Training 1/1 epoch (loss 2.6405): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 935/1250 [05:39<01:49, 2.88it/s] Training 1/1 epoch (loss 2.7853): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 935/1250 [05:40<01:49, 2.88it/s] Training 1/1 epoch (loss 2.7853): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 936/1250 [05:40<01:50, 2.84it/s] Training 1/1 epoch (loss 2.5449): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 936/1250 [05:40<01:50, 2.84it/s] Training 1/1 epoch (loss 2.5449): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 937/1250 [05:40<01:49, 2.86it/s] Training 1/1 epoch (loss 2.6389): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 937/1250 [05:41<01:49, 2.86it/s] Training 1/1 epoch (loss 2.6389): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 938/1250 [05:41<01:52, 2.77it/s] Training 1/1 epoch (loss 2.7939): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 938/1250 
[05:41<01:52, 2.77it/s] Training 1/1 epoch (loss 2.7939): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 939/1250 [05:41<01:46, 2.91it/s] Training 1/1 epoch (loss 2.7351): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 939/1250 [05:41<01:46, 2.91it/s] Training 1/1 epoch (loss 2.7351): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 940/1250 [05:41<01:45, 2.93it/s] Training 1/1 epoch (loss 2.7076): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 940/1250 [05:42<01:45, 2.93it/s] Training 1/1 epoch (loss 2.7076): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 941/1250 [05:42<01:47, 2.89it/s] Training 1/1 epoch (loss 2.7065): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 941/1250 [05:42<01:47, 2.89it/s] Training 1/1 epoch (loss 2.7065): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 942/1250 [05:42<01:48, 2.85it/s] Training 1/1 epoch (loss 2.5334): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 942/1250 [05:42<01:48, 2.85it/s] Training 1/1 epoch (loss 2.5334): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 943/1250 [05:42<01:51, 2.75it/s] Training 1/1 epoch (loss 2.4604): 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 943/1250 [05:43<01:51, 2.75it/s] Training 1/1 epoch (loss 2.4604): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 944/1250 [05:43<01:50, 2.77it/s] Training 1/1 epoch (loss 2.7037): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 944/1250 [05:43<01:50, 2.77it/s] Training 1/1 epoch (loss 2.7037): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 945/1250 [05:43<01:49, 2.79it/s] Training 1/1 epoch (loss 2.7576): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 945/1250 [05:43<01:49, 2.79it/s] Training 1/1 epoch (loss 2.7576): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 946/1250 [05:43<01:49, 2.78it/s] Training 1/1 epoch (loss 2.6594): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 946/1250 [05:44<01:49, 2.78it/s] Training 1/1 epoch (loss 2.6594): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 947/1250 [05:44<01:51, 2.73it/s] Training 1/1 epoch (loss 2.6088): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 947/1250 [05:44<01:51, 2.73it/s] Training 1/1 epoch (loss 2.6088): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 948/1250 [05:44<01:48, 2.79it/s] Training 1/1 epoch (loss 2.7690): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 948/1250 [05:45<01:48, 2.79it/s] Training 1/1 
epoch (loss 2.7690): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 949/1250 [05:45<01:53, 2.65it/s] Training 1/1 epoch (loss 2.5939): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 949/1250 [05:45<01:53, 2.65it/s] Training 1/1 epoch (loss 2.5939): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 950/1250 [05:45<01:52, 2.66it/s] Training 1/1 epoch (loss 2.5322): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 950/1250 [05:45<01:52, 2.66it/s] Training 1/1 epoch (loss 2.5322): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 951/1250 [05:45<01:47, 2.78it/s] Training 1/1 epoch (loss 2.8348): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 951/1250 [05:46<01:47, 2.78it/s] Training 1/1 epoch (loss 2.8348): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 952/1250 [05:46<01:46, 2.79it/s] Training 1/1 epoch (loss 2.5795): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 952/1250 [05:46<01:46, 2.79it/s] Training 1/1 epoch (loss 2.5795): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 953/1250 [05:46<01:46, 2.78it/s] Training 1/1 epoch (loss 2.6889): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 953/1250 [05:46<01:46, 2.78it/s] Training 1/1 epoch (loss 2.6889): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 954/1250 [05:46<01:47, 2.76it/s] Training 1/1 epoch (loss 2.6685): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 954/1250 [05:47<01:47, 2.76it/s] Training 1/1 epoch (loss 2.6685): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 955/1250 [05:47<01:46, 2.76it/s] Training 1/1 epoch (loss 2.7345): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 955/1250 [05:47<01:46, 2.76it/s] Training 1/1 epoch (loss 2.7345): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 956/1250 [05:47<01:41, 2.91it/s] Training 1/1 epoch (loss 2.9136): 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 956/1250 [05:47<01:41, 2.91it/s] Training 1/1 epoch (loss 2.9136): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 957/1250 [05:47<01:47, 2.73it/s] Training 1/1 epoch (loss 2.6790): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 957/1250 [05:48<01:47, 2.73it/s] Training 1/1 epoch (loss 2.6790): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 958/1250 [05:48<01:47, 2.72it/s] Training 1/1 epoch (loss 2.9678): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 958/1250 [05:48<01:47, 2.72it/s] Training 1/1 epoch (loss 2.9678): 
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 959/1250 [05:48<01:44, 2.78it/s] Training 1/1 epoch (loss 2.4710): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 959/1250 [05:48<01:44, 2.78it/s] Training 1/1 epoch (loss 2.4710): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 960/1250 [05:48<01:44, 2.79it/s] Training 1/1 epoch (loss 2.4875): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 960/1250 [05:49<01:44, 2.79it/s] Training 1/1 epoch (loss 2.4875): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 961/1250 [05:49<01:43, 2.80it/s] Training 1/1 epoch (loss 2.4818): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 961/1250 [05:49<01:43, 2.80it/s] Training 1/1 epoch (loss 2.4818): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 962/1250 [05:49<01:41, 2.85it/s] Training 1/1 epoch (loss 2.8613): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 962/1250 [05:49<01:41, 2.85it/s] Training 1/1 epoch (loss 2.8613): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 963/1250 [05:49<01:39, 2.90it/s] Training 1/1 epoch (loss 2.8231): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 963/1250 [05:50<01:39, 2.90it/s] Training 1/1 epoch (loss 2.8231): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 964/1250 [05:50<01:38, 2.90it/s] Training 1/1 epoch (loss 2.4747): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 964/1250 [05:50<01:38, 2.90it/s] Training 1/1 epoch (loss 2.4747): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 965/1250 [05:50<01:36, 2.95it/s] Training 1/1 epoch (loss 2.6964): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 965/1250 [05:50<01:36, 2.95it/s] Training 1/1 epoch (loss 2.6964): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 966/1250 [05:51<01:36, 2.93it/s] Training 1/1 epoch (loss 2.6514): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 966/1250 [05:51<01:36, 2.93it/s] Training 1/1 epoch (loss 2.6514): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 967/1250 [05:51<01:33, 3.02it/s] Training 1/1 epoch (loss 2.7059): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 967/1250 [05:51<01:33, 3.02it/s] Training 1/1 epoch (loss 2.7059): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 968/1250 [05:51<01:35, 2.95it/s] Training 1/1 epoch (loss 3.0112): 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 968/1250 [05:52<01:35, 2.95it/s] Training 1/1 epoch (loss 3.0112): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 969/1250 
[05:52<01:37, 2.87it/s] Training 1/1 epoch (loss 2.6039): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 969/1250 [05:52<01:37, 2.87it/s] Training 1/1 epoch (loss 2.6039): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 970/1250 [05:52<01:35, 2.94it/s] Training 1/1 epoch (loss 3.0167): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 970/1250 [05:52<01:35, 2.94it/s] Training 1/1 epoch (loss 3.0167): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 971/1250 [05:52<01:35, 2.91it/s] Training 1/1 epoch (loss 2.7090): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 971/1250 [05:53<01:35, 2.91it/s] Training 1/1 epoch (loss 2.7090): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 972/1250 [05:53<01:39, 2.79it/s] Training 1/1 epoch (loss 2.7744): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 972/1250 [05:53<01:39, 2.79it/s] Training 1/1 epoch (loss 2.7744): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 973/1250 [05:53<01:55, 2.39it/s] Training 1/1 epoch (loss 2.5755): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 973/1250 [05:54<01:55, 2.39it/s] Training 1/1 epoch (loss 2.5755): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 974/1250 [05:54<01:51, 2.47it/s] Training 1/1 epoch (loss 2.6999): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 974/1250 [05:54<01:51, 2.47it/s] Training 1/1 epoch (loss 2.6999): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 975/1250 [05:54<01:48, 2.52it/s] Training 1/1 epoch (loss 2.5906): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 975/1250 [05:54<01:48, 2.52it/s] Training 1/1 epoch (loss 2.5906): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 976/1250 [05:54<01:50, 2.48it/s] Training 1/1 epoch (loss 2.7943): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 976/1250 [05:55<01:50, 2.48it/s] Training 1/1 epoch (loss 2.7943): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 977/1250 [05:55<01:47, 2.54it/s] Training 1/1 epoch (loss 2.7930): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 977/1250 [05:55<01:47, 2.54it/s] Training 1/1 epoch (loss 2.7930): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 978/1250 [05:55<01:43, 2.63it/s] Training 1/1 epoch (loss 2.9794): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 978/1250 [05:55<01:43, 2.63it/s] Training 1/1 epoch (loss 2.9794): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 979/1250 [05:55<01:40, 2.71it/s] Training 1/1 
epoch (loss 2.7873): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 979/1250 [05:56<01:40, 2.71it/s] Training 1/1 epoch (loss 2.7873): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 980/1250 [05:56<01:39, 2.73it/s] Training 1/1 epoch (loss 2.7252): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 980/1250 [05:56<01:39, 2.73it/s] Training 1/1 epoch (loss 2.7252): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 981/1250 [05:56<01:34, 2.83it/s] Training 1/1 epoch (loss 2.7612): 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 981/1250 [05:56<01:34, 2.83it/s] Training 1/1 epoch (loss 2.7612): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 982/1250 [05:56<01:35, 2.82it/s] Training 1/1 epoch (loss 2.7707): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 982/1250 [05:57<01:35, 2.82it/s] Training 1/1 epoch (loss 2.7707): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 983/1250 [05:57<01:34, 2.81it/s] Training 1/1 epoch (loss 2.7729): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 983/1250 [05:57<01:34, 2.81it/s] Training 1/1 epoch (loss 2.7729): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 984/1250 [05:57<01:31, 2.89it/s] Training 1/1 epoch (loss 2.8742): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 984/1250 [05:57<01:31, 2.89it/s] Training 1/1 epoch (loss 2.8742): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 985/1250 [05:57<01:34, 2.81it/s] Training 1/1 epoch (loss 2.5919): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 985/1250 [05:58<01:34, 2.81it/s] Training 1/1 epoch (loss 2.5919): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 986/1250 [05:58<01:53, 2.32it/s] Training 1/1 epoch (loss 2.7192): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 986/1250 [05:59<01:53, 2.32it/s] Training 1/1 epoch (loss 2.7192): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 987/1250 [05:59<01:51, 2.35it/s] Training 1/1 epoch (loss 2.6991): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 987/1250 [05:59<01:51, 2.35it/s] Training 1/1 epoch (loss 2.6991): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 988/1250 [05:59<01:44, 2.51it/s] Training 1/1 epoch (loss 2.9396): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 988/1250 [05:59<01:44, 2.51it/s] Training 1/1 epoch (loss 2.9396): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 989/1250 [05:59<01:35, 2.72it/s] Training 1/1 epoch (loss 2.6126): 
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 989/1250 [06:00<01:35, 2.72it/s] Training 1/1 epoch (loss 2.6126): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 990/1250 [06:00<01:35, 2.72it/s] Training 1/1 epoch (loss 2.6447): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 990/1250 [06:00<01:35, 2.72it/s] Training 1/1 epoch (loss 2.6447): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 991/1250 [06:00<01:32, 2.79it/s] Training 1/1 epoch (loss 2.3736): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 991/1250 [06:00<01:32, 2.79it/s] Training 1/1 epoch (loss 2.3736): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 992/1250 [06:00<01:31, 2.82it/s] Training 1/1 epoch (loss 2.6367): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 992/1250 [06:01<01:31, 2.82it/s] Training 1/1 epoch (loss 2.6367): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 993/1250 [06:01<01:29, 2.88it/s] Training 1/1 epoch (loss 2.5600): 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 993/1250 [06:01<01:29, 2.88it/s] Training 1/1 epoch (loss 2.5600): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 994/1250 [06:01<01:27, 2.94it/s] Training 1/1 epoch (loss 2.4748): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 994/1250 [06:01<01:27, 2.94it/s] Training 1/1 epoch (loss 2.4748): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 995/1250 [06:01<01:26, 2.94it/s] Training 1/1 epoch (loss 2.6628): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 995/1250 [06:02<01:26, 2.94it/s] Training 1/1 epoch (loss 2.6628): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 996/1250 [06:02<01:25, 2.98it/s] Training 1/1 epoch (loss 2.8017): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 996/1250 [06:02<01:25, 2.98it/s] Training 1/1 epoch (loss 2.8017): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 997/1250 [06:02<01:24, 2.99it/s] Training 1/1 epoch (loss 2.7231): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 997/1250 [06:02<01:24, 2.99it/s] Training 1/1 epoch (loss 2.7231): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 998/1250 [06:02<01:25, 2.95it/s] Training 1/1 epoch (loss 2.6633): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 998/1250 [06:03<01:25, 2.95it/s] Training 1/1 epoch (loss 2.6633): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 999/1250 [06:03<01:26, 2.90it/s] Training 1/1 epoch (loss 2.6786): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 999/1250 
[06:03<01:26, 2.90it/s] Training 1/1 epoch (loss 2.6786): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1000/1250 [06:03<01:26, 2.88it/s] Training 1/1 epoch (loss 2.7935): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1000/1250 [06:03<01:26, 2.88it/s] Training 1/1 epoch (loss 2.7935): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1001/1250 [06:03<01:34, 2.63it/s] Training 1/1 epoch (loss 2.5360): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1001/1250 [06:04<01:34, 2.63it/s] Training 1/1 epoch (loss 2.5360): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1002/1250 [06:04<01:30, 2.76it/s] Training 1/1 epoch (loss 2.6398): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1002/1250 [06:04<01:30, 2.76it/s] Training 1/1 epoch (loss 2.6398): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1003/1250 [06:04<01:29, 2.77it/s] Training 1/1 epoch (loss 2.5722): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1003/1250 [06:04<01:29, 2.77it/s] Training 1/1 epoch (loss 2.5722): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1004/1250 [06:04<01:30, 2.73it/s] Training 1/1 epoch (loss 2.8345): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1004/1250 [06:05<01:30, 2.73it/s] Training 1/1 epoch (loss 2.8345): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1005/1250 [06:05<01:27, 2.79it/s] Training 1/1 epoch (loss 2.7330): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1005/1250 [06:05<01:27, 2.79it/s] Training 1/1 epoch (loss 2.7330): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1006/1250 [06:05<01:26, 2.83it/s] Training 1/1 epoch (loss 2.7918): 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1006/1250 [06:05<01:26, 2.83it/s] Training 1/1 epoch (loss 2.7918): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1007/1250 [06:05<01:24, 2.87it/s] Training 1/1 epoch (loss 2.7334): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1007/1250 [06:06<01:24, 2.87it/s] Training 1/1 epoch (loss 2.7334): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1008/1250 [06:06<01:22, 2.92it/s] Training 1/1 epoch (loss 2.8525): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1008/1250 [06:06<01:22, 2.92it/s] Training 1/1 epoch (loss 2.8525): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1009/1250 [06:06<01:21, 2.94it/s] Training 1/1 epoch (loss 2.7950): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1009/1250 [06:06<01:21, 
2.94it/s] Training 1/1 epoch (loss 2.7950): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1010/1250 [06:06<01:21, 2.94it/s] Training 1/1 epoch (loss 2.8791): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1010/1250 [06:07<01:21, 2.94it/s] Training 1/1 epoch (loss 2.8791): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1011/1250 [06:07<01:22, 2.90it/s] Training 1/1 epoch (loss 2.7453): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1011/1250 [06:07<01:22, 2.90it/s] Training 1/1 epoch (loss 2.7453): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1012/1250 [06:07<01:21, 2.91it/s] Training 1/1 epoch (loss 2.5717): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1012/1250 [06:07<01:21, 2.91it/s] Training 1/1 epoch (loss 2.5717): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1013/1250 [06:07<01:21, 2.90it/s] Training 1/1 epoch (loss 2.5567): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1013/1250 [06:08<01:21, 2.90it/s] Training 1/1 epoch (loss 2.5567): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1014/1250 [06:08<01:23, 2.82it/s] Training 1/1 epoch (loss 2.6503): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1014/1250 [06:08<01:23, 2.82it/s] Training 1/1 epoch (loss 2.6503): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1015/1250 [06:08<01:22, 2.85it/s] Training 1/1 epoch (loss 2.7187): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1015/1250 [06:09<01:22, 2.85it/s] Training 1/1 epoch (loss 2.7187): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1016/1250 [06:09<01:27, 2.67it/s] Training 1/1 epoch (loss 2.7622): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1016/1250 [06:09<01:27, 2.67it/s] Training 1/1 epoch (loss 2.7622): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1017/1250 [06:09<01:24, 2.75it/s] Training 1/1 epoch (loss 2.7984): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1017/1250 [06:09<01:24, 2.75it/s] Training 1/1 epoch (loss 2.7984): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1018/1250 [06:09<01:20, 2.88it/s] Training 1/1 epoch (loss 2.7525): 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1018/1250 [06:10<01:20, 2.88it/s] Training 1/1 epoch (loss 2.7525): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1019/1250 [06:10<01:21, 2.84it/s] Training 1/1 epoch (loss 2.9155): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1019/1250 
[06:10<01:21, 2.84it/s] Training 1/1 epoch (loss 2.9155): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1020/1250 [06:10<01:22, 2.78it/s] Training 1/1 epoch (loss 2.6640): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1020/1250 [06:10<01:22, 2.78it/s] Training 1/1 epoch (loss 2.6640): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1021/1250 [06:10<01:21, 2.81it/s] Training 1/1 epoch (loss 2.9314): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1021/1250 [06:11<01:21, 2.81it/s] Training 1/1 epoch (loss 2.9314): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1022/1250 [06:11<01:24, 2.69it/s] Training 1/1 epoch (loss 2.6192): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1022/1250 [06:11<01:24, 2.69it/s] Training 1/1 epoch (loss 2.6192): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1023/1250 [06:11<01:22, 2.74it/s] Training 1/1 epoch (loss 2.6696): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1023/1250 [06:12<01:22, 2.74it/s] Training 1/1 epoch (loss 2.6696): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1024/1250 [06:12<01:23, 2.71it/s] Training 1/1 epoch (loss 2.5941): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1024/1250 [06:12<01:23, 2.71it/s] Training 1/1 epoch (loss 2.5941): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1025/1250 [06:12<01:24, 2.67it/s] Training 1/1 epoch (loss 2.7293): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1025/1250 [06:12<01:24, 2.67it/s] Training 1/1 epoch (loss 2.7293): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1026/1250 [06:12<01:24, 2.66it/s] Training 1/1 epoch (loss 2.6182): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1026/1250 [06:13<01:24, 2.66it/s] Training 1/1 epoch (loss 2.6182): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1027/1250 [06:13<01:23, 2.67it/s] Training 1/1 epoch (loss 2.6089): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1027/1250 [06:13<01:23, 2.67it/s] Training 1/1 epoch (loss 2.6089): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1028/1250 [06:13<01:23, 2.66it/s] Training 1/1 epoch (loss 2.7368): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1028/1250 [06:13<01:23, 2.66it/s] Training 1/1 epoch (loss 2.7368): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1029/1250 [06:13<01:25, 2.60it/s] Training 1/1 epoch (loss 2.7922): 
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1029/1250 [06:14<01:25, 2.60it/s] Training 1/1 epoch (loss 2.7922): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1030/1250 [06:14<01:27, 2.51it/s] Training 1/1 epoch (loss 2.7183): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1030/1250 [06:14<01:27, 2.51it/s] Training 1/1 epoch (loss 2.7183): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1031/1250 [06:14<01:25, 2.55it/s] Training 1/1 epoch (loss 2.7416): 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1031/1250 [06:15<01:25, 2.55it/s] Training 1/1 epoch (loss 2.7416): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1032/1250 [06:15<01:29, 2.44it/s] Training 1/1 epoch (loss 2.5828): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1032/1250 [06:15<01:29, 2.44it/s] Training 1/1 epoch (loss 2.5828): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1033/1250 [06:15<01:28, 2.44it/s] Training 1/1 epoch (loss 2.5607): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1033/1250 [06:15<01:28, 2.44it/s] Training 1/1 epoch (loss 2.5607): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1034/1250 [06:15<01:25, 2.54it/s] Training 1/1 epoch (loss 2.5121): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1034/1250 [06:16<01:25, 2.54it/s] Training 1/1 epoch (loss 2.5121): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1035/1250 [06:16<01:20, 2.68it/s] Training 1/1 epoch (loss 2.6123): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1035/1250 [06:16<01:20, 2.68it/s] Training 1/1 epoch (loss 2.6123): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1036/1250 [06:16<01:16, 2.81it/s] Training 1/1 epoch (loss 2.8178): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1036/1250 [06:16<01:16, 2.81it/s] Training 1/1 epoch (loss 2.8178): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1037/1250 [06:16<01:12, 2.95it/s] Training 1/1 epoch (loss 2.5709): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1037/1250 [06:17<01:12, 2.95it/s] Training 1/1 epoch (loss 2.5709): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1038/1250 [06:17<01:14, 2.86it/s] Training 1/1 epoch (loss 2.4821): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1038/1250 [06:17<01:14, 2.86it/s] Training 1/1 epoch (loss 2.4821): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1039/1250 [06:17<01:13, 
2.87it/s] Training 1/1 epoch (loss 2.5646): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1039/1250 [06:18<01:13, 2.87it/s] Training 1/1 epoch (loss 2.5646): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1040/1250 [06:18<01:15, 2.77it/s] Training 1/1 epoch (loss 2.4441): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1040/1250 [06:18<01:15, 2.77it/s] Training 1/1 epoch (loss 2.4441): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1041/1250 [06:18<01:17, 2.69it/s] Training 1/1 epoch (loss 2.8763): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1041/1250 [06:18<01:17, 2.69it/s] Training 1/1 epoch (loss 2.8763): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1042/1250 [06:18<01:12, 2.85it/s] Training 1/1 epoch (loss 2.7807): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1042/1250 [06:19<01:12, 2.85it/s] Training 1/1 epoch (loss 2.7807): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1043/1250 [06:19<01:14, 2.76it/s] Training 1/1 epoch (loss 2.7071): 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1043/1250 [06:19<01:14, 2.76it/s] Training 1/1 epoch (loss 2.7071): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1044/1250 [06:19<01:16, 2.71it/s] Training 1/1 epoch (loss 2.8064): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1044/1250 [06:19<01:16, 2.71it/s] Training 1/1 epoch (loss 2.8064): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1045/1250 [06:19<01:14, 2.74it/s] Training 1/1 epoch (loss 2.7755): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1045/1250 [06:20<01:14, 2.74it/s] Training 1/1 epoch (loss 2.7755): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1046/1250 [06:20<01:09, 2.93it/s] Training 1/1 epoch (loss 2.7773): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1046/1250 [06:20<01:09, 2.93it/s] Training 1/1 epoch (loss 2.7773): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1047/1250 [06:20<01:09, 2.91it/s] Training 1/1 epoch (loss 2.7112): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1047/1250 [06:20<01:09, 2.91it/s] Training 1/1 epoch (loss 2.7112): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1048/1250 [06:20<01:10, 2.85it/s] Training 1/1 epoch (loss 2.6231): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1048/1250 [06:21<01:10, 2.85it/s] Training 1/1 epoch (loss 2.6231): 
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1049/1250 [06:21<01:12, 2.78it/s] Training 1/1 epoch (loss 2.8800): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1049/1250 [06:21<01:12, 2.78it/s] Training 1/1 epoch (loss 2.8800): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1050/1250 [06:21<01:09, 2.87it/s] Training 1/1 epoch (loss 2.7105): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1050/1250 [06:21<01:09, 2.87it/s] Training 1/1 epoch (loss 2.7105): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1051/1250 [06:21<01:10, 2.80it/s] Training 1/1 epoch (loss 2.7094): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1051/1250 [06:22<01:10, 2.80it/s] Training 1/1 epoch (loss 2.7094): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1052/1250 [06:22<01:09, 2.86it/s] Training 1/1 epoch (loss 2.5874): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1052/1250 [06:22<01:09, 2.86it/s] Training 1/1 epoch (loss 2.5874): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1053/1250 [06:22<01:06, 2.98it/s] Training 1/1 epoch (loss 2.5746): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1053/1250 [06:22<01:06, 2.98it/s] Training 1/1 epoch (loss 2.5746): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1054/1250 [06:22<01:05, 3.02it/s] Training 1/1 epoch (loss 2.5310): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1054/1250 [06:23<01:05, 3.02it/s] Training 1/1 epoch (loss 2.5310): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1055/1250 [06:23<01:13, 2.66it/s] Training 1/1 epoch (loss 2.6827): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1055/1250 [06:23<01:13, 2.66it/s] Training 1/1 epoch (loss 2.6827): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1056/1250 [06:23<01:23, 2.33it/s] Training 1/1 epoch (loss 2.7653): 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1056/1250 [06:24<01:23, 2.33it/s] Training 1/1 epoch (loss 2.7653): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1057/1250 [06:24<01:17, 2.48it/s] Training 1/1 epoch (loss 2.7596): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1057/1250 [06:24<01:17, 2.48it/s] Training 1/1 epoch (loss 2.7596): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1058/1250 [06:24<01:13, 2.60it/s] Training 1/1 epoch (loss 2.7429): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1058/1250 [06:24<01:13, 
2.60it/s] Training 1/1 epoch (loss 2.7429): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1059/1250 [06:24<01:13, 2.59it/s] Training 1/1 epoch (loss 2.8435): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1059/1250 [06:25<01:13, 2.59it/s] Training 1/1 epoch (loss 2.8435): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1060/1250 [06:25<01:11, 2.67it/s] Training 1/1 epoch (loss 2.5270): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1060/1250 [06:25<01:11, 2.67it/s] Training 1/1 epoch (loss 2.5270): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1061/1250 [06:25<01:07, 2.78it/s] Training 1/1 epoch (loss 2.9176): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1061/1250 [06:25<01:07, 2.78it/s] Training 1/1 epoch (loss 2.9176): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1062/1250 [06:25<01:05, 2.88it/s] Training 1/1 epoch (loss 2.8435): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1062/1250 [06:26<01:05, 2.88it/s] Training 1/1 epoch (loss 2.8435): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1063/1250 [06:26<01:04, 2.89it/s] Training 1/1 epoch (loss 2.7510): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1063/1250 [06:26<01:04, 2.89it/s] Training 1/1 epoch (loss 2.7510): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1064/1250 [06:26<01:05, 2.84it/s] Training 1/1 epoch (loss 2.6658): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1064/1250 [06:27<01:05, 2.84it/s] Training 1/1 epoch (loss 2.6658): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1065/1250 [06:27<01:04, 2.85it/s] Training 1/1 epoch (loss 2.8722): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1065/1250 [06:27<01:04, 2.85it/s] Training 1/1 epoch (loss 2.8722): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1066/1250 [06:27<01:05, 2.80it/s] Training 1/1 epoch (loss 2.6778): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1066/1250 [06:27<01:05, 2.80it/s] Training 1/1 epoch (loss 2.6778): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1067/1250 [06:27<01:04, 2.83it/s] Training 1/1 epoch (loss 2.6186): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1067/1250 [06:28<01:04, 2.83it/s] Training 1/1 epoch (loss 2.6186): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1068/1250 [06:28<01:05, 2.78it/s] Training 1/1 epoch (loss 2.6356): 
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1068/1250 [06:28<01:05, 2.78it/s] Training 1/1 epoch (loss 2.6356): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1069/1250 [06:28<01:07, 2.68it/s] Training 1/1 epoch (loss 2.7030): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1069/1250 [06:29<01:07, 2.68it/s] Training 1/1 epoch (loss 2.7030): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1070/1250 [06:29<01:16, 2.35it/s] Training 1/1 epoch (loss 2.7921): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1070/1250 [06:29<01:16, 2.35it/s] Training 1/1 epoch (loss 2.7921): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1071/1250 [06:29<01:11, 2.52it/s] Training 1/1 epoch (loss 2.7193): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1071/1250 [06:29<01:11, 2.52it/s] Training 1/1 epoch (loss 2.7193): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1072/1250 [06:29<01:09, 2.57it/s] Training 1/1 epoch (loss 2.6913): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1072/1250 [06:30<01:09, 2.57it/s] Training 1/1 epoch (loss 2.6913): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1073/1250 [06:30<01:13, 2.41it/s] Training 1/1 epoch (loss 2.7797): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1073/1250 [06:30<01:13, 2.41it/s] Training 1/1 epoch (loss 2.7797): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1074/1250 [06:30<01:08, 2.55it/s] Training 1/1 epoch (loss 2.7718): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1074/1250 [06:30<01:08, 2.55it/s] Training 1/1 epoch (loss 2.7718): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1075/1250 [06:30<01:05, 2.69it/s] Training 1/1 epoch (loss 2.7740): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1075/1250 [06:31<01:05, 2.69it/s] Training 1/1 epoch (loss 2.7740): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1076/1250 [06:31<01:01, 2.81it/s] Training 1/1 epoch (loss 2.7415): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1076/1250 [06:31<01:01, 2.81it/s] Training 1/1 epoch (loss 2.7415): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1077/1250 [06:31<01:02, 2.75it/s] Training 1/1 epoch (loss 2.7145): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1077/1250 [06:31<01:02, 2.75it/s] Training 1/1 epoch (loss 2.7145): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1078/1250 [06:31<01:01, 
2.80it/s] Training 1/1 epoch (loss 2.7126): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1078/1250 [06:32<01:01, 2.80it/s] Training 1/1 epoch (loss 2.7126): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1079/1250 [06:32<01:00, 2.84it/s] Training 1/1 epoch (loss 2.8995): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1079/1250 [06:32<01:00, 2.84it/s] Training 1/1 epoch (loss 2.8995): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1080/1250 [06:32<01:00, 2.81it/s] Training 1/1 epoch (loss 2.8633): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1080/1250 [06:32<01:00, 2.81it/s] Training 1/1 epoch (loss 2.8633): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1081/1250 [06:32<00:59, 2.85it/s] Training 1/1 epoch (loss 2.4616): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1081/1250 [06:33<00:59, 2.85it/s] Training 1/1 epoch (loss 2.4616): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1082/1250 [06:33<01:01, 2.74it/s] Training 1/1 epoch (loss 2.7447): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1082/1250 [06:33<01:01, 2.74it/s] Training 1/1 epoch (loss 2.7447): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1083/1250 [06:33<01:00, 2.78it/s] Training 1/1 epoch (loss 2.8334): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1083/1250 [06:34<01:00, 2.78it/s] Training 1/1 epoch (loss 2.8334): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1084/1250 [06:34<00:57, 2.88it/s] Training 1/1 epoch (loss 2.6201): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1084/1250 [06:34<00:57, 2.88it/s] Training 1/1 epoch (loss 2.6201): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1085/1250 [06:34<00:57, 2.86it/s] Training 1/1 epoch (loss 2.6761): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1085/1250 [06:34<00:57, 2.86it/s] Training 1/1 epoch (loss 2.6761): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1086/1250 [06:34<01:00, 2.72it/s] Training 1/1 epoch (loss 2.8060): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1086/1250 [06:35<01:00, 2.72it/s] Training 1/1 epoch (loss 2.8060): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1087/1250 [06:35<00:57, 2.83it/s] Training 1/1 epoch (loss 2.7887): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1087/1250 [06:35<00:57, 2.83it/s] Training 1/1 epoch (loss 2.7887): 
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1088/1250 [06:35<00:58, 2.75it/s] Training 1/1 epoch (loss 2.5960): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1088/1250 [06:35<00:58, 2.75it/s] Training 1/1 epoch (loss 2.5960): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1089/1250 [06:35<00:58, 2.76it/s] Training 1/1 epoch (loss 2.7048): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1089/1250 [06:36<00:58, 2.76it/s] Training 1/1 epoch (loss 2.7048): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1090/1250 [06:36<00:56, 2.83it/s] Training 1/1 epoch (loss 2.7767): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1090/1250 [06:36<00:56, 2.83it/s] Training 1/1 epoch (loss 2.7767): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1091/1250 [06:36<00:54, 2.91it/s] Training 1/1 epoch (loss 2.5785): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1091/1250 [06:36<00:54, 2.91it/s] Training 1/1 epoch (loss 2.5785): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1092/1250 [06:36<00:52, 3.03it/s] Training 1/1 epoch (loss 2.6283): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1092/1250 [06:37<00:52, 3.03it/s] Training 1/1 epoch (loss 2.6283): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1093/1250 [06:37<00:53, 2.93it/s] Training 1/1 epoch (loss 2.6387): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1093/1250 [06:37<00:53, 2.93it/s] Training 1/1 epoch (loss 2.6387): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1094/1250 [06:37<00:52, 2.97it/s] Training 1/1 epoch (loss 2.7890): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1094/1250 [06:37<00:52, 2.97it/s] Training 1/1 epoch (loss 2.7890): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1095/1250 [06:37<00:51, 2.99it/s] Training 1/1 epoch (loss 2.6037): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1095/1250 [06:38<00:51, 2.99it/s] Training 1/1 epoch (loss 2.6037): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1096/1250 [06:38<00:50, 3.03it/s] Training 1/1 epoch (loss 2.9103): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1096/1250 [06:38<00:50, 3.03it/s] Training 1/1 epoch (loss 2.9103): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1097/1250 [06:38<00:52, 2.91it/s] Training 1/1 epoch (loss 2.8141): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1097/1250 [06:38<00:52, 
2.91it/s] Training 1/1 epoch (loss 2.8141): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1098/1250 [06:38<00:50, 3.03it/s] Training 1/1 epoch (loss 2.8171): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1098/1250 [06:39<00:50, 3.03it/s] Training 1/1 epoch (loss 2.8171): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1099/1250 [06:39<00:52, 2.88it/s] Training 1/1 epoch (loss 2.8308): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1099/1250 [06:39<00:52, 2.88it/s] Training 1/1 epoch (loss 2.8308): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1100/1250 [06:39<00:51, 2.92it/s] Training 1/1 epoch (loss 2.5585): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1100/1250 [06:39<00:51, 2.92it/s] Training 1/1 epoch (loss 2.5585): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1101/1250 [06:39<00:50, 2.95it/s] Training 1/1 epoch (loss 2.8185): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1101/1250 [06:40<00:50, 2.95it/s] Training 1/1 epoch (loss 2.8185): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1102/1250 [06:40<00:53, 2.78it/s] Training 1/1 epoch (loss 2.7450): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1102/1250 [06:40<00:53, 2.78it/s] Training 1/1 epoch (loss 2.7450): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1103/1250 [06:40<00:52, 2.78it/s] Training 1/1 epoch (loss 2.5875): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1103/1250 [06:40<00:52, 2.78it/s] Training 1/1 epoch (loss 2.5875): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1104/1250 [06:40<00:50, 2.89it/s] Training 1/1 epoch (loss 2.6279): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1104/1250 [06:41<00:50, 2.89it/s] Training 1/1 epoch (loss 2.6279): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1105/1250 [06:41<00:52, 2.76it/s] Training 1/1 epoch (loss 2.6561): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1105/1250 [06:41<00:52, 2.76it/s] Training 1/1 epoch (loss 2.6561): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1106/1250 [06:41<00:50, 2.88it/s] Training 1/1 epoch (loss 2.5517): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1106/1250 [06:42<00:50, 2.88it/s] Training 1/1 epoch (loss 2.5517): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1107/1250 [06:42<00:49, 2.91it/s] Training 1/1 epoch (loss 2.8169): 
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1107/1250 [06:42<00:49, 2.91it/s] Training 1/1 epoch (loss 2.8169): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1108/1250 [06:42<00:48, 2.91it/s] Training 1/1 epoch (loss 2.7932): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1108/1250 [06:42<00:48, 2.91it/s] Training 1/1 epoch (loss 2.7932): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1109/1250 [06:42<00:47, 2.94it/s] Training 1/1 epoch (loss 2.8644): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1109/1250 [06:43<00:47, 2.94it/s] Training 1/1 epoch (loss 2.8644): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1110/1250 [06:43<00:48, 2.88it/s] Training 1/1 epoch (loss 2.8404): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1110/1250 [06:43<00:48, 2.88it/s] Training 1/1 epoch (loss 2.8404): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1111/1250 [06:43<00:46, 2.98it/s] Training 1/1 epoch (loss 2.7183): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1111/1250 [06:43<00:46, 2.98it/s] Training 1/1 epoch (loss 2.7183): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1112/1250 [06:43<00:47, 2.90it/s] Training 1/1 epoch (loss 2.7744): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1112/1250 [06:44<00:47, 2.90it/s] Training 1/1 epoch (loss 2.7744): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1113/1250 [06:44<00:46, 2.92it/s] Training 1/1 epoch (loss 2.6834): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1113/1250 [06:44<00:46, 2.92it/s] Training 1/1 epoch (loss 2.6834): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1114/1250 [06:44<00:46, 2.95it/s] Training 1/1 epoch (loss 2.5831): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1114/1250 [06:44<00:46, 2.95it/s] Training 1/1 epoch (loss 2.5831): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1115/1250 [06:44<00:47, 2.85it/s] Training 1/1 epoch (loss 2.6497): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1115/1250 [06:45<00:47, 2.85it/s] Training 1/1 epoch (loss 2.6497): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1116/1250 [06:45<00:48, 2.75it/s] Training 1/1 epoch (loss 2.7793): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1116/1250 [06:45<00:48, 2.75it/s] Training 1/1 epoch (loss 2.7793): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1117/1250 [06:45<00:47, 
2.79it/s] Training 1/1 epoch (loss 2.5217): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1117/1250 [06:45<00:47, 2.79it/s] Training 1/1 epoch (loss 2.5217): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1118/1250 [06:45<00:49, 2.68it/s] Training 1/1 epoch (loss 2.5990): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1118/1250 [06:46<00:49, 2.68it/s] Training 1/1 epoch (loss 2.5990): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1119/1250 [06:46<00:47, 2.78it/s] Training 1/1 epoch (loss 2.5312): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1119/1250 [06:46<00:47, 2.78it/s] Training 1/1 epoch (loss 2.5312): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1120/1250 [06:46<00:46, 2.78it/s] Training 1/1 epoch (loss 2.7128): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1120/1250 [06:46<00:46, 2.78it/s] Training 1/1 epoch (loss 2.7128): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1121/1250 [06:46<00:45, 2.83it/s] Training 1/1 epoch (loss 2.6886): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1121/1250 [06:47<00:45, 2.83it/s] Training 1/1 epoch (loss 2.6886): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1122/1250 [06:47<00:43, 2.92it/s] Training 1/1 epoch (loss 2.7570): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1122/1250 [06:47<00:43, 2.92it/s] Training 1/1 epoch (loss 2.7570): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1123/1250 [06:47<00:43, 2.92it/s] Training 1/1 epoch (loss 2.6605): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1123/1250 [06:47<00:43, 2.92it/s] Training 1/1 epoch (loss 2.6605): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1124/1250 [06:47<00:42, 2.96it/s] Training 1/1 epoch (loss 2.8243): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1124/1250 [06:48<00:42, 2.96it/s] Training 1/1 epoch (loss 2.8243): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1125/1250 [06:48<00:41, 3.02it/s] Training 1/1 epoch (loss 2.8103): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1125/1250 [06:48<00:41, 3.02it/s] Training 1/1 epoch (loss 2.8103): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1126/1250 [06:48<00:40, 3.06it/s] Training 1/1 epoch (loss 2.6230): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1126/1250 [06:48<00:40, 3.06it/s] Training 1/1 epoch (loss 2.6230): 
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1127/1250 [06:48<00:40, 3.03it/s] Training 1/1 epoch (loss 2.7427): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1127/1250 [06:49<00:40, 3.03it/s] Training 1/1 epoch (loss 2.7427): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1128/1250 [06:49<00:40, 2.99it/s] Training 1/1 epoch (loss 2.6496): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1128/1250 [06:49<00:40, 2.99it/s] Training 1/1 epoch (loss 2.6496): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1129/1250 [06:49<00:41, 2.90it/s] Training 1/1 epoch (loss 2.7239): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1129/1250 [06:49<00:41, 2.90it/s] Training 1/1 epoch (loss 2.7239): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1130/1250 [06:49<00:41, 2.91it/s] Training 1/1 epoch (loss 2.5236): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1130/1250 [06:50<00:41, 2.91it/s] Training 1/1 epoch (loss 2.5236): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1131/1250 [06:50<00:40, 2.96it/s] Training 1/1 epoch (loss 2.6549): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1131/1250 [06:50<00:40, 2.96it/s] Training 1/1 epoch (loss 2.6549): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1132/1250 [06:50<00:38, 3.06it/s] Training 1/1 epoch (loss 2.5997): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1132/1250 [06:51<00:38, 3.06it/s] Training 1/1 epoch (loss 2.5997): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1133/1250 [06:51<00:40, 2.86it/s] Training 1/1 epoch (loss 2.7109): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1133/1250 [06:51<00:40, 2.86it/s] Training 1/1 epoch (loss 2.7109): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1134/1250 [06:51<00:40, 2.85it/s] Training 1/1 epoch (loss 2.7163): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1134/1250 [06:51<00:40, 2.85it/s] Training 1/1 epoch (loss 2.7163): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1135/1250 [06:51<00:38, 2.97it/s] Training 1/1 epoch (loss 2.6398): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1135/1250 [06:52<00:38, 2.97it/s] Training 1/1 epoch (loss 2.6398): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1136/1250 [06:52<00:39, 2.85it/s] Training 1/1 epoch (loss 2.6974): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1136/1250 [06:52<00:39, 
2.85it/s] Training 1/1 epoch (loss 2.5225): 100%|██████████| 1250/1250 [07:33<00:00, 2.76it/s]
tokenizer config file saved in /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-1T/tinyllama-1T-s3-Q1-10000/tokenizer_config.json
Special tokens file saved in /aifs4su/hansirui_1st/jiayi/setting3-imdb/tinyllama-1T/tinyllama-1T-s3-Q1-10000/special_tokens_map.json
wandb: ERROR Problem finishing run
Exception ignored in atexit callback: <bound method rank_zero_only.<locals>.wrapper of <safe_rlhf.logger.Logger object at 0x1551f23ba750>>
Traceback (most recent call last):
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/utils.py", line 212, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/logger.py", line 183, in close
    self.wandb.finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 406, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 503, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 451, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2309, in finish
    return self._finish(exit_code)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 406, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2337, in _finish
    self._atexit_cleanup(exit_code=exit_code)
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2550, in _atexit_cleanup
    self._on_finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2806, in _on_finish
    wait_with_progress(
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress
    return wait_all_with_progress(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress
    return asyncio_compat.run(progress_loop_with_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_compat.py", line 27, in run
    future = executor.submit(runner.run, fn)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/jy-resist/lib/python3.11/concurrent/futures/thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown