EN3S committed on Apr 5, 2025

Commit

981200b

verified

1 Parent(s): 4b2b0f3

Training in progress, epoch 3

This view is limited to 50 files because it contains too many changes. See raw diff

Files changed (50)

README.md +65 -0
config.json +26 -0
model.safetensors +3 -0
run-0/checkpoint-117/config.json +26 -0
run-0/checkpoint-117/model.safetensors +3 -0
run-0/checkpoint-117/optimizer.pt +3 -0
run-0/checkpoint-117/rng_state.pth +3 -0
run-0/checkpoint-117/scheduler.pt +3 -0
run-0/checkpoint-117/special_tokens_map.json +7 -0
run-0/checkpoint-117/tokenizer.json +0 -0
run-0/checkpoint-117/tokenizer_config.json +56 -0
run-0/checkpoint-117/trainer_state.json +144 -0
run-0/checkpoint-117/training_args.bin +3 -0
run-0/checkpoint-117/vocab.txt +0 -0
run-0/checkpoint-156/config.json +26 -0
run-0/checkpoint-156/model.safetensors +3 -0
run-0/checkpoint-156/optimizer.pt +3 -0
run-0/checkpoint-156/rng_state.pth +3 -0
run-0/checkpoint-156/scheduler.pt +3 -0
run-0/checkpoint-156/special_tokens_map.json +7 -0
run-0/checkpoint-156/tokenizer.json +0 -0
run-0/checkpoint-156/tokenizer_config.json +56 -0
run-0/checkpoint-156/trainer_state.json +181 -0
run-0/checkpoint-156/training_args.bin +3 -0
run-0/checkpoint-156/vocab.txt +0 -0
run-0/checkpoint-195/config.json +26 -0
run-0/checkpoint-195/model.safetensors +3 -0
run-0/checkpoint-195/optimizer.pt +3 -0
run-0/checkpoint-195/rng_state.pth +3 -0
run-0/checkpoint-195/scheduler.pt +3 -0
run-0/checkpoint-195/special_tokens_map.json +7 -0
run-0/checkpoint-195/tokenizer.json +0 -0
run-0/checkpoint-195/tokenizer_config.json +56 -0
run-0/checkpoint-195/trainer_state.json +218 -0
run-0/checkpoint-195/training_args.bin +3 -0
run-0/checkpoint-195/vocab.txt +0 -0
run-0/checkpoint-39/config.json +26 -0
run-0/checkpoint-39/model.safetensors +3 -0
run-0/checkpoint-39/optimizer.pt +3 -0
run-0/checkpoint-39/rng_state.pth +3 -0
run-0/checkpoint-39/scheduler.pt +3 -0
run-0/checkpoint-39/special_tokens_map.json +7 -0
run-0/checkpoint-39/tokenizer.json +0 -0
run-0/checkpoint-39/tokenizer_config.json +56 -0
run-0/checkpoint-39/trainer_state.json +70 -0
run-0/checkpoint-39/training_args.bin +3 -0
run-0/checkpoint-39/vocab.txt +0 -0
run-0/checkpoint-78/config.json +26 -0
run-0/checkpoint-78/model.safetensors +3 -0
run-0/checkpoint-78/optimizer.pt +3 -0

README.md ADDED

	@@ -0,0 +1,65 @@

+---
+library_name: transformers
+license: apache-2.0
+base_model: bert-base-uncased
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model-index:
+- name: NLP_Assignment_2
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# NLP_Assignment_2
+This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.7257
+- Accuracy: 0.6570
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: linear
+- num_epochs: 5
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Accuracy |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|
+| 0.6777        | 1.0   | 39   | 0.6829          | 0.5343   |
+| 0.5889        | 2.0   | 78   | 0.6329          | 0.6318   |
+| 0.3605        | 3.0   | 117  | 0.7257          | 0.6570   |
+| 0.1758        | 4.0   | 156  | 1.0552          | 0.6354   |
+| 0.079         | 5.0   | 195  | 1.2655          | 0.6570   |
+### Framework versions
+- Transformers 4.50.3
+- Pytorch 2.6.0+cu124
+- Datasets 3.5.0
+- Tokenizers 0.21.1
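
The results table above reports 39 optimizer steps per epoch at a train batch size of 64. As a rough sketch, and assuming a single device, no gradient accumulation, and a final partial batch that is kept rather than dropped (none of which the model card states), that pins the training set size to a narrow range. The helper name `train_size_bounds` is hypothetical.

```python
import math


def train_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Smallest and largest N with ceil(N / batch_size) == steps_per_epoch."""
    low = (steps_per_epoch - 1) * batch_size + 1
    high = steps_per_epoch * batch_size
    return low, high


# Values taken from the model card: Step 39 at Epoch 1.0, train_batch_size 64.
low, high = train_size_bounds(steps_per_epoch=39, batch_size=64)
print(f"training set size is between {low} and {high} examples")

# Sanity check: both ends of the range really give 39 steps.
assert math.ceil(low / 64) == 39 and math.ceil(high / 64) == 39
```

If the run used gradient accumulation or several devices, the effective batch size would be larger and the bounds would scale accordingly.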

config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.50.3",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e5720038bc45e8ba58f1d28a144da3b7cf0c5e5809347780927dc650d2ccebed
+size 437958648
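
The pointer size above can be cross-checked against the dimensions in config.json. The sketch below counts the weights of a BERT encoder with a pooler and a linear classification head, then multiplies by 4 bytes for float32. The value `num_labels=2` is an assumption: config.json in this commit does not list the labels. The function name is hypothetical.

```python
def bert_classifier_param_count(
    vocab_size: int,
    hidden_size: int,
    intermediate_size: int,
    num_layers: int,
    max_positions: int,
    type_vocab_size: int,
    num_labels: int,
) -> int:
    """Count weights and biases of a BERT encoder + pooler + linear classifier."""
    layer_norm = 2 * hidden_size  # scale and shift vectors
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden_size + layer_norm
    # Query, key, value and output projections, each with a bias.
    attention = 4 * (hidden_size * hidden_size + hidden_size) + layer_norm
    feed_forward = (
        hidden_size * intermediate_size + intermediate_size
        + intermediate_size * hidden_size + hidden_size
        + layer_norm
    )
    pooler = hidden_size * hidden_size + hidden_size
    classifier = hidden_size * num_labels + num_labels
    return embeddings + num_layers * (attention + feed_forward) + pooler + classifier


params = bert_classifier_param_count(
    vocab_size=30522,
    hidden_size=768,
    intermediate_size=3072,
    num_layers=12,
    max_positions=512,
    type_vocab_size=2,
    num_labels=2,  # assumption, not stated in config.json
)
weight_bytes = params * 4  # "torch_dtype": "float32"
pointer_size = 437958648   # size reported by the LFS pointer above
print(params, weight_bytes, pointer_size - weight_bytes)
```

Under these assumptions the tensors account for all but a few tens of kilobytes of the file, which is plausibly the safetensors header.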

run-0/checkpoint-117/config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.50.3",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

run-0/checkpoint-117/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:17e0c9d1a3cf96b438c626370ec0758ae280b04631d45470154b5eca1293573f
+size 437958648

run-0/checkpoint-117/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02f3dffe3a080aec80d4aa45517d6cb1c8020dc49d3393ae96f05506fb56d8d1
+size 876038394
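
Large files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. A minimal parsing sketch follows, using the optimizer pointer above with the diff's leading `+` stripped. The helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


optimizer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:02f3dffe3a080aec80d4aa45517d6cb1c8020dc49d3393ae96f05506fb56d8d1
size 876038394
"""
)

model_size = 437958648  # size from the model.safetensors pointer in the same checkpoint
ratio = optimizer["size"] / model_size
print(f"optimizer.pt is {ratio:.3f}x the size of model.safetensors")
```

A ratio close to 2 is what one would expect if the optimizer keeps two float32 state tensors per weight, as AdamW does with its first and second moment estimates. That reading is an inference from the sizes, not something the commit states.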

run-0/checkpoint-117/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:066817b2001cdf2cab3204d72b7658f8308ed56a8eab94345bd5ce0742b9b7f7
+size 14244

run-0/checkpoint-117/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5b52c2b12734a8e47563cebc4f66b329836ea028b2a85fbfd91dadd377531bfe
+size 1064

run-0/checkpoint-117/special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

run-0/checkpoint-117/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

run-0/checkpoint-117/tokenizer_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}

run-0/checkpoint-117/trainer_state.json ADDED

	@@ -0,0 +1,144 @@

+{
+  "best_global_step": 78,
+  "best_metric": 0.6714801444043321,
+  "best_model_checkpoint": "bert-base-uncased-finetuned-rte-run_14/run-0/checkpoint-78",
+  "epoch": 3.0,
+  "eval_steps": 500,
+  "global_step": 117,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.2564102564102564,
+      "grad_norm": 1.662625789642334,
+      "learning_rate": 9.487179487179487e-05,
+      "loss": 0.696,
+      "step": 10
+    },
+    {
+      "epoch": 0.5128205128205128,
+      "grad_norm": 2.0300352573394775,
+      "learning_rate": 8.974358974358975e-05,
+      "loss": 0.6793,
+      "step": 20
+    },
+    {
+      "epoch": 0.7692307692307693,
+      "grad_norm": 4.492157936096191,
+      "learning_rate": 8.461538461538461e-05,
+      "loss": 0.6499,
+      "step": 30
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.631768953068592,
+      "eval_loss": 0.6312768459320068,
+      "eval_runtime": 0.6623,
+      "eval_samples_per_second": 418.266,
+      "eval_steps_per_second": 7.55,
+      "step": 39
+    },
+    {
+      "epoch": 1.0256410256410255,
+      "grad_norm": 3.4885644912719727,
+      "learning_rate": 7.948717948717948e-05,
+      "loss": 0.6793,
+      "step": 40
+    },
+    {
+      "epoch": 1.282051282051282,
+      "grad_norm": 5.2225494384765625,
+      "learning_rate": 7.435897435897436e-05,
+      "loss": 0.5596,
+      "step": 50
+    },
+    {
+      "epoch": 1.5384615384615383,
+      "grad_norm": 6.484560489654541,
+      "learning_rate": 6.923076923076924e-05,
+      "loss": 0.5713,
+      "step": 60
+    },
+    {
+      "epoch": 1.7948717948717947,
+      "grad_norm": 4.836739540100098,
+      "learning_rate": 6.410256410256412e-05,
+      "loss": 0.545,
+      "step": 70
+    },
+    {
+      "epoch": 2.0,
+      "eval_accuracy": 0.6714801444043321,
+      "eval_loss": 0.658456563949585,
+      "eval_runtime": 0.6622,
+      "eval_samples_per_second": 418.306,
+      "eval_steps_per_second": 7.551,
+      "step": 78
+    },
+    {
+      "epoch": 2.051282051282051,
+      "grad_norm": 6.515610218048096,
+      "learning_rate": 5.897435897435898e-05,
+      "loss": 0.4786,
+      "step": 80
+    },
+    {
+      "epoch": 2.3076923076923075,
+      "grad_norm": 5.974998950958252,
+      "learning_rate": 5.384615384615385e-05,
+      "loss": 0.3373,
+      "step": 90
+    },
+    {
+      "epoch": 2.564102564102564,
+      "grad_norm": 2.976608991622925,
+      "learning_rate": 4.871794871794872e-05,
+      "loss": 0.3314,
+      "step": 100
+    },
+    {
+      "epoch": 2.8205128205128203,
+      "grad_norm": 3.50764799118042,
+      "learning_rate": 4.358974358974359e-05,
+      "loss": 0.3235,
+      "step": 110
+    },
+    {
+      "epoch": 3.0,
+      "eval_accuracy": 0.6714801444043321,
+      "eval_loss": 0.7251453399658203,
+      "eval_runtime": 0.6621,
+      "eval_samples_per_second": 418.365,
+      "eval_steps_per_second": 7.552,
+      "step": 117
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 195,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 718007458971120.0,
+  "train_batch_size": 64,
+  "trial_name": null,
+  "trial_params": {
+    "dropout_rate": 0.01,
+    "learning_rate": 0.0001,
+    "max_length": 32,
+    "num_train_epochs": 5,
+    "per_device_train_batch_size": 64
+  }
+}
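
The `learning_rate` values logged above fall by the same amount every 10 steps, which is consistent with a linear decay from the trial's base rate of 0.0001 to zero at `max_steps` 195. The sketch below reproduces them. The zero warmup is an assumption inferred from the logged values, since no warmup setting appears in this file, and `linear_lr` is a hypothetical helper, not the library's scheduler.

```python
def linear_lr(step: int, base_lr: float = 1e-4, max_steps: int = 195,
              warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps under linear warmup and decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


# Compare against a few entries from log_history.
logged = {10: 9.487179487179487e-05, 60: 6.923076923076924e-05, 110: 4.358974358974359e-05}
for step, expected in logged.items():
    print(step, linear_lr(step), expected)
```

The same formula gives zero at step 195, the end of the fifth epoch.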

run-0/checkpoint-117/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2aa20791cd3401b748110a053f719d6902e4d9ccc845f2f5d2ff250a3d27441
+size 5432

run-0/checkpoint-117/vocab.txt ADDED

The diff for this file is too large to render. See raw diff

run-0/checkpoint-156/config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.50.3",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

run-0/checkpoint-156/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:45363ce679a8dfd6a6ce8f3513e67b5693b6d30b7c4329ec9c084a47504e9ba8
+size 437958648

run-0/checkpoint-156/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8ebd079703cd72b12a422caae45df454bcdc3dda626ba153bd836afb84b1093d
+size 876038394

run-0/checkpoint-156/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0f61eb961c8bdfdb65315b87a5752740304715f4131aaf57d9e9514dcd94c88a
+size 14244

run-0/checkpoint-156/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64871ea17abfaf974175c856702e9195f2d949b9a3207a0265bff73135f4adeb
+size 1064

run-0/checkpoint-156/special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

run-0/checkpoint-156/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

run-0/checkpoint-156/tokenizer_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}

run-0/checkpoint-156/trainer_state.json ADDED

	@@ -0,0 +1,181 @@

+{
+  "best_global_step": 156,
+  "best_metric": 0.7003610108303249,
+  "best_model_checkpoint": "bert-base-uncased-finetuned-rte-run_14/run-0/checkpoint-156",
+  "epoch": 4.0,
+  "eval_steps": 500,
+  "global_step": 156,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.2564102564102564,
+      "grad_norm": 1.662625789642334,
+      "learning_rate": 9.487179487179487e-05,
+      "loss": 0.696,
+      "step": 10
+    },
+    {
+      "epoch": 0.5128205128205128,
+      "grad_norm": 2.0300352573394775,
+      "learning_rate": 8.974358974358975e-05,
+      "loss": 0.6793,
+      "step": 20
+    },
+    {
+      "epoch": 0.7692307692307693,
+      "grad_norm": 4.492157936096191,
+      "learning_rate": 8.461538461538461e-05,
+      "loss": 0.6499,
+      "step": 30
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.631768953068592,
+      "eval_loss": 0.6312768459320068,
+      "eval_runtime": 0.6623,
+      "eval_samples_per_second": 418.266,
+      "eval_steps_per_second": 7.55,
+      "step": 39
+    },
+    {
+      "epoch": 1.0256410256410255,
+      "grad_norm": 3.4885644912719727,
+      "learning_rate": 7.948717948717948e-05,
+      "loss": 0.6793,
+      "step": 40
+    },
+    {
+      "epoch": 1.282051282051282,
+      "grad_norm": 5.2225494384765625,
+      "learning_rate": 7.435897435897436e-05,
+      "loss": 0.5596,
+      "step": 50
+    },
+    {
+      "epoch": 1.5384615384615383,
+      "grad_norm": 6.484560489654541,
+      "learning_rate": 6.923076923076924e-05,
+      "loss": 0.5713,
+      "step": 60
+    },
+    {
+      "epoch": 1.7948717948717947,
+      "grad_norm": 4.836739540100098,
+      "learning_rate": 6.410256410256412e-05,
+      "loss": 0.545,
+      "step": 70
+    },
+    {
+      "epoch": 2.0,
+      "eval_accuracy": 0.6714801444043321,
+      "eval_loss": 0.658456563949585,
+      "eval_runtime": 0.6622,
+      "eval_samples_per_second": 418.306,
+      "eval_steps_per_second": 7.551,
+      "step": 78
+    },
+    {
+      "epoch": 2.051282051282051,
+      "grad_norm": 6.515610218048096,
+      "learning_rate": 5.897435897435898e-05,
+      "loss": 0.4786,
+      "step": 80
+    },
+    {
+      "epoch": 2.3076923076923075,
+      "grad_norm": 5.974998950958252,
+      "learning_rate": 5.384615384615385e-05,
+      "loss": 0.3373,
+      "step": 90
+    },
+    {
+      "epoch": 2.564102564102564,
+      "grad_norm": 2.976608991622925,
+      "learning_rate": 4.871794871794872e-05,
+      "loss": 0.3314,
+      "step": 100
+    },
+    {
+      "epoch": 2.8205128205128203,
+      "grad_norm": 3.50764799118042,
+      "learning_rate": 4.358974358974359e-05,
+      "loss": 0.3235,
+      "step": 110
+    },
+    {
+      "epoch": 3.0,
+      "eval_accuracy": 0.6714801444043321,
+      "eval_loss": 0.7251453399658203,
+      "eval_runtime": 0.6621,
+      "eval_samples_per_second": 418.365,
+      "eval_steps_per_second": 7.552,
+      "step": 117
+    },
+    {
+      "epoch": 3.076923076923077,
+      "grad_norm": 3.907212495803833,
+      "learning_rate": 3.846153846153846e-05,
+      "loss": 0.2728,
+      "step": 120
+    },
+    {
+      "epoch": 3.3333333333333335,
+      "grad_norm": 7.000370979309082,
+      "learning_rate": 3.3333333333333335e-05,
+      "loss": 0.1829,
+      "step": 130
+    },
+    {
+      "epoch": 3.58974358974359,
+      "grad_norm": 7.436763763427734,
+      "learning_rate": 2.8205128205128207e-05,
+      "loss": 0.1877,
+      "step": 140
+    },
+    {
+      "epoch": 3.8461538461538463,
+      "grad_norm": 7.767152786254883,
+      "learning_rate": 2.307692307692308e-05,
+      "loss": 0.1335,
+      "step": 150
+    },
+    {
+      "epoch": 4.0,
+      "eval_accuracy": 0.7003610108303249,
+      "eval_loss": 0.9089646935462952,
+      "eval_runtime": 0.6606,
+      "eval_samples_per_second": 419.294,
+      "eval_steps_per_second": 7.568,
+      "step": 156
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 195,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 971430683050560.0,
+  "train_batch_size": 64,
+  "trial_name": null,
+  "trial_params": {
+    "dropout_rate": 0.01,
+    "learning_rate": 0.0001,
+    "max_length": 32,
+    "num_train_epochs": 5,
+    "per_device_train_batch_size": 64
+  }
+}
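
The evaluation entries above carry enough information to recover the size of the evaluation set and the number of correct predictions per epoch: runtime times samples per second gives the example count, and accuracy times that count gives the hits. A small sketch, with the tuples copied from `log_history`:

```python
# (eval_runtime, eval_samples_per_second, eval_accuracy) for epochs 1 to 4.
evals = [
    (0.6623, 418.266, 0.631768953068592),
    (0.6622, 418.306, 0.6714801444043321),
    (0.6621, 418.365, 0.6714801444043321),
    (0.6606, 419.294, 0.7003610108303249),
]

sizes = [round(runtime * rate) for runtime, rate, _ in evals]
counts = [round(acc * n) for (_, _, acc), n in zip(evals, sizes)]

for epoch, (n, correct) in enumerate(zip(sizes, counts), start=1):
    print(f"epoch {epoch}: {correct}/{n} correct")
```

Every epoch resolves to the same evaluation set size, so a step in accuracy between epochs corresponds to a whole number of additional correct examples. The checkpoint path contains `rte`, which hints at the GLUE RTE task, but the model card lists the dataset as unknown, so that link remains an inference.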

run-0/checkpoint-156/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2aa20791cd3401b748110a053f719d6902e4d9ccc845f2f5d2ff250a3d27441
+size 5432

run-0/checkpoint-156/vocab.txt ADDED

The diff for this file is too large to render. See raw diff

run-0/checkpoint-195/config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.50.3",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

run-0/checkpoint-195/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:907a61c5110bff68ed9f0caef889798fd8ce40f6a82b7804de5b168400b570ac
+size 437958648

run-0/checkpoint-195/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:17f4a91051c39b0ba32801cccb51b0f2db3668fce2d1a4d70c6963b3b7cc3efe
+size 876038394

run-0/checkpoint-195/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6bbb6e5a1853917bf71d3d48a24e968159b0799ccecda9429d3e1eac0a721ce5
+size 14244

run-0/checkpoint-195/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7535d6d6d3346211338a559c66a34e5433ea456734f0f5c94e8703828d95ba57
+size 1064

run-0/checkpoint-195/special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

run-0/checkpoint-195/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

run-0/checkpoint-195/tokenizer_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}

run-0/checkpoint-195/trainer_state.json ADDED

	@@ -0,0 +1,218 @@

+{
+  "best_global_step": 195,
+  "best_metric": 0.7111913357400722,
+  "best_model_checkpoint": "bert-base-uncased-finetuned-rte-run_14/run-0/checkpoint-195",
+  "epoch": 5.0,
+  "eval_steps": 500,
+  "global_step": 195,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.2564102564102564,
+      "grad_norm": 1.662625789642334,
+      "learning_rate": 9.487179487179487e-05,
+      "loss": 0.696,
+      "step": 10
+    },
+    {
+      "epoch": 0.5128205128205128,
+      "grad_norm": 2.0300352573394775,
+      "learning_rate": 8.974358974358975e-05,
+      "loss": 0.6793,
+      "step": 20
+    },
+    {
+      "epoch": 0.7692307692307693,
+      "grad_norm": 4.492157936096191,
+      "learning_rate": 8.461538461538461e-05,
+      "loss": 0.6499,
+      "step": 30
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.631768953068592,
+      "eval_loss": 0.6312768459320068,
+      "eval_runtime": 0.6623,
+      "eval_samples_per_second": 418.266,
+      "eval_steps_per_second": 7.55,
+      "step": 39
+    },
+    {
+      "epoch": 1.0256410256410255,
+      "grad_norm": 3.4885644912719727,
+      "learning_rate": 7.948717948717948e-05,
+      "loss": 0.6793,
+      "step": 40
+    },
+    {
+      "epoch": 1.282051282051282,
+      "grad_norm": 5.2225494384765625,
+      "learning_rate": 7.435897435897436e-05,
+      "loss": 0.5596,
+      "step": 50
+    },
+    {
+      "epoch": 1.5384615384615383,
+      "grad_norm": 6.484560489654541,
+      "learning_rate": 6.923076923076924e-05,
+      "loss": 0.5713,
+      "step": 60
+    },
+    {
+      "epoch": 1.7948717948717947,
+      "grad_norm": 4.836739540100098,
+      "learning_rate": 6.410256410256412e-05,
+      "loss": 0.545,
+      "step": 70
+    },
+    {
+      "epoch": 2.0,
+      "eval_accuracy": 0.6714801444043321,
+      "eval_loss": 0.658456563949585,
+      "eval_runtime": 0.6622,
+      "eval_samples_per_second": 418.306,
+      "eval_steps_per_second": 7.551,
+      "step": 78
+    },
+    {
+      "epoch": 2.051282051282051,
+      "grad_norm": 6.515610218048096,
+      "learning_rate": 5.897435897435898e-05,
+      "loss": 0.4786,
+      "step": 80
+    },
+    {
+      "epoch": 2.3076923076923075,
+      "grad_norm": 5.974998950958252,
+      "learning_rate": 5.384615384615385e-05,
+      "loss": 0.3373,
+      "step": 90
+    },
+    {
+      "epoch": 2.564102564102564,
+      "grad_norm": 2.976608991622925,
+      "learning_rate": 4.871794871794872e-05,
+      "loss": 0.3314,
+      "step": 100
+    },
+    {
+      "epoch": 2.8205128205128203,
+      "grad_norm": 3.50764799118042,
+      "learning_rate": 4.358974358974359e-05,
+      "loss": 0.3235,
+      "step": 110
+    },
+    {
+      "epoch": 3.0,
+      "eval_accuracy": 0.6714801444043321,
+      "eval_loss": 0.7251453399658203,
+      "eval_runtime": 0.6621,
+      "eval_samples_per_second": 418.365,
+      "eval_steps_per_second": 7.552,
+      "step": 117
+    },
+    {
+      "epoch": 3.076923076923077,
+      "grad_norm": 3.907212495803833,
+      "learning_rate": 3.846153846153846e-05,
+      "loss": 0.2728,
+      "step": 120
+    },
+    {
+      "epoch": 3.3333333333333335,
+      "grad_norm": 7.000370979309082,
+      "learning_rate": 3.3333333333333335e-05,
+      "loss": 0.1829,
+      "step": 130
+    },
+    {
+      "epoch": 3.58974358974359,
+      "grad_norm": 7.436763763427734,
+      "learning_rate": 2.8205128205128207e-05,
+      "loss": 0.1877,
+      "step": 140
+    },
+    {
+      "epoch": 3.8461538461538463,
+      "grad_norm": 7.767152786254883,
+      "learning_rate": 2.307692307692308e-05,
+      "loss": 0.1335,
+      "step": 150
+    },
+    {
+      "epoch": 4.0,
+      "eval_accuracy": 0.7003610108303249,
+      "eval_loss": 0.9089646935462952,
+      "eval_runtime": 0.6606,
+      "eval_samples_per_second": 419.294,
+      "eval_steps_per_second": 7.568,
+      "step": 156
+    },
+    {
+      "epoch": 4.102564102564102,
+      "grad_norm": 2.6948187351226807,
+      "learning_rate": 1.794871794871795e-05,
+      "loss": 0.1229,
+      "step": 160
+    },
+    {
+      "epoch": 4.358974358974359,
+      "grad_norm": 3.5418930053710938,
+      "learning_rate": 1.282051282051282e-05,
+      "loss": 0.0868,
+      "step": 170
+    },
+    {
+      "epoch": 4.615384615384615,
+      "grad_norm": 6.394577980041504,
+      "learning_rate": 7.692307692307694e-06,
+      "loss": 0.0624,
+      "step": 180
+    },
+    {
+      "epoch": 4.871794871794872,
+      "grad_norm": 7.906170845031738,
+      "learning_rate": 2.564102564102564e-06,
+      "loss": 0.0608,
+      "step": 190
+    },
+    {
+      "epoch": 5.0,
+      "eval_accuracy": 0.7111913357400722,
+      "eval_loss": 1.0780714750289917,
+      "eval_runtime": 0.6628,
+      "eval_samples_per_second": 417.893,
+      "eval_steps_per_second": 7.543,
+      "step": 195
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 195,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1230830433641400.0,
+  "train_batch_size": 64,
+  "trial_name": null,
+  "trial_params": {
+    "dropout_rate": 0.01,
+    "learning_rate": 0.0001,
+    "max_length": 32,
+    "num_train_epochs": 5,
+    "per_device_train_batch_size": 64
+  }
+}
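
In the final state above, `best_metric` equals the epoch-5 `eval_accuracy`, so the best checkpoint appears to be chosen by accuracy. Selecting by `eval_loss` instead would pick a different checkpoint, because evaluation loss rises after the first epoch while accuracy keeps improving. A short sketch over the evaluation entries of `log_history`:

```python
# Evaluation entries copied from log_history (one per epoch).
eval_history = [
    {"step": 39, "eval_accuracy": 0.631768953068592, "eval_loss": 0.6312768459320068},
    {"step": 78, "eval_accuracy": 0.6714801444043321, "eval_loss": 0.658456563949585},
    {"step": 117, "eval_accuracy": 0.6714801444043321, "eval_loss": 0.7251453399658203},
    {"step": 156, "eval_accuracy": 0.7003610108303249, "eval_loss": 0.9089646935462952},
    {"step": 195, "eval_accuracy": 0.7111913357400722, "eval_loss": 1.0780714750289917},
]

best_by_accuracy = max(eval_history, key=lambda entry: entry["eval_accuracy"])
best_by_loss = min(eval_history, key=lambda entry: entry["eval_loss"])

print("highest accuracy at step", best_by_accuracy["step"])
print("lowest eval loss at step", best_by_loss["step"])
```

The gap between the two choices, together with a training loss that falls to about 0.06 while evaluation loss passes 1.0, suggests the model grows overconfident on the evaluation set as training continues. Which checkpoint is preferable depends on whether calibration or raw accuracy matters more for the task.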

run-0/checkpoint-195/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2aa20791cd3401b748110a053f719d6902e4d9ccc845f2f5d2ff250a3d27441
+size 5432

run-0/checkpoint-195/vocab.txt ADDED

The diff for this file is too large to render. See raw diff

run-0/checkpoint-39/config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.50.3",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

run-0/checkpoint-39/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f8ec7f7b3ec4f4e47c64e07300ea6845153110e55de857fc61b53b22abee3d62
+size 437958648

run-0/checkpoint-39/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3d5cab3efc7529f533d1e9c8138407beef77cc54df5f93dea4c9b2ef07d9646c
+size 876038394

run-0/checkpoint-39/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9ce2001d6c41d462c4a530df5214c4ba6ac04088f8883ec9b91629a00a7da50d
+size 14244

run-0/checkpoint-39/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d99f0741d1b8c0fb2ef672037883ae1152cbbf2c3bb454d16b7df9a7ccf7f447
+size 1064

run-0/checkpoint-39/special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

run-0/checkpoint-39/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

run-0/checkpoint-39/tokenizer_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}

run-0/checkpoint-39/trainer_state.json ADDED

	@@ -0,0 +1,70 @@

+{
+  "best_global_step": 39,
+  "best_metric": 0.631768953068592,
+  "best_model_checkpoint": "bert-base-uncased-finetuned-rte-run_14/run-0/checkpoint-39",
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 39,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.2564102564102564,
+      "grad_norm": 1.662625789642334,
+      "learning_rate": 9.487179487179487e-05,
+      "loss": 0.696,
+      "step": 10
+    },
+    {
+      "epoch": 0.5128205128205128,
+      "grad_norm": 2.0300352573394775,
+      "learning_rate": 8.974358974358975e-05,
+      "loss": 0.6793,
+      "step": 20
+    },
+    {
+      "epoch": 0.7692307692307693,
+      "grad_norm": 4.492157936096191,
+      "learning_rate": 8.461538461538461e-05,
+      "loss": 0.6499,
+      "step": 30
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.631768953068592,
+      "eval_loss": 0.6312768459320068,
+      "eval_runtime": 0.6623,
+      "eval_samples_per_second": 418.266,
+      "eval_steps_per_second": 7.55,
+      "step": 39
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 195,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 194932403139840.0,
+  "train_batch_size": 64,
+  "trial_name": null,
+  "trial_params": {
+    "dropout_rate": 0.01,
+    "learning_rate": 0.0001,
+    "max_length": 32,
+    "num_train_epochs": 5,
+    "per_device_train_batch_size": 64
+  }
+}
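
The logged values in this trainer state can be checked against each other. The sketch below assumes a linear decay to zero with no warmup, which the diff does not state but which matches the three logged learning rates. It also infers the evaluation set size from the runtime and throughput fields, an estimate rather than a recorded value. `linear_decay_lr` is a hypothetical helper, not a library function.

```python
import math

BASE_LR = 1e-4        # trial_params.learning_rate
MAX_STEPS = 195       # max_steps
STEPS_PER_EPOCH = 39  # global_step at epoch 1.0


def linear_decay_lr(step, base_lr=BASE_LR, max_steps=MAX_STEPS):
    """Learning rate under linear decay to zero, no warmup (assumed schedule)."""
    return base_lr * max(0, max_steps - step) / max_steps


# (step -> learning_rate) pairs copied from log_history.
logged_lr = {
    10: 9.487179487179487e-05,
    20: 8.974358974358975e-05,
    30: 8.461538461538461e-05,
}

schedule_matches = all(
    math.isclose(linear_decay_lr(step), lr, rel_tol=1e-9)
    for step, lr in logged_lr.items()
)
epoch_at_step_10 = 10 / STEPS_PER_EPOCH

# eval_samples_per_second * eval_runtime approximates the eval set size.
eval_samples = round(418.266 * 0.6623)
correct = round(0.631768953068592 * eval_samples)

print(schedule_matches, epoch_at_step_10, eval_samples, correct)
```

If the assumed schedule is right, the learning rate reaches zero at step 195, the end of the fifth epoch, and the reported accuracy corresponds to 175 correct predictions out of roughly 277 evaluation examples.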

run-0/checkpoint-39/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2aa20791cd3401b748110a053f719d6902e4d9ccc845f2f5d2ff250a3d27441
+size 5432

run-0/checkpoint-39/vocab.txt ADDED

The diff for this file is too large to render. See raw diff

run-0/checkpoint-78/config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.50.3",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

run-0/checkpoint-78/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:87efa456251f68adc0b2b9363c9086483d78108b5a3a35553d7869669813f8d9
+size 437958648

run-0/checkpoint-78/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:552491f5bb81693240c1212a8d55a754eab07995de8d771ad0c53d9454e1384d
+size 876038394