End of training

Files changed (9)

.ipynb_checkpoints/README-checkpoint.md +87 -0
.ipynb_checkpoints/config-checkpoint.json +39 -0
README.md +37 -64
config.json +7 -7
generation_config.json +2 -2
model.safetensors +2 -2
tokenizer.json +0 -0
tokenizer_config.json +1 -1
training_args.bin +1 -1

.ipynb_checkpoints/README-checkpoint.md ADDED

	@@ -0,0 +1,87 @@

+---
+license: mit
+base_model: gpt2-large
+tags:
+- generated_from_trainer
+model-index:
+- name: tiq
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# tiq
+This model is a fine-tuned version of [gpt2-large](https://huggingface.co/gpt2-large) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 5.5477
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 200
+- num_epochs: 1
+- mixed_precision_training: Native AMP
+### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 6.2342        | 0.04  | 100  | 6.1857          |
+| 5.7599        | 0.07  | 200  | 5.7751          |
+| 5.7433        | 0.11  | 300  | 5.7142          |
+| 5.6021        | 0.15  | 400  | 5.6776          |
+| 5.5084        | 0.18  | 500  | 5.6349          |
+| 5.3825        | 0.22  | 600  | 5.6201          |
+| 5.6698        | 0.26  | 700  | 5.5831          |
+| 5.4089        | 0.29  | 800  | 5.5687          |
+| 5.601         | 0.33  | 900  | 5.5574          |
+| 5.4708        | 0.37  | 1000 | 5.5555          |
+| 5.5956        | 0.4   | 1100 | 5.5520          |
+| 5.4704        | 0.44  | 1200 | 5.5494          |
+| 5.4824        | 0.47  | 1300 | 5.5502          |
+| 5.589         | 0.51  | 1400 | 5.5478          |
+| 5.5612        | 0.55  | 1500 | 5.5456          |
+| 5.4741        | 0.58  | 1600 | 5.5430          |
+| 5.463         | 0.62  | 1700 | 5.5426          |
+| 5.5071        | 0.66  | 1800 | 5.5424          |
+| 5.5469        | 0.69  | 1900 | 5.5419          |
+| 5.4266        | 0.73  | 2000 | 5.5428          |
+| 5.4848        | 0.77  | 2100 | 5.5438          |
+| 5.5069        | 0.8   | 2200 | 5.5446          |
+| 5.5885        | 0.84  | 2300 | 5.5469          |
+| 5.4484        | 0.88  | 2400 | 5.5462          |
+| 5.3859        | 0.91  | 2500 | 5.5475          |
+| 5.465         | 0.95  | 2600 | 5.5476          |
+| 5.4355        | 0.99  | 2700 | 5.5477          |
+### Framework versions
+- Transformers 4.39.3
+- Pytorch 2.2.0+cu121
+- Datasets 2.18.0
+- Tokenizers 0.15.2
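The headline evaluation loss (5.5477) is the last row of the results table, not the lowest one: validation loss bottoms out around step 1900 and then drifts up slightly. A minimal sketch that parses the table rows and converts loss to perplexity, assuming the logged loss is the mean per-token cross-entropy in nats (what `GPT2LMHeadModel` computes), so `exp(loss)` is approximately the perplexity:

```python
import math

# Rows copied from the "Training results" table in this model card:
# Training Loss | Epoch | Step | Validation Loss
TABLE_ROWS = """\
| 6.2342        | 0.04  | 100  | 6.1857          |
| 5.7599        | 0.07  | 200  | 5.7751          |
| 5.7433        | 0.11  | 300  | 5.7142          |
| 5.6021        | 0.15  | 400  | 5.6776          |
| 5.5084        | 0.18  | 500  | 5.6349          |
| 5.3825        | 0.22  | 600  | 5.6201          |
| 5.6698        | 0.26  | 700  | 5.5831          |
| 5.4089        | 0.29  | 800  | 5.5687          |
| 5.601         | 0.33  | 900  | 5.5574          |
| 5.4708        | 0.37  | 1000 | 5.5555          |
| 5.5956        | 0.4   | 1100 | 5.5520          |
| 5.4704        | 0.44  | 1200 | 5.5494          |
| 5.4824        | 0.47  | 1300 | 5.5502          |
| 5.589         | 0.51  | 1400 | 5.5478          |
| 5.5612        | 0.55  | 1500 | 5.5456          |
| 5.4741        | 0.58  | 1600 | 5.5430          |
| 5.463         | 0.62  | 1700 | 5.5426          |
| 5.5071        | 0.66  | 1800 | 5.5424          |
| 5.5469        | 0.69  | 1900 | 5.5419          |
| 5.4266        | 0.73  | 2000 | 5.5428          |
| 5.4848        | 0.77  | 2100 | 5.5438          |
| 5.5069        | 0.8   | 2200 | 5.5446          |
| 5.5885        | 0.84  | 2300 | 5.5469          |
| 5.4484        | 0.88  | 2400 | 5.5462          |
| 5.3859        | 0.91  | 2500 | 5.5475          |
| 5.465         | 0.95  | 2600 | 5.5476          |
| 5.4355        | 0.99  | 2700 | 5.5477          |
"""


def parse_rows(text):
    """Return (step, validation_loss) pairs from markdown table rows."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 4:  # skips blank lines
            rows.append((int(cells[2]), float(cells[3])))
    return rows


rows = parse_rows(TABLE_ROWS)
best_step, best_loss = min(rows, key=lambda r: r[1])
final_step, final_loss = rows[-1]

# Mean per-token cross-entropy in nats -> perplexity via exp().
print(f"best:  step {best_step}, loss {best_loss:.4f}, ppl ~{math.exp(best_loss):.0f}")
print(f"final: step {final_step}, loss {final_loss:.4f}, ppl ~{math.exp(final_loss):.0f}")
```

This reports step 1900 (5.5419, perplexity roughly 255) as the lowest validation loss against 5.5477 (roughly 257) at the final evaluation. If the lowest-loss checkpoint rather than the last is wanted, `TrainingArguments` offers `load_best_model_at_end=True` with `metric_for_best_model="eval_loss"`; the card does not say whether this run used them.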

.ipynb_checkpoints/config-checkpoint.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "_name_or_path": "gpt2-large",
+  "activation_function": "gelu_new",
+  "architectures": [
+    "GPT2LMHeadModel"
+  ],
+  "attn_pdrop": 0.1,
+  "bos_token_id": 11012,
+  "embd_pdrop": 0.1,
+  "eos_token_id": 11013,
+  "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "model_type": "gpt2",
+  "n_ctx": 720,
+  "n_embd": 1280,
+  "n_head": 20,
+  "n_inner": null,
+  "n_layer": 36,
+  "n_positions": 1024,
+  "reorder_and_upcast_attn": false,
+  "resid_pdrop": 0.1,
+  "scale_attn_by_inverse_layer_idx": false,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "task_specific_params": {
+    "text-generation": {
+      "do_sample": true,
+      "max_length": 50
+    }
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.39.3",
+  "use_cache": true,
+  "vocab_size": 11015
+}

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 license: mit
-base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
@@ -13,9 +13,9 @@ should probably proofread and complete it, then remove this comment. -->
 # tiq
-This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 6.5928
 ## Model description
@@ -34,76 +34,49 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 32
-- eval_batch_size: 32
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 500
-- num_epochs: 2
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 10.1294       | 0.04  | 100  | 10.1406         |
-| 9.5842        | 0.07  | 200  | 9.6140          |
-| 8.8547        | 0.11  | 300  | 8.8427          |
-| 8.0654        | 0.15  | 400  | 8.0300          |
-| 7.5124        | 0.18  | 500  | 7.4592          |
-| 7.3008        | 0.22  | 600  | 7.1798          |
-| 7.0833        | 0.26  | 700  | 7.0491          |
-| 7.0046        | 0.29  | 800  | 6.9639          |
-| 6.9422        | 0.33  | 900  | 6.9085          |
-| 6.9551        | 0.37  | 1000 | 6.8625          |
-| 6.8536        | 0.4   | 1100 | 6.8373          |
-| 6.8439        | 0.44  | 1200 | 6.8129          |
-| 6.7857        | 0.47  | 1300 | 6.7906          |
-| 6.8318        | 0.51  | 1400 | 6.7714          |
-| 6.7894        | 0.55  | 1500 | 6.7490          |
-| 6.6932        | 0.58  | 1600 | 6.7315          |
-| 6.7018        | 0.62  | 1700 | 6.7125          |
-| 6.671         | 0.66  | 1800 | 6.7045          |
-| 6.7686        | 0.69  | 1900 | 6.6873          |
-| 6.7236        | 0.73  | 2000 | 6.6767          |
-| 6.7334        | 0.77  | 2100 | 6.6702          |
-| 6.7135        | 0.8   | 2200 | 6.6624          |
-| 6.7133        | 0.84  | 2300 | 6.6653          |
-| 6.6295        | 0.88  | 2400 | 6.6520          |
-| 6.6343        | 0.91  | 2500 | 6.6496          |
-| 6.5874        | 0.95  | 2600 | 6.6456          |
-| 6.641         | 0.99  | 2700 | 6.6427          |
-| 6.59          | 1.02  | 2800 | 6.6377          |
-| 6.5958        | 1.06  | 2900 | 6.6378          |
-| 6.7154        | 1.1   | 3000 | 6.6313          |
-| 6.6053        | 1.13  | 3100 | 6.6305          |
-| 6.6077        | 1.17  | 3200 | 6.6242          |
-| 6.5719        | 1.21  | 3300 | 6.6202          |
-| 6.6981        | 1.24  | 3400 | 6.6228          |
-| 6.5717        | 1.28  | 3500 | 6.6177          |
-| 6.5864        | 1.31  | 3600 | 6.6139          |
-| 6.6584        | 1.35  | 3700 | 6.6109          |
-| 6.5598        | 1.39  | 3800 | 6.6103          |
-| 6.6571        | 1.42  | 3900 | 6.6063          |
-| 6.6377        | 1.46  | 4000 | 6.6039          |
-| 6.6071        | 1.5   | 4100 | 6.6025          |
-| 6.5311        | 1.53  | 4200 | 6.5994          |
-| 6.6616        | 1.57  | 4300 | 6.6000          |
-| 6.5725        | 1.61  | 4400 | 6.5976          |
-| 6.5851        | 1.64  | 4500 | 6.5963          |
-| 6.5723        | 1.68  | 4600 | 6.5952          |
-| 6.5369        | 1.72  | 4700 | 6.5951          |
-| 6.5928        | 1.75  | 4800 | 6.5950          |
-| 6.5366        | 1.79  | 4900 | 6.5940          |
-| 6.5188        | 1.83  | 5000 | 6.5932          |
-| 6.6146        | 1.86  | 5100 | 6.5929          |
-| 6.5728        | 1.9   | 5200 | 6.5931          |
-| 6.5463        | 1.94  | 5300 | 6.5931          |
-| 6.6269        | 1.97  | 5400 | 6.5928          |
 ### Framework versions

 ---
 license: mit
+base_model: gpt2-large
 tags:
 - generated_from_trainer
 model-index:
 # tiq
+This model is a fine-tuned version of [gpt2-large](https://huggingface.co/gpt2-large) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 5.5477
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 200
+- num_epochs: 1
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 6.2342        | 0.04  | 100  | 6.1857          |
+| 5.7599        | 0.07  | 200  | 5.7751          |
+| 5.7433        | 0.11  | 300  | 5.7142          |
+| 5.6021        | 0.15  | 400  | 5.6776          |
+| 5.5084        | 0.18  | 500  | 5.6349          |
+| 5.3825        | 0.22  | 600  | 5.6201          |
+| 5.6698        | 0.26  | 700  | 5.5831          |
+| 5.4089        | 0.29  | 800  | 5.5687          |
+| 5.601         | 0.33  | 900  | 5.5574          |
+| 5.4708        | 0.37  | 1000 | 5.5555          |
+| 5.5956        | 0.4   | 1100 | 5.5520          |
+| 5.4704        | 0.44  | 1200 | 5.5494          |
+| 5.4824        | 0.47  | 1300 | 5.5502          |
+| 5.589         | 0.51  | 1400 | 5.5478          |
+| 5.5612        | 0.55  | 1500 | 5.5456          |
+| 5.4741        | 0.58  | 1600 | 5.5430          |
+| 5.463         | 0.62  | 1700 | 5.5426          |
+| 5.5071        | 0.66  | 1800 | 5.5424          |
+| 5.5469        | 0.69  | 1900 | 5.5419          |
+| 5.4266        | 0.73  | 2000 | 5.5428          |
+| 5.4848        | 0.77  | 2100 | 5.5438          |
+| 5.5069        | 0.8   | 2200 | 5.5446          |
+| 5.5885        | 0.84  | 2300 | 5.5469          |
+| 5.4484        | 0.88  | 2400 | 5.5462          |
+| 5.3859        | 0.91  | 2500 | 5.5475          |
+| 5.465         | 0.95  | 2600 | 5.5476          |
+| 5.4355        | 0.99  | 2700 | 5.5477          |
 ### Framework versions
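The hyperparameter changes above halve the effective batch (32 × 4 = 128 to 8 × 8 = 64 sequences per optimizer step), raise the peak learning rate from 1e-05 to 5e-05 and shorten warmup from 500 to 200 steps. A small sketch of the resulting schedule, assuming the linear-warmup, half-cosine shape used by the `cosine` scheduler type in Transformers (`get_cosine_schedule_with_warmup` with its default half cycle). `TOTAL_STEPS` is a hypothetical value: the table only shows the run reaching step 2700 at epoch 0.99.

```python
import math


def cosine_with_warmup(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warmup 0 -> peak_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def effective_batch(per_device_batch, grad_accum_steps, num_devices=1):
    """Sequences contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


# New-run values from the card. TOTAL_STEPS is hypothetical (one epoch is
# a little over 2700 steps according to the results table).
PEAK_LR, WARMUP, TOTAL_STEPS = 5e-05, 200, 2730

print("old effective batch:", effective_batch(32, 4))  # 128
print("new effective batch:", effective_batch(8, 8))   # 64
for step in (0, 100, 200, 1000, 2000, TOTAL_STEPS):
    print(step, f"{cosine_with_warmup(step, PEAK_LR, WARMUP, TOTAL_STEPS):.2e}")
```

Because the card's `total_train_batch_size` equals `train_batch_size × gradient_accumulation_steps` and lists no device count, the run appears to have used a single device; with several devices the Trainer multiplies the total by the device count as well.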

config.json CHANGED

@@ -1,21 +1,21 @@
 {
-  "_name_or_path": "gpt2",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2LMHeadModel"
   ],
   "attn_pdrop": 0.1,
-  "bos_token_id": 48002,
   "embd_pdrop": 0.1,
-  "eos_token_id": 48003,
   "initializer_range": 0.02,
   "layer_norm_epsilon": 1e-05,
   "model_type": "gpt2",
   "n_ctx": 720,
-  "n_embd": 768,
-  "n_head": 12,
   "n_inner": null,
-  "n_layer": 12,
   "n_positions": 1024,
   "reorder_and_upcast_attn": false,
   "resid_pdrop": 0.1,
@@ -35,5 +35,5 @@
   "torch_dtype": "float32",
   "transformers_version": "4.39.3",
   "use_cache": true,
-  "vocab_size": 48005
 }

 {
+  "_name_or_path": "gpt2-large",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2LMHeadModel"
   ],
   "attn_pdrop": 0.1,
+  "bos_token_id": 11012,
   "embd_pdrop": 0.1,
+  "eos_token_id": 11013,
   "initializer_range": 0.02,
   "layer_norm_epsilon": 1e-05,
   "model_type": "gpt2",
   "n_ctx": 720,
+  "n_embd": 1280,
+  "n_head": 20,
   "n_inner": null,
+  "n_layer": 36,
   "n_positions": 1024,
   "reorder_and_upcast_attn": false,
   "resid_pdrop": 0.1,
   "torch_dtype": "float32",
   "transformers_version": "4.39.3",
   "use_cache": true,
+  "vocab_size": 11015
 }
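Most of the growth in the weights comes from the architecture change (`n_embd` 768 to 1280, `n_head` 12 to 20, `n_layer` 12 to 36), partly offset by the smaller vocabulary (48005 to 11015). A sketch that counts GPT-2 parameters from these config fields, assuming the standard GPT-2 block layout in Transformers and an LM head tied to the token embedding (`wte`), so the head is not counted twice:

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer, n_inner=None):
    """Count parameters of a GPT-2 LM whose output head is tied to wte."""
    inner = n_inner if n_inner is not None else 4 * n_embd  # "n_inner": null -> 4 * n_embd
    per_layer = (
        2 * 2 * n_embd                       # ln_1 and ln_2 (weight + bias each)
        + n_embd * 3 * n_embd + 3 * n_embd   # attn.c_attn (fused q, k, v)
        + n_embd * n_embd + n_embd           # attn.c_proj
        + n_embd * inner + inner             # mlp.c_fc
        + inner * n_embd + n_embd            # mlp.c_proj
    )
    return (
        vocab_size * n_embd       # wte (shared with lm_head)
        + n_positions * n_embd    # wpe
        + n_layer * per_layer
        + 2 * n_embd              # ln_f
    )


OLD = dict(vocab_size=48005, n_positions=1024, n_embd=768, n_layer=12)   # gpt2 run
NEW = dict(vocab_size=11015, n_positions=1024, n_embd=1280, n_layer=36)  # gpt2-large run

for name, cfg in (("old", OLD), ("new", NEW)):
    n = gpt2_param_count(**cfg)
    print(f"{name}: {n:,} params, {4 * n:,} bytes as float32")
# old: 122,710,272 params, 490,841,088 bytes as float32
# new: 723,800,320 params, 2,895,201,280 bytes as float32
```

Both configs keep a per-head width of 64 (768 / 12 and 1280 / 20). The float32 byte counts sit slightly below the `model.safetensors` sizes in the LFS pointers (490,856,064 and 2,895,246,888), consistent with the remainder being the safetensors header.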

generation_config.json CHANGED

@@ -1,6 +1,6 @@
 {
   "_from_model_config": true,
-  "bos_token_id": 48002,
-  "eos_token_id": 48003,
   "transformers_version": "4.39.3"
 }

 {
   "_from_model_config": true,
+  "bos_token_id": 11012,
+  "eos_token_id": 11013,
   "transformers_version": "4.39.3"
 }

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2ac31958c9741942f7f461ab5ec997050cd64aa30f3c26572f1355ae5a0e7617
-size 490856064

 version https://git-lfs.github.com/spec/v1
+oid sha256:7fc6d176fb87e4ec6ab3a46eb45eab6abb1b6d0178ddcb9b79bd62a496306e4f
+size 2895246888
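The pointer's `size` is the full file size: at 4 bytes per float32 value, 2,895,246,888 bytes is roughly 724 million parameters, up from roughly 123 million (490,856,064 bytes). A safetensors file is an 8-byte little-endian header length, a JSON header mapping each tensor name to `dtype`, `shape` and `data_offsets`, then the raw tensor bytes. A minimal sketch that reads only the header (so the 2.9 GB file is not loaded) and totals the tensor bytes; the dtype table covers common codes only, and the local path in the commented usage is an assumption:

```python
import io
import json
import struct

# Element sizes for common safetensors dtype codes (not exhaustive).
DTYPE_SIZES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
               "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1}


def inspect_safetensors(f):
    """Read only the header of a safetensors file object.

    Returns (header_bytes, tensor_bytes, n_params); for a well-formed file,
    header_bytes + tensor_bytes equals the file size.
    """
    (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
    header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional str -> str metadata
    tensor_bytes = n_params = 0
    for name, info in header.items():
        numel = 1
        for dim in info["shape"]:
            numel *= dim
        begin, end = info["data_offsets"]  # relative to the end of the header
        if end - begin != numel * DTYPE_SIZES[info["dtype"]]:
            raise ValueError(f"size mismatch for tensor {name!r}")
        tensor_bytes += end - begin
        n_params += numel
    return 8 + header_len, tensor_bytes, n_params


# Demo on a tiny in-memory file holding one 2x2 float32 tensor.
header = json.dumps({"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}).encode()
blob = struct.pack("<Q", len(header)) + header + struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
print(inspect_safetensors(io.BytesIO(blob)), len(blob))

# On a local copy of this repo's weights (path is an assumption):
# with open("model.safetensors", "rb") as f:
#     print(inspect_safetensors(f))
```

For the new weights, `header_bytes + tensor_bytes` should come out at the pointer's 2,895,246,888.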

tokenizer.json CHANGED

The diff for this file is too large to render.

tokenizer_config.json CHANGED

@@ -1,6 +1,6 @@
 {
   "added_tokens_decoder": {
-    "48004": {
       "content": "[PAD]",
       "lstrip": false,
       "normalized": false,

 {
   "added_tokens_decoder": {
+    "11014": {
       "content": "[PAD]",
       "lstrip": false,
       "normalized": false,

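With the new tokenizer, the special ids sit at the top of the vocabulary: `bos_token_id` 11012 and `eos_token_id` 11013 in `config.json` and `generation_config.json`, and `[PAD]` at 11014 in `tokenizer_config.json`, which is why `vocab_size` is 11015 (ids 0 to 11014). Any id at or above `vocab_size` would index past the embedding matrix. A small consistency check over the values shown in this diff; the dictionaries are hand-copied from the diff, not loaded from the repo:

```python
def check_special_ids(vocab_size, special_ids):
    """Return names of special tokens whose id falls outside [0, vocab_size)."""
    return [name for name, idx in special_ids.items() if not 0 <= idx < vocab_size]


new = {"bos": 11012, "eos": 11013, "pad": 11014}
old = {"bos": 48002, "eos": 48003, "pad": 48004}

print(check_special_ids(11015, new))  # []
print(check_special_ids(48005, old))  # []
# Pairing the old ids with the new model (e.g. a stale generation_config.json) fails:
print(check_special_ids(11015, old))  # ['bos', 'eos', 'pad']
```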
training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cecb3a937044fdfdd46dc45d08a7538f60dba734a202cfc63b0429875eee4292
 size 4856

 version https://git-lfs.github.com/spec/v1
+oid sha256:5b5327a7f264d98a7468fa2e2b0dbffc5af82843afd083bc8123aff3bd849216
 size 4856
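Both binary files are stored as Git LFS pointers: short text files of `key value` lines (`version`, `oid sha256:…`, `size`). For `training_args.bin` the `size` stays at 4856 bytes while the `oid` changes, so the serialized arguments differ even though their length does not, consistent with the hyperparameter changes in README.md. A sketch that parses two pointers and reports which fields changed:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def diff_pointers(old_text, new_text):
    """Map each changed field to its (old, new) values."""
    old, new = parse_lfs_pointer(old_text), parse_lfs_pointer(new_text)
    return {k: (old.get(k), new.get(k))
            for k in sorted(old.keys() | new.keys()) if old.get(k) != new.get(k)}


OLD = """version https://git-lfs.github.com/spec/v1
oid sha256:cecb3a937044fdfdd46dc45d08a7538f60dba734a202cfc63b0429875eee4292
size 4856
"""
NEW = """version https://git-lfs.github.com/spec/v1
oid sha256:5b5327a7f264d98a7468fa2e2b0dbffc5af82843afd083bc8123aff3bd849216
size 4856
"""

changes = diff_pointers(OLD, NEW)
print(list(changes))  # ['oid']
```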