End of training


Files changed (5)

README.md +188 -0
config.json +39 -0
generation_config.json +6 -0
model.safetensors +3 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,188 @@

+---
+language:
+- ko
+license: mit
+base_model: gpt2
+tags:
+- generated_from_trainer
+model-index:
+- name: gpt2-cs00
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# gpt2-cs00
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the gpt2-cs00 dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.3143
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 4
+- eval_batch_size: 4
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 1000
+- num_epochs: 3
+### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 11.0601       | 0.02  | 200   | 1.8848          |
+| 1.8408        | 0.05  | 400   | 1.7685          |
+| 1.7219        | 0.07  | 600   | 1.6998          |
+| 1.7133        | 0.09  | 800   | 1.6720          |
+| 1.6776        | 0.12  | 1000  | 1.6420          |
+| 1.6309        | 0.14  | 1200  | 1.7187          |
+| 1.6157        | 0.16  | 1400  | 1.6025          |
+| 1.5546        | 0.18  | 1600  | 1.5661          |
+| 1.4834        | 0.21  | 1800  | 1.5589          |
+| 1.5641        | 0.23  | 2000  | 1.5451          |
+| 1.5133        | 0.25  | 2200  | 1.5195          |
+| 1.5373        | 0.28  | 2400  | 1.5099          |
+| 1.498         | 0.3   | 2600  | 1.5026          |
+| 1.4382        | 0.32  | 2800  | 1.4915          |
+| 1.4585        | 0.35  | 3000  | 1.4937          |
+| 1.4493        | 0.37  | 3200  | 1.4737          |
+| 1.403         | 0.39  | 3400  | 1.4713          |
+| 1.4216        | 0.42  | 3600  | 1.4573          |
+| 1.4204        | 0.44  | 3800  | 1.4684          |
+| 1.5143        | 0.46  | 4000  | 1.4458          |
+| 1.5003        | 0.48  | 4200  | 1.4115          |
+| 1.4828        | 0.51  | 4400  | 1.4446          |
+| 1.4098        | 0.53  | 4600  | 1.4133          |
+| 1.4208        | 0.55  | 4800  | 1.4178          |
+| 1.401         | 0.58  | 5000  | 1.3915          |
+| 1.3639        | 0.6   | 5200  | 1.4326          |
+| 1.3752        | 0.62  | 5400  | 1.3989          |
+| 1.4016        | 0.65  | 5600  | 1.3873          |
+| 1.4157        | 0.67  | 5800  | 1.3792          |
+| 1.4421        | 0.69  | 6000  | 1.3809          |
+| 1.4024        | 0.72  | 6200  | 1.3780          |
+| 1.4031        | 0.74  | 6400  | 1.4014          |
+| 1.4033        | 0.76  | 6600  | 1.4148          |
+| 1.4009        | 0.78  | 6800  | 1.3824          |
+| 1.4519        | 0.81  | 7000  | 1.3795          |
+| 1.377         | 0.83  | 7200  | 1.3762          |
+| 1.4153        | 0.85  | 7400  | 1.3608          |
+| 1.4112        | 0.88  | 7600  | 1.3853          |
+| 1.409         | 0.9   | 7800  | 1.3728          |
+| 1.4125        | 0.92  | 8000  | 1.3661          |
+| 1.3637        | 0.95  | 8200  | 1.3609          |
+| 1.3902        | 0.97  | 8400  | 1.3591          |
+| 1.4463        | 0.99  | 8600  | 1.3665          |
+| 1.3782        | 1.02  | 8800  | 1.3634          |
+| 1.3468        | 1.04  | 9000  | 1.3728          |
+| 1.3339        | 1.06  | 9200  | 1.3712          |
+| 1.3171        | 1.09  | 9400  | 1.3557          |
+| 1.357         | 1.11  | 9600  | 1.3723          |
+| 1.3791        | 1.13  | 9800  | 1.3617          |
+| 1.3888        | 1.15  | 10000 | 1.3477          |
+| 1.3923        | 1.18  | 10200 | 1.3512          |
+| 1.342         | 1.2   | 10400 | 1.3538          |
+| 1.3485        | 1.22  | 10600 | 1.3595          |
+| 1.3523        | 1.25  | 10800 | 1.3623          |
+| 1.3881        | 1.27  | 11000 | 1.3416          |
+| 1.3741        | 1.29  | 11200 | 1.3523          |
+| 1.3869        | 1.32  | 11400 | 1.3442          |
+| 1.3545        | 1.34  | 11600 | 1.3490          |
+| 1.3571        | 1.36  | 11800 | 1.3491          |
+| 1.3396        | 1.39  | 12000 | 1.3510          |
+| 1.3713        | 1.41  | 12200 | 1.3341          |
+| 1.3165        | 1.43  | 12400 | 1.3376          |
+| 1.3236        | 1.45  | 12600 | 1.3364          |
+| 1.3028        | 1.48  | 12800 | 1.3322          |
+| 1.3671        | 1.5   | 13000 | 1.3403          |
+| 1.3295        | 1.52  | 13200 | 1.3377          |
+| 1.3807        | 1.55  | 13400 | 1.3264          |
+| 1.3714        | 1.57  | 13600 | 1.3271          |
+| 1.3249        | 1.59  | 13800 | 1.3388          |
+| 1.3656        | 1.62  | 14000 | 1.3319          |
+| 1.2864        | 1.64  | 14200 | 1.3321          |
+| 1.352         | 1.66  | 14400 | 1.3497          |
+| 1.3599        | 1.69  | 14600 | 1.3268          |
+| 1.3191        | 1.71  | 14800 | 1.3339          |
+| 1.3136        | 1.73  | 15000 | 1.3336          |
+| 1.3338        | 1.75  | 15200 | 1.3265          |
+| 1.3528        | 1.78  | 15400 | 1.3363          |
+| 1.3538        | 1.8   | 15600 | 1.3196          |
+| 1.2879        | 1.82  | 15800 | 1.3335          |
+| 1.3217        | 1.85  | 16000 | 1.3376          |
+| 1.3657        | 1.87  | 16200 | 1.3257          |
+| 1.3351        | 1.89  | 16400 | 1.3262          |
+| 1.3469        | 1.92  | 16600 | 1.3299          |
+| 1.3053        | 1.94  | 16800 | 1.3329          |
+| 1.3332        | 1.96  | 17000 | 1.3212          |
+| 1.3466        | 1.99  | 17200 | 1.3317          |
+| 1.3743        | 2.01  | 17400 | 1.3302          |
+| 1.3227        | 2.03  | 17600 | 1.3332          |
+| 1.2728        | 2.05  | 17800 | 1.3450          |
+| 1.3239        | 2.08  | 18000 | 1.3414          |
+| 1.3661        | 2.1   | 18200 | 1.3243          |
+| 1.298         | 2.12  | 18400 | 1.3315          |
+| 1.2974        | 2.15  | 18600 | 1.3310          |
+| 1.3174        | 2.17  | 18800 | 1.3224          |
+| 1.3121        | 2.19  | 19000 | 1.3233          |
+| 1.3527        | 2.22  | 19200 | 1.3211          |
+| 1.3712        | 2.24  | 19400 | 1.3143          |
+| 1.2873        | 2.26  | 19600 | 1.3302          |
+| 1.306         | 2.29  | 19800 | 1.3211          |
+| 1.3161        | 2.31  | 20000 | 1.3242          |
+| 1.308         | 2.33  | 20200 | 1.3176          |
+| 1.3403        | 2.35  | 20400 | 1.3143          |
+| 1.3688        | 2.38  | 20600 | 1.3195          |
+| 1.2743        | 2.4   | 20800 | 1.3230          |
+| 1.2892        | 2.42  | 21000 | 1.3287          |
+| 1.3782        | 2.45  | 21200 | 1.3137          |
+| 1.3331        | 2.47  | 21400 | 1.3148          |
+| 1.3182        | 2.49  | 21600 | 1.3220          |
+| 1.2542        | 2.52  | 21800 | 1.3332          |
+| 1.2879        | 2.54  | 22000 | 1.3229          |
+| 1.316         | 2.56  | 22200 | 1.3181          |
+| 1.2989        | 2.59  | 22400 | 1.3155          |
+| 1.3095        | 2.61  | 22600 | 1.3218          |
+| 1.2457        | 2.63  | 22800 | 1.3185          |
+| 1.3053        | 2.65  | 23000 | 1.3168          |
+| 1.3036        | 2.68  | 23200 | 1.3180          |
+| 1.2861        | 2.7   | 23400 | 1.3117          |
+| 1.3           | 2.72  | 23600 | 1.3208          |
+| 1.3026        | 2.75  | 23800 | 1.3147          |
+| 1.3006        | 2.77  | 24000 | 1.3211          |
+| 1.3477        | 2.79  | 24200 | 1.3140          |
+| 1.2851        | 2.82  | 24400 | 1.3208          |
+| 1.2859        | 2.84  | 24600 | 1.3172          |
+| 1.3286        | 2.86  | 24800 | 1.3151          |
+| 1.3237        | 2.89  | 25000 | 1.3148          |
+| 1.3503        | 2.91  | 25200 | 1.3133          |
+| 1.27          | 2.93  | 25400 | 1.3138          |
+| 1.2998        | 2.96  | 25600 | 1.3151          |
+| 1.3461        | 2.98  | 25800 | 1.3143          |
+### Framework versions
+- Transformers 4.35.2
+- Pytorch 2.1.1+cu118
+- Datasets 2.15.0
+- Tokenizers 0.15.0
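
Review note on the model card above: the reported evaluation loss and the scheduler settings are easier to interpret with a little arithmetic. The sketch below converts the final loss to perplexity and reproduces the shape of a linear schedule with warmup. It assumes the loss is the mean token-level cross-entropy in nats, which is usual for causal LM fine-tuning but is not stated in the card. The total step count is not given either, so `EST_TOTAL_STEPS` is an estimate extrapolated from the last table row, and the function names are illustrative only.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_nll)


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values copied from the model card.
FINAL_EVAL_LOSS = 1.3143
BASE_LR = 5e-05
WARMUP_STEPS = 1000
NUM_EPOCHS = 3

# ASSUMPTION: the card does not state the total number of steps. This is
# extrapolated from the last logged row (step 25800 at epoch 2.98); the epoch
# column is rounded to two decimals, so the estimate is only approximate.
LAST_STEP, LAST_EPOCH = 25800, 2.98
EST_TOTAL_STEPS = round(LAST_STEP / LAST_EPOCH * NUM_EPOCHS)

print(f"perplexity at final eval loss: {perplexity(FINAL_EVAL_LOSS):.3f}")
for step in (200, 1000, 13000, LAST_STEP):
    lr = linear_warmup_lr(step, BASE_LR, WARMUP_STEPS, EST_TOTAL_STEPS)
    print(f"step {step:>6}: lr ~ {lr:.3e}")
```

Two things stand out in the table itself: the first logged training loss (11.0601 at step 200) falls inside the 1000-step warmup, and the lowest validation loss is 1.3117 at step 23400, slightly below the final 1.3143 that the card reports.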

config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "_name_or_path": "gpt2",
+  "activation_function": "gelu_new",
+  "architectures": [
+    "GPT2LMHeadModel"
+  ],
+  "attn_pdrop": 0.1,
+  "bos_token_id": 50256,
+  "embd_pdrop": 0.1,
+  "eos_token_id": 50256,
+  "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "model_type": "gpt2",
+  "n_ctx": 1024,
+  "n_embd": 768,
+  "n_head": 12,
+  "n_inner": null,
+  "n_layer": 12,
+  "n_positions": 1024,
+  "reorder_and_upcast_attn": false,
+  "resid_pdrop": 0.1,
+  "scale_attn_by_inverse_layer_idx": false,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "task_specific_params": {
+    "text-generation": {
+      "do_sample": true,
+      "max_length": 50
+    }
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.35.2",
+  "use_cache": true,
+  "vocab_size": 50258
+}
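
Review note on config.json: the architecture fields are enough to estimate the parameter count and cross-check it against the `model.safetensors` size recorded in this commit. The sketch below is a back-of-the-envelope count for the standard GPT-2 layout. It assumes the output head shares weights with the token embedding, so it is not counted twice, and it treats `n_inner: null` as `4 * n_embd`. The helper name is illustrative only.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer, n_inner=None):
    """Count trainable parameters of a GPT-2 style decoder with a tied LM head."""
    inner = n_inner if n_inner is not None else 4 * n_embd
    # Token and position embedding tables.
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # One LayerNorm holds a weight and a bias vector.
    layer_norm = 2 * n_embd
    # Fused QKV projection plus the attention output projection (with biases).
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Two-layer feed-forward network (with biases).
    mlp = (n_embd * inner + inner) + (inner * n_embd + n_embd)
    block = 2 * layer_norm + attention + mlp
    # Final LayerNorm after the last block.
    return embeddings + n_layer * block + layer_norm


# Values copied from config.json in this commit.
params = gpt2_param_count(vocab_size=50258, n_positions=1024, n_embd=768, n_layer=12)
weight_bytes = params * 4  # torch_dtype is float32, i.e. 4 bytes per parameter

# Size recorded in the model.safetensors LFS pointer in this commit.
SAFETENSORS_SIZE = 497777280

print(f"parameters:      {params:,}")
print(f"float32 bytes:   {weight_bytes:,}")
print(f"remaining bytes: {SAFETENSORS_SIZE - weight_bytes:,}")
```

Under these assumptions the weights account for all but a few kilobytes of the file, which would be consistent with the remainder being file metadata. Note also that `vocab_size` is 50258, one more than the 50257 of the base `gpt2` checkpoint, which suggests one token was added before fine-tuning; the commit does not say which.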

generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 50256,
+  "eos_token_id": 50256,
+  "transformers_version": "4.35.2"
+}
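
Review note on generation_config.json: `_from_model_config: true` suggests these values were derived from config.json rather than set by hand, so the two files should agree. A minimal consistency check, using only the fields visible in this commit, might look like the sketch below; `check_special_tokens` is a hypothetical helper, not part of any library.

```python
import json

# Relevant fields copied from the two JSON files in this commit.
MODEL_CONFIG = json.loads(
    '{"bos_token_id": 50256, "eos_token_id": 50256, "vocab_size": 50258}'
)
GENERATION_CONFIG = json.loads(
    '{"_from_model_config": true, "bos_token_id": 50256,'
    ' "eos_token_id": 50256, "transformers_version": "4.35.2"}'
)


def check_special_tokens(model_cfg, gen_cfg):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for key in ("bos_token_id", "eos_token_id"):
        model_id = model_cfg.get(key)
        gen_id = gen_cfg.get(key)
        if model_id != gen_id:
            problems.append(f"{key}: model config has {model_id}, generation config has {gen_id}")
        if gen_id is not None and not 0 <= gen_id < model_cfg["vocab_size"]:
            problems.append(f"{key}: id {gen_id} is outside the vocabulary")
    return problems


print(check_special_tokens(MODEL_CONFIG, GENERATION_CONFIG) or "configs agree")
```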

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cfc0efc639a6bb004591e06e379574f4f68678efc62435e09aa694e9aee2c5d7
+size 497777280

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:458eb23722a24431097ea85aadf859e46a533366f3b0899d80e4db9759b5e85a
+size 4536
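
Review note on the two binary files: the diff shows Git LFS pointers rather than the binaries themselves. Each pointer records a spec version, a SHA-256 object id, and a size in bytes, which is enough to verify a downloaded copy. The sketch below parses a pointer as it appears in this diff, including the leading `+`, and checks a local file against it. The function names and the idea of a local file path are assumptions for illustration.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer (optionally with diff '+' prefixes) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    hasher = hashlib.new(pointer["hash_algo"])
    total = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the training_args.bin diff in this commit.
TRAINING_ARGS_POINTER = """\
+version https://git-lfs.github.com/spec/v1
+oid sha256:458eb23722a24431097ea85aadf859e46a533366f3b0899d80e4db9759b5e85a
+size 4536
"""

pointer = parse_lfs_pointer(TRAINING_ARGS_POINTER)
print(pointer["hash_algo"], pointer["size"])
```

With a downloaded copy on disk, `matches_pointer("training_args.bin", pointer)` would confirm that the file is the one this commit refers to; the path here is a placeholder.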