End of training
README.md
CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.
+- Loss: 2.8722
 
 ## Model description
 
@@ -41,22 +41,62 @@ The following hyperparameters were used during training:
 - seed: 42
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
-- num_epochs:
+- num_epochs: 50
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| No log | 1.0 | 140 | 3.
-| No log | 2.0 | 280 | 3.
-| No log | 3.0 | 420 | 3.
-| 3.
-| 3.
-| 3.
-| 3.
-| 2.
-| 2.
-| 2.
+| No log | 1.0 | 140 | 3.2358 |
+| No log | 2.0 | 280 | 3.1312 |
+| No log | 3.0 | 420 | 3.0747 |
+| 3.229 | 4.0 | 560 | 3.0344 |
+| 3.229 | 5.0 | 700 | 3.0087 |
+| 3.229 | 6.0 | 840 | 2.9826 |
+| 3.229 | 7.0 | 980 | 2.9612 |
+| 2.8994 | 8.0 | 1120 | 2.9485 |
+| 2.8994 | 9.0 | 1260 | 2.9314 |
+| 2.8994 | 10.0 | 1400 | 2.9206 |
+| 2.7362 | 11.0 | 1540 | 2.9058 |
+| 2.7362 | 12.0 | 1680 | 2.8936 |
+| 2.7362 | 13.0 | 1820 | 2.8910 |
+| 2.7362 | 14.0 | 1960 | 2.8837 |
+| 2.6111 | 15.0 | 2100 | 2.8820 |
+| 2.6111 | 16.0 | 2240 | 2.8754 |
+| 2.6111 | 17.0 | 2380 | 2.8715 |
+| 2.5132 | 18.0 | 2520 | 2.8670 |
+| 2.5132 | 19.0 | 2660 | 2.8645 |
+| 2.5132 | 20.0 | 2800 | 2.8602 |
+| 2.5132 | 21.0 | 2940 | 2.8605 |
+| 2.4321 | 22.0 | 3080 | 2.8604 |
+| 2.4321 | 23.0 | 3220 | 2.8552 |
+| 2.4321 | 24.0 | 3360 | 2.8564 |
+| 2.3645 | 25.0 | 3500 | 2.8564 |
+| 2.3645 | 26.0 | 3640 | 2.8613 |
+| 2.3645 | 27.0 | 3780 | 2.8560 |
+| 2.3645 | 28.0 | 3920 | 2.8510 |
+| 2.3077 | 29.0 | 4060 | 2.8535 |
+| 2.3077 | 30.0 | 4200 | 2.8528 |
+| 2.3077 | 31.0 | 4340 | 2.8585 |
+| 2.3077 | 32.0 | 4480 | 2.8610 |
+| 2.2607 | 33.0 | 4620 | 2.8625 |
+| 2.2607 | 34.0 | 4760 | 2.8602 |
+| 2.2607 | 35.0 | 4900 | 2.8643 |
+| 2.2233 | 36.0 | 5040 | 2.8591 |
+| 2.2233 | 37.0 | 5180 | 2.8647 |
+| 2.2233 | 38.0 | 5320 | 2.8638 |
+| 2.2233 | 39.0 | 5460 | 2.8657 |
+| 2.193 | 40.0 | 5600 | 2.8644 |
+| 2.193 | 41.0 | 5740 | 2.8620 |
+| 2.193 | 42.0 | 5880 | 2.8676 |
+| 2.1706 | 43.0 | 6020 | 2.8702 |
+| 2.1706 | 44.0 | 6160 | 2.8704 |
+| 2.1706 | 45.0 | 6300 | 2.8698 |
+| 2.1706 | 46.0 | 6440 | 2.8716 |
+| 2.155 | 47.0 | 6580 | 2.8714 |
+| 2.155 | 48.0 | 6720 | 2.8726 |
+| 2.155 | 49.0 | 6860 | 2.8718 |
+| 2.1472 | 50.0 | 7000 | 2.8722 |
 
 
 ### Framework versions
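With `lr_scheduler_type: linear`, the learning rate decays linearly to zero over the run: the step column in the table advances by 140 per epoch, so 50 epochs means 7,000 optimizer steps in total. A minimal sketch of that schedule, assuming no warmup and a hypothetical base rate of 5e-5 (neither value appears in this diff hunk):

```python
steps_per_epoch = 140  # from the results table: evals land at steps 140, 280, ..., 7000
num_epochs = 50        # from num_epochs above
total_steps = steps_per_epoch * num_epochs  # 7000, the step in the final table row

base_lr = 5e-5  # hypothetical value; the actual learning rate is outside this hunk

def linear_lr(step: int) -> float:
    """Linear decay from base_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(total_steps)             # 7000
print(linear_lr(total_steps))  # 0.0
```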
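The eval loss of a causal language model is mean cross-entropy in nats per token, so the final figure reported in the card converts directly to perplexity; a quick check using only the reported number:

```python
import math

eval_loss = 2.8722  # final validation loss from the results table

# Perplexity is exp(mean cross-entropy); lower loss means lower perplexity.
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))  # 17.7
```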
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:733ce91b4661cdae79591e0bd3ba855380b17de16e81d1ad3112f2451ed8e4d1
 size 327657928
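The three-line stubs being diffed here are Git LFS pointer files: the repository stores only a spec version, a SHA-256 object id, and a byte size, and the real file is fetched by hash at download time. A small parsing sketch (the `parse_lfs_pointer` helper is illustrative, not part of any library), using the model.safetensors pointer above:

```python
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:733ce91b4661cdae79591e0bd3ba855380b17de16e81d1ad3112f2451ed8e4d1
size 327657928
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line into a dict entry; oid keeps its 'sha256:' prefix."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

ptr = parse_lfs_pointer(pointer_text)
print(ptr["oid"][:13])   # sha256:733ce9
print(int(ptr["size"]))  # 327657928
```

That size is consistent with distilgpt2's roughly 82M parameters stored as 4-byte float32 tensors (about 81.9M × 4 ≈ 327.7 MB) plus a small safetensors header.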
runs/Dec10_22-36-52_ltrcgpu2/events.out.tfevents.1733850413.ltrcgpu2.3514808.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:60322d304662aaaaa7331e607976f0dbfd3f708fabdaba56dbc5f83ae390342d
+size 22258
runs/Dec10_22-36-52_ltrcgpu2/events.out.tfevents.1733851675.ltrcgpu2.3514808.1
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:51be093beba53f040b22db9ba5bcc834b51cdba37a376173ecf405a5738c4e42
+size 359