mtzig committed
Commit ca90928 · verified
1 Parent(s): ca8cf0d

Model save

Files changed (4):
  1. README.md +63 -63
  2. config.json +3 -3
  3. model.safetensors +2 -2
  4. training_args.bin +1 -1
README.md CHANGED
@@ -16,8 +16,8 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.5281
- - Accuracy: 0.65
+ - Loss: 0.2541
+ - Accuracy: 0.85
 
  ## Model description
 
@@ -49,67 +49,67 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss | Accuracy |
  |:-------------:|:------:|:----:|:---------------:|:--------:|
- | No log | 0 | 0 | 2.8464 | 0.025 |
- | 2.0752 | 100.0 | 100 | 2.0603 | 0.5 |
- | 0.8893 | 200.0 | 200 | 0.8853 | 0.5 |
- | 0.7677 | 300.0 | 300 | 0.7673 | 0.5 |
- | 0.7337 | 400.0 | 400 | 0.7334 | 0.525 |
- | 0.7085 | 500.0 | 500 | 0.7082 | 0.5 |
- | 0.6828 | 600.0 | 600 | 0.6824 | 0.55 |
- | 0.66 | 700.0 | 700 | 0.6603 | 0.55 |
- | 0.6514 | 800.0 | 800 | 0.6516 | 0.575 |
- | 0.643 | 900.0 | 900 | 0.6428 | 0.55 |
- | 0.6375 | 1000.0 | 1000 | 0.6371 | 0.575 |
- | 0.6321 | 1100.0 | 1100 | 0.6325 | 0.575 |
- | 0.6276 | 1200.0 | 1200 | 0.6281 | 0.575 |
- | 0.624 | 1300.0 | 1300 | 0.6233 | 0.6 |
- | 0.6196 | 1400.0 | 1400 | 0.6196 | 0.6 |
- | 0.6156 | 1500.0 | 1500 | 0.6153 | 0.575 |
- | 0.6108 | 1600.0 | 1600 | 0.6110 | 0.625 |
- | 0.6067 | 1700.0 | 1700 | 0.6064 | 0.625 |
- | 0.6022 | 1800.0 | 1800 | 0.6031 | 0.625 |
- | 0.5993 | 1900.0 | 1900 | 0.5991 | 0.625 |
- | 0.5961 | 2000.0 | 2000 | 0.5961 | 0.6 |
- | 0.5925 | 2100.0 | 2100 | 0.5939 | 0.6 |
- | 0.5895 | 2200.0 | 2200 | 0.5905 | 0.6 |
- | 0.5866 | 2300.0 | 2300 | 0.5870 | 0.6 |
- | 0.584 | 2400.0 | 2400 | 0.5840 | 0.6 |
- | 0.5812 | 2500.0 | 2500 | 0.5813 | 0.6 |
- | 0.5787 | 2600.0 | 2600 | 0.5785 | 0.6 |
- | 0.5758 | 2700.0 | 2700 | 0.5756 | 0.65 |
- | 0.5706 | 2800.0 | 2800 | 0.5705 | 0.675 |
- | 0.5668 | 2900.0 | 2900 | 0.5666 | 0.65 |
- | 0.5629 | 3000.0 | 3000 | 0.5627 | 0.575 |
- | 0.5609 | 3100.0 | 3100 | 0.5606 | 0.65 |
- | 0.5594 | 3200.0 | 3200 | 0.5591 | 0.65 |
- | 0.5576 | 3300.0 | 3300 | 0.5574 | 0.65 |
- | 0.5562 | 3400.0 | 3400 | 0.5560 | 0.65 |
- | 0.5543 | 3500.0 | 3500 | 0.5544 | 0.675 |
- | 0.553 | 3600.0 | 3600 | 0.5529 | 0.7 |
- | 0.5512 | 3700.0 | 3700 | 0.5512 | 0.65 |
- | 0.5495 | 3800.0 | 3800 | 0.5493 | 0.65 |
- | 0.5468 | 3900.0 | 3900 | 0.5467 | 0.6 |
- | 0.545 | 4000.0 | 4000 | 0.5448 | 0.625 |
- | 0.5424 | 4100.0 | 4100 | 0.5424 | 0.65 |
- | 0.5407 | 4200.0 | 4200 | 0.5408 | 0.65 |
- | 0.5387 | 4300.0 | 4300 | 0.5387 | 0.65 |
- | 0.5371 | 4400.0 | 4400 | 0.5371 | 0.65 |
- | 0.5358 | 4500.0 | 4500 | 0.5358 | 0.65 |
- | 0.5346 | 4600.0 | 4600 | 0.5345 | 0.675 |
- | 0.5335 | 4700.0 | 4700 | 0.5334 | 0.65 |
- | 0.5325 | 4800.0 | 4800 | 0.5326 | 0.7 |
- | 0.5316 | 4900.0 | 4900 | 0.5315 | 0.675 |
- | 0.5307 | 5000.0 | 5000 | 0.5308 | 0.65 |
- | 0.5301 | 5100.0 | 5100 | 0.5301 | 0.65 |
- | 0.5295 | 5200.0 | 5200 | 0.5295 | 0.65 |
- | 0.5291 | 5300.0 | 5300 | 0.5291 | 0.65 |
- | 0.5288 | 5400.0 | 5400 | 0.5288 | 0.65 |
- | 0.5285 | 5500.0 | 5500 | 0.5285 | 0.65 |
- | 0.5283 | 5600.0 | 5600 | 0.5283 | 0.65 |
- | 0.5282 | 5700.0 | 5700 | 0.5282 | 0.65 |
- | 0.5281 | 5800.0 | 5800 | 0.5281 | 0.65 |
- | 0.5281 | 5900.0 | 5900 | 0.5281 | 0.65 |
- | 0.5281 | 6000.0 | 6000 | 0.5281 | 0.65 |
+ | No log | 0 | 0 | 2.7136 | 0.0 |
+ | 0.706 | 100.0 | 100 | 0.7050 | 0.5 |
+ | 0.6611 | 200.0 | 200 | 0.6612 | 0.55 |
+ | 0.6492 | 300.0 | 300 | 0.6500 | 0.575 |
+ | 0.6439 | 400.0 | 400 | 0.6421 | 0.55 |
+ | 0.6361 | 500.0 | 500 | 0.6336 | 0.55 |
+ | 0.627 | 600.0 | 600 | 0.6189 | 0.575 |
+ | 0.6182 | 700.0 | 700 | 0.6307 | 0.575 |
+ | 0.5771 | 800.0 | 800 | 0.5775 | 0.6 |
+ | 0.5633 | 900.0 | 900 | 0.5664 | 0.625 |
+ | 0.5517 | 1000.0 | 1000 | 0.5497 | 0.625 |
+ | 0.5317 | 1100.0 | 1100 | 0.5323 | 0.65 |
+ | 0.5331 | 1200.0 | 1200 | 0.5204 | 0.65 |
+ | 0.6811 | 1300.0 | 1300 | 0.5536 | 0.6 |
+ | 0.523 | 1400.0 | 1400 | 0.5144 | 0.65 |
+ | 0.4899 | 1500.0 | 1500 | 0.4920 | 0.65 |
+ | 0.4893 | 1600.0 | 1600 | 0.4854 | 0.675 |
+ | 0.5072 | 1700.0 | 1700 | 0.4797 | 0.675 |
+ | 0.4647 | 1800.0 | 1800 | 0.4675 | 0.675 |
+ | 0.6787 | 1900.0 | 1900 | 0.5977 | 0.6 |
+ | 0.4529 | 2000.0 | 2000 | 0.4521 | 0.7 |
+ | 0.4423 | 2100.0 | 2100 | 0.4567 | 0.7 |
+ | 0.4773 | 2200.0 | 2200 | 0.4749 | 0.675 |
+ | 0.4376 | 2300.0 | 2300 | 0.4358 | 0.7 |
+ | 0.4268 | 2400.0 | 2400 | 0.4237 | 0.7 |
+ | 0.4209 | 2500.0 | 2500 | 0.4199 | 0.7 |
+ | 0.4186 | 2600.0 | 2600 | 0.4201 | 0.725 |
+ | 0.4003 | 2700.0 | 2700 | 0.3993 | 0.7 |
+ | 0.3971 | 2800.0 | 2800 | 0.3942 | 0.725 |
+ | 0.4315 | 2900.0 | 2900 | 0.4076 | 0.725 |
+ | 0.3946 | 3000.0 | 3000 | 0.3889 | 0.7 |
+ | 0.4415 | 3100.0 | 3100 | 0.4902 | 0.675 |
+ | 0.3844 | 3200.0 | 3200 | 0.3857 | 0.75 |
+ | 0.368 | 3300.0 | 3300 | 0.3683 | 0.75 |
+ | 0.3581 | 3400.0 | 3400 | 0.3578 | 0.775 |
+ | 0.3529 | 3500.0 | 3500 | 0.3477 | 0.775 |
+ | 0.4454 | 3600.0 | 3600 | 0.3698 | 0.75 |
+ | 0.3518 | 3700.0 | 3700 | 0.3645 | 0.75 |
+ | 0.3441 | 3800.0 | 3800 | 0.3424 | 0.75 |
+ | 0.4046 | 3900.0 | 3900 | 0.3657 | 0.75 |
+ | 0.3285 | 4000.0 | 4000 | 0.3271 | 0.775 |
+ | 0.3245 | 4100.0 | 4100 | 0.3212 | 0.775 |
+ | 0.3265 | 4200.0 | 4200 | 0.3187 | 0.8 |
+ | 0.3302 | 4300.0 | 4300 | 0.3496 | 0.775 |
+ | 0.3266 | 4400.0 | 4400 | 0.3087 | 0.825 |
+ | 0.3803 | 4500.0 | 4500 | 0.4303 | 0.775 |
+ | 0.2938 | 4600.0 | 4600 | 0.2937 | 0.825 |
+ | 0.2908 | 4700.0 | 4700 | 0.2899 | 0.85 |
+ | 0.343 | 4800.0 | 4800 | 0.3310 | 0.775 |
+ | 0.2851 | 4900.0 | 4900 | 0.2857 | 0.85 |
+ | 0.2808 | 5000.0 | 5000 | 0.2803 | 0.85 |
+ | 0.2748 | 5100.0 | 5100 | 0.2767 | 0.85 |
+ | 0.271 | 5200.0 | 5200 | 0.2708 | 0.85 |
+ | 0.2683 | 5300.0 | 5300 | 0.2680 | 0.85 |
+ | 0.2654 | 5400.0 | 5400 | 0.2652 | 0.85 |
+ | 0.2619 | 5500.0 | 5500 | 0.2619 | 0.85 |
+ | 0.258 | 5600.0 | 5600 | 0.2579 | 0.85 |
+ | 0.2556 | 5700.0 | 5700 | 0.2556 | 0.85 |
+ | 0.2545 | 5800.0 | 5800 | 0.2545 | 0.85 |
+ | 0.2541 | 5900.0 | 5900 | 0.2541 | 0.85 |
+ | 0.2541 | 6000.0 | 6000 | 0.2541 | 0.85 |
 
 
  ### Framework versions
config.json CHANGED
@@ -7,9 +7,9 @@
  "dropout": 0.0,
  "mlp_dim": 4,
  "model_type": "nanogpt",
- "n_embd": 256,
- "n_head": 2,
- "n_layer": 1,
+ "n_embd": 384,
+ "n_head": 6,
+ "n_layer": 6,
  "nonlinearity": "RELU",
  "torch_dtype": "float32",
  "transformers_version": "4.46.0",
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0c811e0da51de04bcad857c635b13c08e09748a338efd1f646ebd7a4d7735098
- size 3191304
+ oid sha256:d297afc52b876a3e5ce7da0205feb241df4dd02c00abf78cc3be56ae00bc70aa
+ size 42640744
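As a sanity check, the jump in model.safetensors size (3,191,304 → 42,640,744 bytes) is consistent with the config change from n_embd=256/n_layer=1 to n_embd=384/n_layer=6. This is a rough sketch, assuming a standard GPT-style block with roughly 12·n_embd² float32 parameters per layer (attention QKV + output projection plus a 4× MLP, matching "mlp_dim": 4) and ignoring embeddings and layernorms:

```python
# Rough float32 size estimate for a GPT-style transformer.
# Assumption: ~12 * n_embd^2 parameters per layer (4*n_embd^2 for
# attention, 8*n_embd^2 for a 4x-expansion MLP); embeddings and
# layernorms are ignored, so the estimate slightly undershoots.

def approx_bytes(n_embd: int, n_layer: int, bytes_per_param: int = 4) -> int:
    params_per_layer = 12 * n_embd * n_embd
    return n_layer * params_per_layer * bytes_per_param

old = approx_bytes(256, 1)  # vs the old 3,191,304-byte file
new = approx_bytes(384, 6)  # vs the new 42,640,744-byte file
print(old, new)
```

Both estimates land within about 1.5% of the actual pointer-file sizes, so the larger checkpoint is fully explained by the wider, deeper architecture.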
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b9e0e7c8f7dba1698df41a20d39893814c33030511a5859d3d30335215530ff0
+ oid sha256:baeee4ae44209e439f6b3f235dc22dedab457718d76a2353f692b6e98c17b18e
  size 5240
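model.safetensors and training_args.bin are tracked with Git LFS, so the diff above shows their small text pointer files rather than the binaries themselves. A minimal sketch of parsing such a pointer (the key set here is just the three keys that appear in this diff; the LFS spec permits additional keys):

```python
# Parse a Git LFS pointer file into a dict of its key/value lines.
# Each line is "<key> <value>"; this diff's pointers carry exactly
# version, oid, and size.

def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d297afc52b876a3e5ce7da0205feb241df4dd02c00abf78cc3be56ae00bc70aa
size 42640744"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # prints "42640744"
```

Note that the `oid` and `size` identify the real blob on the LFS server; the repository itself only stores these few lines per binary file.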