Model save

Files changed (4)

README.md +215 -0
config.json +19 -0
model.safetensors +3 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,215 @@

+---
+library_name: transformers
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model-index:
+- name: reverseadd_lr5e-4_batch128_train1-16_eval17
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# reverseadd_lr5e-4_batch128_train1-16_eval17
+This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.0001
+- Accuracy: 1.0
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 128
+- eval_batch_size: 512
+- seed: 23452399
+- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1
+### Training results
+| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
+|:-------------:|:------:|:-----:|:---------------:|:--------:|
+| No log        | 0      | 0     | 2.8155          | 0.0      |
+| 2.2946        | 0.0064 | 100   | 2.3351          | 0.0      |
+| 2.1902        | 0.0128 | 200   | 2.2386          | 0.0      |
+| 2.1098        | 0.0192 | 300   | 2.1977          | 0.0      |
+| 2.0555        | 0.0256 | 400   | 2.1978          | 0.0      |
+| 2.0039        | 0.032  | 500   | 2.1589          | 0.0      |
+| 1.8206        | 0.0384 | 600   | 1.9316          | 0.0      |
+| 1.7494        | 0.0448 | 700   | 1.8544          | 0.0      |
+| 1.6857        | 0.0512 | 800   | 1.8468          | 0.0      |
+| 1.4657        | 0.0576 | 900   | 1.6150          | 0.0      |
+| 1.4123        | 0.064  | 1000  | 1.6095          | 0.0      |
+| 1.4308        | 0.0704 | 1100  | 1.4342          | 0.0      |
+| 1.4203        | 0.0768 | 1200  | 1.4498          | 0.0014   |
+| 1.2564        | 0.0832 | 1300  | 1.3778          | 0.0      |
+| 1.4974        | 0.0896 | 1400  | 1.4512          | 0.0      |
+| 1.2751        | 0.096  | 1500  | 1.3121          | 0.0008   |
+| 1.3045        | 0.1024 | 1600  | 1.3020          | 0.0004   |
+| 1.4441        | 0.1088 | 1700  | 1.2960          | 0.0019   |
+| 1.2986        | 0.1152 | 1800  | 1.3620          | 0.0023   |
+| 1.202         | 0.1216 | 1900  | 1.3917          | 0.0      |
+| 1.0536        | 0.128  | 2000  | 1.2194          | 0.003    |
+| 1.1365        | 0.1344 | 2100  | 1.2133          | 0.002    |
+| 1.1016        | 0.1408 | 2200  | 1.3385          | 0.0012   |
+| 1.0749        | 0.1472 | 2300  | 1.2830          | 0.0002   |
+| 1.1077        | 0.1536 | 2400  | 1.2144          | 0.0018   |
+| 1.2018        | 0.16   | 2500  | 1.3076          | 0.0001   |
+| 1.0792        | 0.1664 | 2600  | 1.1722          | 0.0035   |
+| 1.1767        | 0.1728 | 2700  | 1.4412          | 0.0006   |
+| 1.1921        | 0.1792 | 2800  | 1.4885          | 0.0009   |
+| 1.1296        | 0.1856 | 2900  | 1.1856          | 0.0044   |
+| 1.1497        | 0.192  | 3000  | 1.2461          | 0.0003   |
+| 1.1368        | 0.1984 | 3100  | 1.2033          | 0.0019   |
+| 1.0723        | 0.2048 | 3200  | 1.1648          | 0.0044   |
+| 1.2805        | 0.2112 | 3300  | 1.3671          | 0.0012   |
+| 1.1424        | 0.2176 | 3400  | 1.2261          | 0.0034   |
+| 1.0507        | 0.224  | 3500  | 1.6901          | 0.0012   |
+| 1.1209        | 0.2304 | 3600  | 1.1904          | 0.0037   |
+| 1.0004        | 0.2368 | 3700  | 1.1728          | 0.0052   |
+| 1.0119        | 0.2432 | 3800  | 1.3097          | 0.0014   |
+| 0.8694        | 0.2496 | 3900  | 1.2167          | 0.0021   |
+| 0.7671        | 0.256  | 4000  | 0.9645          | 0.0063   |
+| 0.6854        | 0.2624 | 4100  | 0.8259          | 0.0103   |
+| 0.6831        | 0.2688 | 4200  | 1.8649          | 0.0052   |
+| 0.7282        | 0.2752 | 4300  | 1.0701          | 0.0129   |
+| 0.5102        | 0.2816 | 4400  | 0.7336          | 0.0177   |
+| 0.504         | 0.288  | 4500  | 0.9905          | 0.0037   |
+| 0.6358        | 0.2944 | 4600  | 0.9810          | 0.0056   |
+| 0.4155        | 0.3008 | 4700  | 0.7320          | 0.0203   |
+| 0.5209        | 0.3072 | 4800  | 0.7939          | 0.0143   |
+| 0.4059        | 0.3136 | 4900  | 0.7709          | 0.0224   |
+| 1.1919        | 0.32   | 5000  | 1.2763          | 0.0095   |
+| 0.4233        | 0.3264 | 5100  | 0.9546          | 0.0332   |
+| 0.4587        | 0.3328 | 5200  | 0.5571          | 0.024    |
+| 0.357         | 0.3392 | 5300  | 0.6538          | 0.0319   |
+| 0.1951        | 0.3456 | 5400  | 0.7499          | 0.0794   |
+| 0.0627        | 0.352  | 5500  | 0.1778          | 0.4928   |
+| 0.1673        | 0.3584 | 5600  | 0.4276          | 0.3864   |
+| 0.108         | 0.3648 | 5700  | 0.2546          | 0.475    |
+| 0.303         | 0.3712 | 5800  | 0.8399          | 0.1792   |
+| 0.0955        | 0.3776 | 5900  | 0.1259          | 0.6165   |
+| 0.3565        | 0.384  | 6000  | 0.4181          | 0.2727   |
+| 0.1023        | 0.3904 | 6100  | 0.3169          | 0.2621   |
+| 0.088         | 0.3968 | 6200  | 0.5066          | 0.4139   |
+| 0.1426        | 0.4032 | 6300  | 0.2009          | 0.4383   |
+| 0.0982        | 0.4096 | 6400  | 0.3973          | 0.4288   |
+| 0.0761        | 0.416  | 6500  | 0.1796          | 0.5228   |
+| 0.0535        | 0.4224 | 6600  | 0.1319          | 0.554    |
+| 0.067         | 0.4288 | 6700  | 0.1653          | 0.5779   |
+| 0.0809        | 0.4352 | 6800  | 0.2924          | 0.3632   |
+| 0.0287        | 0.4416 | 6900  | 0.2800          | 0.4756   |
+| 0.2862        | 0.448  | 7000  | 0.4914          | 0.4814   |
+| 0.1426        | 0.4544 | 7100  | 0.1894          | 0.4845   |
+| 0.5235        | 0.4608 | 7200  | 0.4767          | 0.3179   |
+| 0.1059        | 0.4672 | 7300  | 0.0758          | 0.7329   |
+| 0.0569        | 0.4736 | 7400  | 0.0930          | 0.6479   |
+| 0.2759        | 0.48   | 7500  | 0.6940          | 0.2162   |
+| 0.0365        | 0.4864 | 7600  | 0.1118          | 0.7336   |
+| 0.1459        | 0.4928 | 7700  | 0.3021          | 0.5704   |
+| 0.0642        | 0.4992 | 7800  | 0.3398          | 0.3859   |
+| 0.0475        | 0.5056 | 7900  | 0.1293          | 0.6375   |
+| 0.2529        | 0.512  | 8000  | 0.1974          | 0.5196   |
+| 0.0545        | 0.5184 | 8100  | 0.5865          | 0.0885   |
+| 0.023         | 0.5248 | 8200  | 0.2984          | 0.5018   |
+| 0.1067        | 0.5312 | 8300  | 0.1996          | 0.419    |
+| 0.0251        | 0.5376 | 8400  | 0.0553          | 0.7663   |
+| 0.0202        | 0.544  | 8500  | 0.0368          | 0.8618   |
+| 0.0487        | 0.5504 | 8600  | 0.3866          | 0.4186   |
+| 0.0637        | 0.5568 | 8700  | 0.1638          | 0.681    |
+| 0.0237        | 0.5632 | 8800  | 0.0808          | 0.6553   |
+| 0.0278        | 0.5696 | 8900  | 0.0793          | 0.6923   |
+| 0.0181        | 0.576  | 9000  | 0.0814          | 0.7107   |
+| 0.0385        | 0.5824 | 9100  | 0.0419          | 0.8253   |
+| 0.1081        | 0.5888 | 9200  | 0.2834          | 0.4629   |
+| 0.0539        | 0.5952 | 9300  | 0.1097          | 0.6756   |
+| 0.0059        | 0.6016 | 9400  | 0.0250          | 0.8896   |
+| 0.0109        | 0.608  | 9500  | 0.2312          | 0.4467   |
+| 0.0091        | 0.6144 | 9600  | 0.0257          | 0.9123   |
+| 0.0158        | 0.6208 | 9700  | 0.0657          | 0.7171   |
+| 0.0093        | 0.6272 | 9800  | 0.0579          | 0.7496   |
+| 0.0046        | 0.6336 | 9900  | 0.0079          | 0.9606   |
+| 0.0006        | 0.64   | 10000 | 0.3415          | 0.5193   |
+| 0.0284        | 0.6464 | 10100 | 0.1142          | 0.7089   |
+| 0.0011        | 0.6528 | 10200 | 0.0064          | 0.9759   |
+| 0.0035        | 0.6592 | 10300 | 0.1225          | 0.7545   |
+| 0.0081        | 0.6656 | 10400 | 0.0127          | 0.9453   |
+| 0.0003        | 0.672  | 10500 | 0.0052          | 0.9788   |
+| 0.0015        | 0.6784 | 10600 | 0.0175          | 0.9311   |
+| 0.0001        | 0.6848 | 10700 | 0.0066          | 0.9688   |
+| 0.0012        | 0.6912 | 10800 | 0.0459          | 0.877    |
+| 0.0002        | 0.6976 | 10900 | 0.0725          | 0.7628   |
+| 0.0002        | 0.704  | 11000 | 0.3270          | 0.4999   |
+| 0.0003        | 0.7104 | 11100 | 0.0195          | 0.8962   |
+| 0.0001        | 0.7168 | 11200 | 0.0041          | 0.9819   |
+| 0.0           | 0.7232 | 11300 | 0.0029          | 0.9864   |
+| 0.0           | 0.7296 | 11400 | 0.0010          | 0.9952   |
+| 0.0           | 0.736  | 11500 | 0.0017          | 0.9913   |
+| 0.0           | 0.7424 | 11600 | 0.0015          | 0.992    |
+| 0.0           | 0.7488 | 11700 | 0.0011          | 0.9941   |
+| 0.0002        | 0.7552 | 11800 | 0.0091          | 0.9548   |
+| 0.0           | 0.7616 | 11900 | 0.0002          | 0.9994   |
+| 0.0           | 0.768  | 12000 | 0.0003          | 0.9993   |
+| 0.0           | 0.7744 | 12100 | 0.0001          | 0.9997   |
+| 0.0           | 0.7808 | 12200 | 0.0005          | 0.9981   |
+| 0.0           | 0.7872 | 12300 | 0.0002          | 0.9995   |
+| 0.0           | 0.7936 | 12400 | 0.0002          | 0.9995   |
+| 0.0           | 0.8    | 12500 | 0.0001          | 0.9999   |
+| 0.0           | 0.8064 | 12600 | 0.0000          | 1.0      |
+| 0.0           | 0.8128 | 12700 | 0.0000          | 1.0      |
+| 0.0           | 0.8192 | 12800 | 0.0000          | 1.0      |
+| 0.0           | 0.8256 | 12900 | 0.0000          | 1.0      |
+| 0.0           | 0.832  | 13000 | 0.0000          | 1.0      |
+| 0.0           | 0.8384 | 13100 | 0.0000          | 1.0      |
+| 0.0           | 0.8448 | 13200 | 0.0000          | 1.0      |
+| 0.0           | 0.8512 | 13300 | 0.0000          | 1.0      |
+| 0.0           | 0.8576 | 13400 | 0.0000          | 1.0      |
+| 0.0           | 0.864  | 13500 | 0.0000          | 1.0      |
+| 0.0           | 0.8704 | 13600 | 0.0000          | 1.0      |
+| 0.0           | 0.8768 | 13700 | 0.0000          | 1.0      |
+| 0.0           | 0.8832 | 13800 | 0.0000          | 1.0      |
+| 0.0           | 0.8896 | 13900 | 0.0000          | 1.0      |
+| 0.0           | 0.896  | 14000 | 0.0000          | 1.0      |
+| 0.0           | 0.9024 | 14100 | 0.0000          | 1.0      |
+| 0.0           | 0.9088 | 14200 | 0.0000          | 1.0      |
+| 0.0           | 0.9152 | 14300 | 0.0001          | 0.9998   |
+| 0.0           | 0.9216 | 14400 | 0.0000          | 0.9999   |
+| 0.0           | 0.928  | 14500 | 0.0001          | 1.0      |
+| 0.0           | 0.9344 | 14600 | 0.0001          | 1.0      |
+| 0.0           | 0.9408 | 14700 | 0.0001          | 1.0      |
+| 0.0           | 0.9472 | 14800 | 0.0001          | 1.0      |
+| 0.0           | 0.9536 | 14900 | 0.0001          | 1.0      |
+| 0.0           | 0.96   | 15000 | 0.0001          | 1.0      |
+| 0.0           | 0.9664 | 15100 | 0.0001          | 1.0      |
+| 0.0           | 0.9728 | 15200 | 0.0001          | 1.0      |
+| 0.0           | 0.9792 | 15300 | 0.0001          | 1.0      |
+| 0.0           | 0.9856 | 15400 | 0.0001          | 1.0      |
+| 0.0           | 0.992  | 15500 | 0.0001          | 1.0      |
+| 0.0           | 0.9984 | 15600 | 0.0001          | 1.0      |
+### Framework versions
+- Transformers 4.50.3
+- Pytorch 2.6.0+cu124
+- Tokenizers 0.21.1
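The Step and Epoch columns of the training-results table pin down the length of the run: because epoch = step / total_steps, any row gives the total number of optimizer steps. Combined with the listed hyperparameters, that is enough to reconstruct the learning-rate curve. The sketch below is illustrative, not the training code. It assumes that `lr_scheduler_type: cosine` follows the linear-warmup, half-cosine-decay formula used by `transformers.get_cosine_schedule_with_warmup`. It also assumes that warmup steps are `ceil(warmup_ratio * total_steps)`, and that training ran on one device with no gradient accumulation. The constant names and helper functions are hypothetical, and the rows are a verbatim subset of the table.

```python
import math

# Rows copied verbatim from the "Training results" table in the README above (a subset).
TABLE_ROWS = """
| No log        | 0      | 0     | 2.8155          | 0.0      |
| 0.0046        | 0.6336 | 9900  | 0.0079          | 0.9606   |
| 0.0           | 0.8    | 12500 | 0.0001          | 0.9999   |
| 0.0           | 0.8064 | 12600 | 0.0000          | 1.0      |
| 0.0           | 0.9984 | 15600 | 0.0001          | 1.0      |
"""

BASE_LR = 5e-4           # learning_rate
TRAIN_BATCH_SIZE = 128   # train_batch_size
WARMUP_RATIO = 0.1       # lr_scheduler_warmup_ratio


def parse_rows(text):
    """Parse markdown table rows; training loss stays a string because it can be "No log"."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss, acc = cells
        rows.append({"train_loss": train_loss, "epoch": float(epoch), "step": int(step),
                     "val_loss": float(val_loss), "accuracy": float(acc)})
    return rows


def infer_total_steps(rows):
    """Epoch = step / total_steps, so every row with epoch > 0 yields an estimate."""
    estimates = {round(r["step"] / r["epoch"]) for r in rows if r["epoch"] > 0}
    if len(estimates) != 1:
        raise ValueError(f"inconsistent step/epoch ratios: {estimates}")
    return estimates.pop()


def cosine_lr(step, base_lr, warmup_steps, total_steps, num_cycles=0.5):
    """Linear warmup to base_lr, then half-cosine decay to 0 (assumed to match "cosine")."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


rows = parse_rows(TABLE_ROWS)
total_steps = infer_total_steps(rows)                 # 15625 optimizer steps in the single epoch
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)  # 1563, assuming ceil rounding
examples_seen = total_steps * TRAIN_BATCH_SIZE        # upper bound; assumes one device, no accumulation
# "1.0" in the table is rounded to four decimals, so this is the first step *reported* as 1.0.
first_perfect_step = next(r["step"] for r in rows if r["accuracy"] >= 1.0)

if __name__ == "__main__":
    print(total_steps, warmup_steps, examples_seen, first_perfect_step)
    for s in (0, warmup_steps, total_steps // 2, total_steps):
        print(f"step {s:>5}: lr = {cosine_lr(s, BASE_LR, warmup_steps, total_steps):.2e}")
```

Under these assumptions, the run took 15,625 steps and saw at most about 2,000,000 training examples. The learning rate peaked at step 1,563. Validation accuracy is first reported as 1.0 at step 12,600, by which point the cosine schedule has already decayed the learning rate well below its peak.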

config.json ADDED

	@@ -0,0 +1,19 @@

+{
+  "architectures": [
+    "NanoGPT"
+  ],
+  "bias": true,
+  "block_size": 256,
+  "dropout": 0.0,
+  "mlp_dim": 4,
+  "model_type": "nanogpt",
+  "n_embd": 384,
+  "n_head": 6,
+  "n_layer": 6,
+  "nonlinearity": "RELU",
+  "torch_dtype": "float32",
+  "transformers_version": "4.50.3",
+  "use_NoPE": true,
+  "use_layernorm": true,
+  "vocab_size": 14
+}
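One way to sanity-check this config is to compare an estimated parameter count with the size of `model.safetensors`, since `torch_dtype` is `float32` (4 bytes per parameter). The `NanoGPT` class itself is custom code that is not part of this commit, so the sketch below rests on several labeled assumptions. It assumes a Karpathy-style nanoGPT block layout, with a fused QKV projection, an MLP whose width is `mlp_dim * n_embd`, and two LayerNorms per block. It also assumes that `use_NoPE: true` means there is no learned position-embedding table, and that the output head is bias-free and either tied or untied to the token embedding. `estimate_params` and `SAFETENSORS_BYTES` are hypothetical names.

```python
import json

# config.json from this commit, copied verbatim.
CONFIG = json.loads("""
{
  "architectures": ["NanoGPT"],
  "bias": true,
  "block_size": 256,
  "dropout": 0.0,
  "mlp_dim": 4,
  "model_type": "nanogpt",
  "n_embd": 384,
  "n_head": 6,
  "n_layer": 6,
  "nonlinearity": "RELU",
  "torch_dtype": "float32",
  "transformers_version": "4.50.3",
  "use_NoPE": true,
  "use_layernorm": true,
  "vocab_size": 14
}
""")

SAFETENSORS_BYTES = 42640744  # "size" from the model.safetensors LFS pointer


def estimate_params(cfg, tie_embeddings=True):
    """Parameter count under an assumed nanoGPT-style layout (not verified against this repo)."""
    d = cfg["n_embd"]
    hidden = cfg["mlp_dim"] * d                       # assumption: mlp_dim is a width multiplier
    b = 1 if cfg["bias"] else 0
    ln = (1 + b) * d if cfg["use_layernorm"] else 0   # LayerNorm weight (+ bias)
    attn = (d * 3 * d + b * 3 * d) + (d * d + b * d)  # fused QKV projection + output projection
    mlp = (d * hidden + b * hidden) + (hidden * d + b * d)
    per_layer = 2 * ln + attn + mlp                   # two LayerNorms per block
    tok_emb = cfg["vocab_size"] * d
    pos_emb = 0 if cfg["use_NoPE"] else cfg["block_size"] * d  # assumption: NoPE = no position table
    final_ln = ln
    lm_head = 0 if tie_embeddings else cfg["vocab_size"] * d   # assumption: bias-free output head
    return cfg["n_layer"] * per_layer + tok_emb + pos_emb + final_ln + lm_head


if __name__ == "__main__":
    for tied in (True, False):
        n = estimate_params(CONFIG, tie_embeddings=tied)
        fp32_bytes = 4 * n  # torch_dtype is float32
        print(f"tied={tied}: {n:,} params, {fp32_bytes:,} bytes vs file {SAFETENSORS_BYTES:,}")
```

With these assumptions, the estimate comes to roughly 10.65M parameters, or about 42.6 MB in float32. That falls within 0.1% of the 42,640,744-byte file. The remainder is consistent with the safetensors JSON header, although this comparison cannot confirm the exact layout.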

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b91c22f10a97ddc01172f18430123188f70ff048930fe9182dead1fb8b5c32d3
+size 42640744

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c61c8a26dc92f0ab4088af36a6234e5861944bbfeaafc8c8fb74517eb6391c81
+size 5368
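Both `model.safetensors` and `training_args.bin` are committed as Git LFS pointer files rather than as the binaries themselves. Each pointer consists of `key value` lines: the spec `version`, an `oid` written as `sha256:<hex digest>`, and the blob `size` in bytes. The sketch below parses such a pointer and checks a downloaded file against it. `parse_lfs_pointer` and `verify_blob` are hypothetical helper names, not part of any Git LFS or Hugging Face API.

```python
import hashlib

# The model.safetensors pointer from this commit, copied verbatim.
MODEL_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b91c22f10a97ddc01172f18430123188f70ff048930fe9182dead1fb8b5c32d3
size 42640744
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


def verify_blob(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's byte size and SHA-256 digest."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["oid"]


if __name__ == "__main__":
    ptr = parse_lfs_pointer(MODEL_POINTER)
    print(ptr["algo"], ptr["size"])
    # After fetching the real weights (e.g. with `git lfs pull`), they could be checked with:
    # verify_blob("model.safetensors", ptr)
```

The same parser applies unchanged to the `training_args.bin` pointer. The file is hashed in 1 MiB chunks, so memory use stays flat regardless of blob size.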