mtzig committed (verified)
Commit b3f97ea · 1 Parent(s): 74b1e99

Model save

Files changed (4)
  1. README.md +215 -0
  2. config.json +19 -0
  3. model.safetensors +3 -0
  4. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,215 @@
+ ---
+ library_name: transformers
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model-index:
+ - name: reverseadd_lr5e-4_batch128_train1-16_eval19
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # reverseadd_lr5e-4_batch128_train1-16_eval19
+
+ This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0282
+ - Accuracy: 0.9071
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
+ - learning_rate: 0.0005
+ - train_batch_size: 128
+ - eval_batch_size: 512
+ - seed: 23452399
+ - optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
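For orientation, the hyperparameters above map roughly onto a Hugging Face `TrainingArguments` object as sketched below. This is a hedged reconstruction, not the actual training script: the output directory, the per-device reading of the batch sizes, and the 100-step evaluation/logging cadence (inferred from the results table) are assumptions.

```python
from transformers import TrainingArguments

# Hedged sketch of TrainingArguments matching the hyperparameters listed above.
# output_dir is hypothetical; the batch sizes are assumed to be per-device values
# on a single device; eval/logging every 100 steps is inferred from the results
# table rather than stated explicitly in the card.
training_args = TrainingArguments(
    output_dir="reverseadd_lr5e-4_batch128_train1-16_eval19",  # assumed
    learning_rate=5e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=512,
    seed=23452399,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=100,
    logging_steps=100,
)
```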
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:------:|:-----:|:---------------:|:--------:|
+ | No log | 0 | 0 | 2.6749 | 0.0 |
+ | 2.2847 | 0.0064 | 100 | 2.3390 | 0.0 |
+ | 2.2888 | 0.0128 | 200 | 2.2967 | 0.0 |
+ | 2.1023 | 0.0192 | 300 | 2.2337 | 0.0 |
+ | 2.2718 | 0.0256 | 400 | 2.2355 | 0.0 |
+ | 2.0874 | 0.032 | 500 | 2.2691 | 0.0 |
+ | 1.9163 | 0.0384 | 600 | 2.1325 | 0.0 |
+ | 1.862 | 0.0448 | 700 | 2.2135 | 0.0 |
+ | 1.6661 | 0.0512 | 800 | 2.0404 | 0.0 |
+ | 1.5156 | 0.0576 | 900 | 2.4042 | 0.0 |
+ | 1.4395 | 0.064 | 1000 | 1.5664 | 0.0 |
+ | 1.8082 | 0.0704 | 1100 | 1.8056 | 0.0 |
+ | 1.4478 | 0.0768 | 1200 | 1.4878 | 0.0003 |
+ | 1.328 | 0.0832 | 1300 | 1.4281 | 0.0 |
+ | 1.4386 | 0.0896 | 1400 | 1.5305 | 0.0 |
+ | 1.2565 | 0.096 | 1500 | 1.4687 | 0.0 |
+ | 1.3237 | 0.1024 | 1600 | 1.4644 | 0.0 |
+ | 1.398 | 0.1088 | 1700 | 1.4478 | 0.0 |
+ | 1.2262 | 0.1152 | 1800 | 1.3935 | 0.0 |
+ | 1.1314 | 0.1216 | 1900 | 1.3781 | 0.0 |
+ | 1.1959 | 0.128 | 2000 | 1.6554 | 0.0 |
+ | 1.1514 | 0.1344 | 2100 | 1.6594 | 0.0 |
+ | 1.0965 | 0.1408 | 2200 | 1.4454 | 0.0 |
+ | 1.0504 | 0.1472 | 2300 | 1.2525 | 0.0 |
+ | 1.1241 | 0.1536 | 2400 | 1.2877 | 0.0002 |
+ | 1.0926 | 0.16 | 2500 | 1.2335 | 0.0031 |
+ | 1.0979 | 0.1664 | 2600 | 1.2177 | 0.0013 |
+ | 1.1338 | 0.1728 | 2700 | 1.3293 | 0.0 |
+ | 1.0675 | 0.1792 | 2800 | 1.3052 | 0.0012 |
+ | 1.1648 | 0.1856 | 2900 | 1.3016 | 0.0 |
+ | 1.23 | 0.192 | 3000 | 1.4616 | 0.0001 |
+ | 1.0822 | 0.1984 | 3100 | 1.2298 | 0.0006 |
+ | 1.0924 | 0.2048 | 3200 | 1.2667 | 0.0023 |
+ | 1.2845 | 0.2112 | 3300 | 2.0913 | 0.0 |
+ | 0.9362 | 0.2176 | 3400 | 1.1774 | 0.0009 |
+ | 0.7342 | 0.224 | 3500 | 1.0416 | 0.0101 |
+ | 0.4629 | 0.2304 | 3600 | 1.1162 | 0.0029 |
+ | 0.2339 | 0.2368 | 3700 | 1.0761 | 0.0113 |
+ | 0.3522 | 0.2432 | 3800 | 0.9133 | 0.021 |
+ | 0.1289 | 0.2496 | 3900 | 0.7999 | 0.0334 |
+ | 0.2369 | 0.256 | 4000 | 1.2702 | 0.0273 |
+ | 0.2972 | 0.2624 | 4100 | 0.8842 | 0.0024 |
+ | 0.328 | 0.2688 | 4200 | 1.4525 | 0.0038 |
+ | 0.1071 | 0.2752 | 4300 | 0.4093 | 0.1764 |
+ | 0.2717 | 0.2816 | 4400 | 0.4844 | 0.1486 |
+ | 0.1862 | 0.288 | 4500 | 0.7234 | 0.1626 |
+ | 0.1448 | 0.2944 | 4600 | 0.6820 | 0.0589 |
+ | 0.1097 | 0.3008 | 4700 | 0.5829 | 0.1356 |
+ | 0.8399 | 0.3072 | 4800 | 1.1572 | 0.0952 |
+ | 0.0855 | 0.3136 | 4900 | 0.6682 | 0.1126 |
+ | 0.3689 | 0.32 | 5000 | 0.6060 | 0.1014 |
+ | 0.0803 | 0.3264 | 5100 | 0.5115 | 0.1708 |
+ | 0.1108 | 0.3328 | 5200 | 0.6770 | 0.2847 |
+ | 0.1968 | 0.3392 | 5300 | 0.7723 | 0.066 |
+ | 0.0271 | 0.3456 | 5400 | 0.3863 | 0.2103 |
+ | 0.249 | 0.352 | 5500 | 1.0287 | 0.2347 |
+ | 0.4778 | 0.3584 | 5600 | 0.8103 | 0.0657 |
+ | 0.1905 | 0.3648 | 5700 | 0.9973 | 0.0024 |
+ | 0.0915 | 0.3712 | 5800 | 0.3592 | 0.2475 |
+ | 0.4359 | 0.3776 | 5900 | 0.7012 | 0.0446 |
+ | 0.2581 | 0.384 | 6000 | 1.0053 | 0.016 |
+ | 0.0725 | 0.3904 | 6100 | 0.9440 | 0.1805 |
+ | 0.0314 | 0.3968 | 6200 | 0.8534 | 0.0391 |
+ | 0.1078 | 0.4032 | 6300 | 0.8485 | 0.0766 |
+ | 0.1192 | 0.4096 | 6400 | 0.7595 | 0.081 |
+ | 0.0313 | 0.416 | 6500 | 0.7632 | 0.1392 |
+ | 0.0913 | 0.4224 | 6600 | 0.9090 | 0.0457 |
+ | 0.0698 | 0.4288 | 6700 | 0.6446 | 0.1377 |
+ | 0.0634 | 0.4352 | 6800 | 0.5044 | 0.1722 |
+ | 0.0206 | 0.4416 | 6900 | 0.4350 | 0.1596 |
+ | 0.0541 | 0.448 | 7000 | 0.5126 | 0.1993 |
+ | 0.0309 | 0.4544 | 7100 | 0.3665 | 0.2323 |
+ | 0.014 | 0.4608 | 7200 | 0.2903 | 0.3326 |
+ | 0.6508 | 0.4672 | 7300 | 1.3151 | 0.0301 |
+ | 0.2519 | 0.4736 | 7400 | 0.8930 | 0.0963 |
+ | 0.0093 | 0.48 | 7500 | 0.8081 | 0.2188 |
+ | 0.0184 | 0.4864 | 7600 | 0.4988 | 0.1794 |
+ | 0.037 | 0.4928 | 7700 | 0.7907 | 0.0773 |
+ | 0.0141 | 0.4992 | 7800 | 0.5505 | 0.1646 |
+ | 0.0076 | 0.5056 | 7900 | 0.2600 | 0.3208 |
+ | 0.0111 | 0.512 | 8000 | 0.4490 | 0.2233 |
+ | 0.0167 | 0.5184 | 8100 | 1.8546 | 0.0175 |
+ | 0.0054 | 0.5248 | 8200 | 0.7066 | 0.0855 |
+ | 0.0023 | 0.5312 | 8300 | 0.3227 | 0.194 |
+ | 0.0027 | 0.5376 | 8400 | 0.3176 | 0.3663 |
+ | 0.0019 | 0.544 | 8500 | 0.3129 | 0.2532 |
+ | 0.0095 | 0.5504 | 8600 | 0.4997 | 0.3292 |
+ | 0.0042 | 0.5568 | 8700 | 0.5050 | 0.3324 |
+ | 0.0507 | 0.5632 | 8800 | 0.2728 | 0.432 |
+ | 0.0021 | 0.5696 | 8900 | 0.1853 | 0.4227 |
+ | 0.0098 | 0.576 | 9000 | 0.1531 | 0.4567 |
+ | 0.0021 | 0.5824 | 9100 | 0.1294 | 0.563 |
+ | 0.0003 | 0.5888 | 9200 | 0.1399 | 0.6031 |
+ | 0.0027 | 0.5952 | 9300 | 0.5910 | 0.0974 |
+ | 0.0019 | 0.6016 | 9400 | 0.3085 | 0.3276 |
+ | 0.0001 | 0.608 | 9500 | 0.1920 | 0.5483 |
+ | 0.0073 | 0.6144 | 9600 | 0.3350 | 0.1231 |
+ | 0.0011 | 0.6208 | 9700 | 0.1745 | 0.6341 |
+ | 0.0006 | 0.6272 | 9800 | 0.2882 | 0.3489 |
+ | 0.0017 | 0.6336 | 9900 | 0.3170 | 0.2057 |
+ | 0.0007 | 0.64 | 10000 | 0.2683 | 0.3683 |
+ | 0.0006 | 0.6464 | 10100 | 0.0489 | 0.8182 |
+ | 0.036 | 0.6528 | 10200 | 0.0854 | 0.5987 |
+ | 0.0001 | 0.6592 | 10300 | 0.5204 | 0.3131 |
+ | 0.0003 | 0.6656 | 10400 | 0.4601 | 0.154 |
+ | 0.0001 | 0.672 | 10500 | 0.0270 | 0.8961 |
+ | 0.0 | 0.6784 | 10600 | 0.1030 | 0.5611 |
+ | 0.0 | 0.6848 | 10700 | 0.0703 | 0.6982 |
+ | 0.0 | 0.6912 | 10800 | 0.0639 | 0.763 |
+ | 0.0 | 0.6976 | 10900 | 0.0443 | 0.8296 |
+ | 0.0 | 0.704 | 11000 | 0.0430 | 0.8349 |
+ | 0.0 | 0.7104 | 11100 | 0.0416 | 0.843 |
+ | 0.0 | 0.7168 | 11200 | 0.0432 | 0.8373 |
+ | 0.0 | 0.7232 | 11300 | 0.0436 | 0.8393 |
+ | 0.0 | 0.7296 | 11400 | 0.0473 | 0.8207 |
+ | 0.0 | 0.736 | 11500 | 0.0453 | 0.8337 |
+ | 0.0 | 0.7424 | 11600 | 0.0434 | 0.8434 |
+ | 0.0 | 0.7488 | 11700 | 0.0379 | 0.8742 |
+ | 0.0 | 0.7552 | 11800 | 0.0386 | 0.8712 |
+ | 0.0 | 0.7616 | 11900 | 0.0375 | 0.8765 |
+ | 0.0 | 0.768 | 12000 | 0.0424 | 0.8793 |
+ | 0.0 | 0.7744 | 12100 | 0.0409 | 0.8846 |
+ | 0.0 | 0.7808 | 12200 | 0.0384 | 0.8901 |
+ | 0.0 | 0.7872 | 12300 | 0.0374 | 0.8927 |
+ | 0.0 | 0.7936 | 12400 | 0.0369 | 0.8938 |
+ | 0.0 | 0.8 | 12500 | 0.0366 | 0.8945 |
+ | 0.0 | 0.8064 | 12600 | 0.0363 | 0.8956 |
+ | 0.0 | 0.8128 | 12700 | 0.0358 | 0.8975 |
+ | 0.0 | 0.8192 | 12800 | 0.0362 | 0.897 |
+ | 0.0 | 0.8256 | 12900 | 0.0355 | 0.8991 |
+ | 0.0 | 0.832 | 13000 | 0.0346 | 0.9008 |
+ | 0.0 | 0.8384 | 13100 | 0.0343 | 0.9016 |
+ | 0.0 | 0.8448 | 13200 | 0.0314 | 0.8925 |
+ | 0.0 | 0.8512 | 13300 | 0.0316 | 0.8908 |
+ | 0.0 | 0.8576 | 13400 | 0.0318 | 0.8912 |
+ | 0.0 | 0.864 | 13500 | 0.0315 | 0.892 |
+ | 0.0 | 0.8704 | 13600 | 0.0312 | 0.8937 |
+ | 0.0 | 0.8768 | 13700 | 0.0296 | 0.9001 |
+ | 0.0 | 0.8832 | 13800 | 0.0292 | 0.9017 |
+ | 0.0 | 0.8896 | 13900 | 0.0290 | 0.9026 |
+ | 0.0 | 0.896 | 14000 | 0.0292 | 0.9018 |
+ | 0.0 | 0.9024 | 14100 | 0.0290 | 0.9027 |
+ | 0.0 | 0.9088 | 14200 | 0.0295 | 0.8995 |
+ | 0.0 | 0.9152 | 14300 | 0.0294 | 0.9002 |
+ | 0.0 | 0.9216 | 14400 | 0.0293 | 0.9006 |
+ | 0.0 | 0.928 | 14500 | 0.0292 | 0.9009 |
+ | 0.0 | 0.9344 | 14600 | 0.0291 | 0.9013 |
+ | 0.0 | 0.9408 | 14700 | 0.0288 | 0.9031 |
+ | 0.0 | 0.9472 | 14800 | 0.0287 | 0.9032 |
+ | 0.0 | 0.9536 | 14900 | 0.0286 | 0.9035 |
+ | 0.0 | 0.96 | 15000 | 0.0286 | 0.9037 |
+ | 0.0 | 0.9664 | 15100 | 0.0285 | 0.9044 |
+ | 0.0 | 0.9728 | 15200 | 0.0285 | 0.9047 |
+ | 0.0 | 0.9792 | 15300 | 0.0285 | 0.9047 |
+ | 0.0 | 0.9856 | 15400 | 0.0282 | 0.9072 |
+ | 0.0 | 0.992 | 15500 | 0.0282 | 0.9071 |
+ | 0.0 | 0.9984 | 15600 | 0.0282 | 0.9071 |
+
+
+ ### Framework versions
+
+ - Transformers 4.50.3
+ - Pytorch 2.6.0+cu124
+ - Tokenizers 0.21.1
config.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "architectures": [
+     "NanoGPT"
+   ],
+   "bias": true,
+   "block_size": 256,
+   "dropout": 0.0,
+   "mlp_dim": 4,
+   "model_type": "nanogpt",
+   "n_embd": 384,
+   "n_head": 6,
+   "n_layer": 6,
+   "nonlinearity": "RELU",
+   "torch_dtype": "float32",
+   "transformers_version": "4.50.3",
+   "use_NoPE": true,
+   "use_layernorm": true,
+   "vocab_size": 14
+ }
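Note that `"model_type": "nanogpt"` is not an architecture built into `transformers`, and this commit does not add a modeling file, so the checkpoint only loads if the matching `NanoGPT` implementation is available (for example via `trust_remote_code` or a locally registered class). A minimal sketch under that assumption; the repo id is also inferred, not stated on this page:

```python
from transformers import AutoConfig, AutoModel

# Hypothetical repo id, assembled from the username and model name above.
repo_id = "mtzig/reverseadd_lr5e-4_batch128_train1-16_eval19"

# Assumes custom NanoGPT modeling code is resolvable for model_type "nanogpt"
# (it is not part of this commit); trust_remote_code only helps if the repo
# actually ships that code.
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

print(config.n_layer, config.n_head, config.n_embd)  # 6 6 384
```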
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d617b9e6f20f2ff74a32eb29afd1bcffd8e83e1ff367fea6b7ca390ea5bf078
+ size 42640744
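The three lines above are a Git LFS pointer, not the weights themselves; the actual ~42.6 MB safetensors file lives in LFS storage. A hedged way to fetch it and list its tensors with `huggingface_hub` and `safetensors` (same hypothetical repo id as in the earlier sketch):

```python
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download the LFS-backed weights file; the repo id is hypothetical.
path = hf_hub_download(
    repo_id="mtzig/reverseadd_lr5e-4_batch128_train1-16_eval19",
    filename="model.safetensors",
)

# Inspect tensor names and shapes without loading the full tensors.
with safe_open(path, framework="pt") as f:
    for name in f.keys():
        print(name, f.get_slice(name).get_shape())
```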
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c8235c7d025b4a366f5621754f6f0e226cebb9921555e8d379361dd404967e1
+ size 5368