End of training

Files changed (7)

README.md +53 -53
adapter_config.json +2 -2
adapter_model.safetensors +1 -1
runs/Nov28_11-07-15_localhost/events.out.tfevents.1732772236.localhost +2 -2
runs/Nov29_03-49-51_localhost/events.out.tfevents.1732832393.localhost +3 -0
runs/Nov29_06-05-36_localhost/events.out.tfevents.1732840538.localhost +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [bigcode/starcoderbase-1b](https://huggingface.co/bigcode/starcoderbase-1b) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.0382
 ## Model description
@@ -43,63 +43,63 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 16
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 10
-- training_steps: 500
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 1.276         | 0.0416 | 10   | 1.0832          |
-| 1.2151        | 0.0833 | 20   | 1.0789          |
-| 1.2213        | 0.1249 | 30   | 1.0736          |
-| 1.1694        | 0.1666 | 40   | 1.0718          |
-| 1.2627        | 0.2082 | 50   | 1.0696          |
-| 1.1801        | 0.2499 | 60   | 1.0690          |
-| 1.1013        | 0.2915 | 70   | 1.0637          |
-| 1.082         | 0.3332 | 80   | 1.0643          |
-| 1.0783        | 0.3748 | 90   | 1.0638          |
-| 1.1166        | 0.4164 | 100  | 1.0615          |
-| 1.1207        | 0.4581 | 110  | 1.0581          |
-| 1.197         | 0.4997 | 120  | 1.0562          |
-| 1.012         | 0.5414 | 130  | 1.0583          |
-| 1.1291        | 0.5830 | 140  | 1.0515          |
-| 1.0695        | 0.6247 | 150  | 1.0520          |
-| 1.0924        | 0.6663 | 160  | 1.0514          |
-| 1.1287        | 0.7080 | 170  | 1.0536          |
-| 1.0514        | 0.7496 | 180  | 1.0508          |
-| 1.1101        | 0.7913 | 190  | 1.0491          |
-| 1.1474        | 0.8329 | 200  | 1.0489          |
-| 1.1451        | 0.8745 | 210  | 1.0476          |
-| 1.1688        | 0.9162 | 220  | 1.0434          |
-| 1.053         | 0.9578 | 230  | 1.0447          |
-| 1.0146        | 0.9995 | 240  | 1.0438          |
-| 1.1127        | 1.0411 | 250  | 1.0442          |
-| 0.9734        | 1.0828 | 260  | 1.0420          |
-| 1.0315        | 1.1244 | 270  | 1.0445          |
-| 1.0803        | 1.1661 | 280  | 1.0435          |
-| 1.0892        | 1.2077 | 290  | 1.0440          |
-| 1.0191        | 1.2493 | 300  | 1.0427          |
-| 1.034         | 1.2910 | 310  | 1.0416          |
-| 1.1136        | 1.3326 | 320  | 1.0413          |
-| 0.9837        | 1.3743 | 330  | 1.0413          |
-| 1.0659        | 1.4159 | 340  | 1.0405          |
-| 0.9931        | 1.4576 | 350  | 1.0409          |
-| 1.1141        | 1.4992 | 360  | 1.0403          |
-| 1.0851        | 1.5409 | 370  | 1.0399          |
-| 1.053         | 1.5825 | 380  | 1.0390          |
-| 1.0652        | 1.6242 | 390  | 1.0395          |
-| 1.0998        | 1.6658 | 400  | 1.0396          |
-| 0.9909        | 1.7074 | 410  | 1.0390          |
-| 1.0946        | 1.7491 | 420  | 1.0386          |
-| 1.0471        | 1.7907 | 430  | 1.0382          |
-| 0.9719        | 1.8324 | 440  | 1.0382          |
-| 1.0641        | 1.8740 | 450  | 1.0382          |
-| 1.0003        | 1.9157 | 460  | 1.0383          |
-| 1.0128        | 1.9573 | 470  | 1.0383          |
-| 1.0637        | 1.9990 | 480  | 1.0384          |
-| 1.0583        | 2.0406 | 490  | 1.0383          |
-| 0.991         | 2.0822 | 500  | 1.0382          |
 ### Framework versions

 This model is a fine-tuned version of [bigcode/starcoderbase-1b](https://huggingface.co/bigcode/starcoderbase-1b) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.8901
 ## Model description
 - total_train_batch_size: 16
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 20
+- training_steps: 1000
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
+| 1.0733        | 0.1631 | 20   | 0.9622          |
+| 1.0649        | 0.3262 | 40   | 0.9528          |
+| 1.0324        | 0.4893 | 60   | 0.9462          |
+| 1.0216        | 0.6524 | 80   | 0.9424          |
+| 1.0067        | 0.8155 | 100  | 0.9368          |
+| 0.9977        | 0.9786 | 120  | 0.9329          |
+| 0.97          | 1.1458 | 140  | 0.9302          |
+| 0.9085        | 1.3089 | 160  | 0.9279          |
+| 0.934         | 1.4720 | 180  | 0.9233          |
+| 1.0061        | 1.6351 | 200  | 0.9184          |
+| 0.9564        | 1.7982 | 220  | 0.9165          |
+| 0.9738        | 1.9613 | 240  | 0.9126          |
+| 0.8864        | 2.1284 | 260  | 0.9114          |
+| 0.9144        | 2.2915 | 280  | 0.9113          |
+| 0.9443        | 2.4546 | 300  | 0.9098          |
+| 0.9444        | 2.6177 | 320  | 0.9083          |
+| 0.887         | 2.7808 | 340  | 0.9058          |
+| 0.9398        | 2.9439 | 360  | 0.9052          |
+| 0.9015        | 3.1111 | 380  | 0.9031          |
+| 0.8536        | 3.2742 | 400  | 0.9024          |
+| 0.8765        | 3.4373 | 420  | 0.9002          |
+| 0.9198        | 3.6004 | 440  | 0.8997          |
+| 0.9468        | 3.7635 | 460  | 0.8989          |
+| 0.8631        | 3.9266 | 480  | 0.8978          |
+| 0.8777        | 4.0938 | 500  | 0.8977          |
+| 0.9006        | 4.2569 | 520  | 0.8959          |
+| 0.8768        | 4.4200 | 540  | 0.8957          |
+| 0.8477        | 4.5831 | 560  | 0.8951          |
+| 0.9061        | 4.7462 | 580  | 0.8937          |
+| 0.8837        | 4.9093 | 600  | 0.8930          |
+| 0.8402        | 5.0765 | 620  | 0.8939          |
+| 0.8608        | 5.2396 | 640  | 0.8931          |
+| 0.879         | 5.4027 | 660  | 0.8928          |
+| 0.8562        | 5.5657 | 680  | 0.8922          |
+| 0.8776        | 5.7288 | 700  | 0.8913          |
+| 0.8464        | 5.8919 | 720  | 0.8910          |
+| 0.8528        | 6.0591 | 740  | 0.8914          |
+| 0.8538        | 6.2222 | 760  | 0.8910          |
+| 0.8844        | 6.3853 | 780  | 0.8905          |
+| 0.8652        | 6.5484 | 800  | 0.8906          |
+| 0.8443        | 6.7115 | 820  | 0.8905          |
+| 0.8546        | 6.8746 | 840  | 0.8899          |
+| 0.8094        | 7.0418 | 860  | 0.8904          |
+| 0.863         | 7.2049 | 880  | 0.8899          |
+| 0.8642        | 7.3680 | 900  | 0.8902          |
+| 0.8413        | 7.5311 | 920  | 0.8901          |
+| 0.8119        | 7.6942 | 940  | 0.8903          |
+| 0.8909        | 7.8573 | 960  | 0.8901          |
+| 0.8516        | 8.0245 | 980  | 0.8900          |
+| 0.8834        | 8.1876 | 1000 | 0.8901          |
 ### Framework versions
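
The hyperparameter change in this README hunk (warmup 10 to 20 steps, training steps 500 to 1000, `cosine` scheduler) can be illustrated with a minimal sketch of a linear-warmup, half-cosine decay multiplier. It assumes the common shape used for a cosine schedule with warmup, not necessarily the exact code path this run used. `PEAK_LR` is a hypothetical value, because the learning rate itself is not visible in this diff.

```python
import math

WARMUP_STEPS = 20      # lr_scheduler_warmup_steps from the updated README
TRAINING_STEPS = 1000  # training_steps from the updated README
PEAK_LR = 1e-4         # hypothetical: the learning rate is not shown in this diff


def lr_multiplier(step, warmup_steps=WARMUP_STEPS, total_steps=TRAINING_STEPS):
    """Return the factor applied to the peak learning rate at `step`."""
    if step < warmup_steps:
        # Linear ramp from 0 up to 1 over the warmup steps.
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from 1 down to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 10, 20, 510, 1000):
    print(step, PEAK_LR * lr_multiplier(step))
```

Under this assumed shape the multiplier approaches zero over the final steps, which would be consistent with the validation loss in the new table flattening near 0.890 after step 840.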

adapter_config.json CHANGED

@@ -20,10 +20,10 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "c_fc",
     "c_proj",
-    "q_attn",
-    "c_attn"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "c_attn",
     "c_fc",
     "c_proj",
+    "q_attn"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

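The `target_modules` hunk above lists the same four module names on both sides, in a different order. A minimal sketch of how one might confirm that such a change is a reordering rather than a change in which modules are adapted is shown here; the helper name `compare_targets` is hypothetical, and the two lists are copied from the hunk.

```python
# Module lists as they appear on the old and new sides of the hunk.
old_targets = ["c_fc", "c_proj", "q_attn", "c_attn"]
new_targets = ["c_attn", "c_fc", "c_proj", "q_attn"]


def compare_targets(old, new):
    """Summarise which module names were added, removed, or only reordered."""
    old_set, new_set = set(old), set(new)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
        # True when the names are identical but the list order differs.
        "reordered_only": old_set == new_set and old != new,
    }


print(compare_targets(old_targets, new_targets))
```

If the comparison reports no additions or removals, the adapter targets the same layers in both commits. One plausible cause of the reordering is that the list is serialized from an unordered collection, though the diff alone does not confirm that.
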
adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2a07364278dad56148a64a1d6090cece2cfebb54c2b19abda81ec2c23624e004
 size 22241240

 version https://git-lfs.github.com/spec/v1
+oid sha256:fc0158b656e9ec52b86ca457fa79f91c332e6618c8a24b37f1f483c135cf7e19
 size 22241240

runs/Nov28_11-07-15_localhost/events.out.tfevents.1732772236.localhost CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3307e2d2895063fba13f56743ffd53e7cbba468184df79cf89f8c7bd448f627b
-size 7806

 version https://git-lfs.github.com/spec/v1
+oid sha256:2e8835ce323b1c98186a2d2a3d23e76f3a71fe6da6f30e2003f069b61e8198da
+size 8981

runs/Nov29_03-49-51_localhost/events.out.tfevents.1732832393.localhost ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2fb4bd1982f4a3e39e67bc96304edf322a32de0443dc34dac24180eb17bb4799
+size 14235

runs/Nov29_06-05-36_localhost/events.out.tfevents.1732840538.localhost ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:17e2a4f2f826a82638e45c46b974fc20ea0c219f8a0ef8fdc6a8fdcb10192365
+size 29802
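
The event files under `runs/` carry a numeric field in their names that appears to be a Unix timestamp in seconds. The sketch below parses that field into a UTC datetime, assuming the naming pattern `events.out.tfevents.<seconds>.<hostname>` seen in this commit; the function name `event_file_time` is hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern assumed from the file names in this commit.
EVENT_NAME = re.compile(r"events\.out\.tfevents\.(\d+)\.(.+)$")


def event_file_time(path):
    """Return (UTC datetime, hostname) parsed from an event file name."""
    match = EVENT_NAME.search(path)
    if match is None:
        raise ValueError(f"not an event file name: {path!r}")
    seconds, host = int(match.group(1)), match.group(2)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), host


for name in (
    "runs/Nov28_11-07-15_localhost/events.out.tfevents.1732772236.localhost",
    "runs/Nov29_03-49-51_localhost/events.out.tfevents.1732832393.localhost",
    "runs/Nov29_06-05-36_localhost/events.out.tfevents.1732840538.localhost",
):
    when, host = event_file_time(name)
    print(when.isoformat(), host)
```

The directory names use the training machine's local clock, so they can differ from the UTC value by that machine's timezone offset.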

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0b9059703c87f76bcdacfceb1d2e4b0a2d11afc55e0b8d777466b9c9548bd6ff
 size 5304

 version https://git-lfs.github.com/spec/v1
+oid sha256:6ffd811870fb31121203e9dd263ecdbd436864079d4bfaa819d4f05f9d45fd4e
 size 5304
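
The binary files in this commit are stored as Git LFS pointers, so each diff shows only a `version` line, a `sha256` object id, and a byte size. A minimal sketch of parsing such a pointer and checking downloaded bytes against it follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the new side of the `training_args.bin` hunk.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6ffd811870fb31121203e9dd263ecdbd436864079d4bfaa819d4f05f9d45fd4e
size 5304
"""


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest named by a pointer."""
    if pointer["algo"] != "sha256" or len(data) != pointer["size"]:
        return False
    return hashlib.sha256(data).hexdigest() == pointer["oid"]


print(parse_lfs_pointer(POINTER))
```

In this commit `adapter_model.safetensors` and `training_args.bin` keep their sizes (22241240 and 5304 bytes) while their object ids change, so the contents differ even though the file lengths do not.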