End of training

Files changed:
- README.md: +18 −19
- config.json: +2 −3
- model.safetensors: +2 −2
- tokenizer.json: +2 −2
- training_args.bin: +1 −1
README.md CHANGED

@@ -1,6 +1,5 @@
 ---
 library_name: transformers
-base_model: Ellio98/mistral-0.5B-base
 tags:
 - generated_from_trainer
 model-index:
@@ -13,9 +12,9 @@ should probably proofread and complete it, then remove this comment. -->

 # mistral-0.5B-base

-This model is a fine-tuned version of [
+This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss:
+- Loss: 3.8893

 ## Model description

@@ -35,30 +34,30 @@ More information needed

 The following hyperparameters were used during training:
 - learning_rate: 0.0005
-- train_batch_size:
-- eval_batch_size:
+- train_batch_size: 16
+- eval_batch_size: 16
 - seed: 42
 - gradient_accumulation_steps: 4
-- total_train_batch_size:
+- total_train_batch_size: 64
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
-- num_epochs:
+- num_epochs: 4

 ### Training results

-| Training Loss | Epoch
-
-
-
-
-
-
-
-| 2.
-
-
-
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 5.9695        | 0.3989 | 74   | 6.0154          |
+| 5.0148        | 0.7978 | 148  | 5.1781          |
+| 4.459         | 1.1941 | 222  | 4.7603          |
+| 4.0585        | 1.5930 | 296  | 4.4421          |
+| 3.8404        | 1.9919 | 370  | 4.2375          |
+| 3.1164        | 2.3881 | 444  | 4.0944          |
+| 2.4948        | 2.7871 | 518  | 3.9528          |
+| 1.6126        | 3.1833 | 592  | 3.9094          |
+| 1.5873        | 3.5822 | 666  | 3.8964          |
+| 1.4401        | 3.9811 | 740  | 3.8893          |


 ### Framework versions
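The hyperparameter list in the card maps directly onto `transformers.TrainingArguments`. Below is a minimal sketch of an equivalent setup; only the values shown in the diff come from the card, while the output path is hypothetical and the dataset/`Trainer` wiring is omitted:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed in the model card diff above.
# Effective train batch size: 16 per device x 4 accumulation steps = 64.
training_args = TrainingArguments(
    output_dir="mistral-0.5B-base",   # hypothetical output path
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,
    seed=42,
    optim="adamw_torch",              # betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=4,
)
```

For scale, the final validation loss of 3.8893 corresponds to a perplexity of exp(3.8893) ≈ 48.9.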
config.json CHANGED

@@ -1,5 +1,4 @@
 {
-  "_name_or_path": "Ellio98/mistral-0.5B-base",
   "architectures": [
     "MistralForCausalLM"
   ],
@@ -16,7 +15,7 @@
   "hidden_size": 1536,
   "initializer_range": 0.02,
   "intermediate_size": 4096,
-  "max_position_embeddings":
+  "max_position_embeddings": 4096,
   "model_type": "mistral",
   "num_attention_heads": 8,
   "num_hidden_layers": 16,
@@ -29,5 +28,5 @@
   "torch_dtype": "float32",
   "transformers_version": "4.47.0",
   "use_cache": true,
-  "vocab_size":
+  "vocab_size": 32768
 }
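The fields visible in this diff are enough to rebuild the architecture with `transformers.MistralConfig`. A sketch, assuming the fields not shown here (e.g. `num_key_value_heads`, `rope_theta`, the special token IDs) keep their `MistralConfig` defaults, which may not match the actual checkpoint:

```python
from transformers import MistralConfig, MistralForCausalLM

# Values taken from the config.json diff; everything else is left at
# MistralConfig defaults and may differ from the real checkpoint.
config = MistralConfig(
    hidden_size=1536,
    intermediate_size=4096,
    num_attention_heads=8,
    num_hidden_layers=16,
    max_position_embeddings=4096,
    vocab_size=32768,
)
model = MistralForCausalLM(config)
print(f"{model.num_parameters():,} parameters")  # expect roughly 0.5B, per the model name
```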
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:8839c68dc06b4ea5c30fe89f930c3952588ba17896a32c25ac0a30c518977b79
+size 2063817040
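The `model.safetensors` entry is a Git LFS pointer, so the diff records only a new content hash and byte size. A sketch for confirming that a locally downloaded weight file matches this commit (the local path is hypothetical):

```python
import hashlib
from pathlib import Path

path = Path("model.safetensors")  # hypothetical local download path
expected_oid = "8839c68dc06b4ea5c30fe89f930c3952588ba17896a32c25ac0a30c518977b79"
expected_size = 2_063_817_040  # bytes, from the pointer above

# Hash in 1 MiB chunks so the ~2 GB file never sits fully in memory.
digest = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert path.stat().st_size == expected_size, "size mismatch"
assert digest.hexdigest() == expected_oid, "sha256 mismatch"
print("file matches the LFS pointer in this commit")
```

At float32 (4 bytes per parameter), 2,063,817,040 bytes is consistent with roughly 516M parameters, in line with the 0.5B in the model name.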
tokenizer.json CHANGED

@@ -2,13 +2,13 @@
   "version": "1.0",
   "truncation": {
     "direction": "Right",
-    "max_length":
+    "max_length": 256,
     "strategy": "LongestFirst",
     "stride": 0
   },
   "padding": {
     "strategy": {
-      "Fixed":
+      "Fixed": 256
     },
     "direction": "Left",
     "pad_to_multiple_of": null,
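This change pins the tokenizer to a fixed 256-token window: truncate on the right, pad on the left. The same behaviour can be set programmatically with the `tokenizers` library; a sketch, assuming `tokenizer.json` has been downloaded locally:

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")  # hypothetical local path

# Mirrors the diff: truncate on the right to 256 tokens...
tok.enable_truncation(max_length=256, strategy="longest_first", direction="right")
# ...and left-pad every sequence to a fixed length of 256.
# pad_id/pad_token are left at library defaults here and may need
# overriding to match this model's actual pad token.
tok.enable_padding(length=256, direction="left")
```

Left padding with right truncation is the usual setup for decoder-only models, where the most recent tokens must sit at the end of the window during batched generation.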
training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:79aead2ffe48c9db3adaebd26efae9c67c63713bea0729b7b90665938d5c5a3c
 size 5304
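`training_args.bin` is likewise an LFS pointer; the underlying file is the pickled `TrainingArguments` object that `Trainer` saves alongside a run. A sketch for inspecting it, assuming a compatible `transformers` install (unpickling imports the class):

```python
import torch

# training_args.bin is a pickled TrainingArguments object, not a tensor
# checkpoint, so recent torch versions need weights_only=False to load it.
# Only unpickle files from sources you trust.
args = torch.load("training_args.bin", weights_only=False)

# Expected to match the hyperparameters in the model card above.
print(args.learning_rate)                # 0.0005
print(args.per_device_train_batch_size)  # 16
print(args.gradient_accumulation_steps)  # 4
print(args.num_train_epochs)             # 4.0
```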