Model save

Files changed (4)

README.md CHANGED

@@ -14,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.4368
 ## Model description
@@ -46,9 +46,21 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
-| 3.8209        | 0.32  | 5000  | 3.7695          |
-| 3.6028        | 0.64  | 10000 | 3.5545          |
-| 3.4972        | 0.96  | 15000 | 3.4368          |
 ### Framework versions
@@ -56,4 +68,4 @@ The following hyperparameters were used during training:
 - Transformers 4.57.3
 - Pytorch 2.8.0+cu128
 - Datasets 4.4.2
-- Tokenizers 0.22.1

 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 3.2775
 ## Model description
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
+| 3.8791        | 0.064 | 5000  | 3.8161          |
+| 3.6503        | 0.128 | 10000 | 3.6013          |
+| 3.5604        | 0.192 | 15000 | 3.5130          |
+| 3.5135        | 0.256 | 20000 | 3.4653          |
+| 3.4816        | 0.32  | 25000 | 3.4333          |
+| 3.4655        | 0.384 | 30000 | 3.4093          |
+| 3.4452        | 0.448 | 35000 | 3.3904          |
+| 3.4262        | 0.512 | 40000 | 3.3756          |
+| 3.419         | 0.576 | 45000 | 3.3654          |
+| 3.4132        | 0.64  | 50000 | 3.3561          |
+| 3.4032        | 0.704 | 55000 | 3.3452          |
+| 3.3956        | 0.768 | 60000 | 3.3338          |
+| 3.3719        | 0.832 | 65000 | 3.3136          |
+| 3.3482        | 0.896 | 70000 | 3.2922          |
+| 3.3375        | 0.96  | 75000 | 3.2775          |
 ### Framework versions
 - Transformers 4.57.3
 - Pytorch 2.8.0+cu128
 - Datasets 4.4.2
+- Tokenizers 0.22.2
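
The two training tables log the same step interval (5000) but very different epoch fractions, which implies a different number of optimizer steps per epoch between the two runs. A minimal sketch of that arithmetic follows, using only rows copied from the tables above. The perplexity conversion assumes the reported loss is a mean cross-entropy in nats, which the README does not state; the helper names are illustrative only.

```python
import math

# (epoch, step, validation_loss) rows copied from the README tables in this diff.
old_rows = [(0.32, 5000, 3.7695), (0.64, 10000, 3.5545), (0.96, 15000, 3.4368)]
new_rows = [(0.064, 5000, 3.8161), (0.32, 25000, 3.4333), (0.96, 75000, 3.2775)]


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    epoch, step, _ = rows[-1]
    return round(step / epoch)


def perplexity(loss):
    """Convert a loss to perplexity, ASSUMING mean cross-entropy in nats."""
    return math.exp(loss)


old_spe = steps_per_epoch(old_rows)
new_spe = steps_per_epoch(new_rows)

print(f"old run: {old_spe} steps/epoch, final val loss {old_rows[-1][2]}")
print(f"new run: {new_spe} steps/epoch, final val loss {new_rows[-1][2]}")
print(f"steps/epoch ratio (new / old): {new_spe / old_spe:.1f}")
print(f"final perplexity, old vs new: "
      f"{perplexity(old_rows[-1][2]):.1f} vs {perplexity(new_rows[-1][2]):.1f}")
```

The ratio by itself does not say whether the effective batch size shrank or the dataset grew; the diff shows neither the batch size nor the dataset, so either reading remains a guess.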

config.json CHANGED

@@ -19,25 +19,6 @@
   "hidden_size": 512,
   "initializer_range": 0.02,
   "intermediate_size": 1536,
-  "layer_types": [
-    "linear_attention",
-    "linear_attention",
-    "linear_attention",
-    "full_attention",
-    "linear_attention",
-    "linear_attention",
-    "linear_attention",
-    "full_attention",
-    "linear_attention",
-    "linear_attention",
-    "linear_attention",
-    "full_attention"
-  ],
-  "linear_conv_kernel_dim": 4,
-  "linear_key_head_dim": 32,
-  "linear_num_key_heads": 8,
-  "linear_num_value_heads": 16,
-  "linear_value_head_dim": 32,
   "max_position_embeddings": 512,
   "model_type": "neollm",
   "num_attention_heads": 8,

   "hidden_size": 512,
   "initializer_range": 0.02,
   "intermediate_size": 1536,
   "max_position_embeddings": 512,
   "model_type": "neollm",
   "num_attention_heads": 8,

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a0f6ad44388434d4fe9ef84940511ab9a2485efe43a14e57790e76a8c77aa893
-size 253937864

 version https://git-lfs.github.com/spec/v1
+oid sha256:afc4d41bac48f2feada9e94c86628da5ad57dd2b598894587f3e670a8829ead9
+size 251027488
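
The LFS pointer sizes give a rough view of how much the checkpoint shrank. The sketch below computes the byte difference from the two `size` fields above and converts it to an approximate parameter count. The 4 bytes per parameter (float32) figure is an assumption, since the diff does not show the tensor dtype, and the safetensors header also contributes a small, unknown number of bytes.

```python
# Sizes copied from the old and new Git LFS pointer files in this diff.
OLD_SIZE_BYTES = 253_937_864
NEW_SIZE_BYTES = 251_027_488


def size_delta(old_bytes, new_bytes):
    """Bytes removed from the checkpoint (positive means the file shrank)."""
    return old_bytes - new_bytes


def approx_params(num_bytes, bytes_per_param=4):
    """Rough parameter count; bytes_per_param=4 ASSUMES float32 weights."""
    return num_bytes // bytes_per_param


delta = size_delta(OLD_SIZE_BYTES, NEW_SIZE_BYTES)
print(f"checkpoint shrank by {delta} bytes")
print(f"roughly {approx_params(delta)} parameters if weights are float32")
```

Whether this reduction corresponds to the fields removed from `config.json` cannot be confirmed from the diff alone.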

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:950fb1b5260277fb5b4fbd5519494da796a7eb3cdae7696248c9d342a80a4151
 size 6033

 version https://git-lfs.github.com/spec/v1
+oid sha256:91e9caacc299cd4001e307f245f5aedad5d2d18695dcc7365ced582b4aab68a6
 size 6033