KitsuVp committed on
Commit a76060b · verified · 1 parent: 0249b1a

Model save

Files changed (4)
  1. README.md +17 -5
  2. config.json +0 -19
  3. model.safetensors +2 -2
  4. training_args.bin +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 3.4368
+ - Loss: 3.2775

  ## Model description

@@ -46,9 +46,21 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:-----:|:---------------:|
- | 3.8209 | 0.32 | 5000 | 3.7695 |
- | 3.6028 | 0.64 | 10000 | 3.5545 |
- | 3.4972 | 0.96 | 15000 | 3.4368 |
+ | 3.8791 | 0.064 | 5000 | 3.8161 |
+ | 3.6503 | 0.128 | 10000 | 3.6013 |
+ | 3.5604 | 0.192 | 15000 | 3.5130 |
+ | 3.5135 | 0.256 | 20000 | 3.4653 |
+ | 3.4816 | 0.32 | 25000 | 3.4333 |
+ | 3.4655 | 0.384 | 30000 | 3.4093 |
+ | 3.4452 | 0.448 | 35000 | 3.3904 |
+ | 3.4262 | 0.512 | 40000 | 3.3756 |
+ | 3.419 | 0.576 | 45000 | 3.3654 |
+ | 3.4132 | 0.64 | 50000 | 3.3561 |
+ | 3.4032 | 0.704 | 55000 | 3.3452 |
+ | 3.3956 | 0.768 | 60000 | 3.3338 |
+ | 3.3719 | 0.832 | 65000 | 3.3136 |
+ | 3.3482 | 0.896 | 70000 | 3.2922 |
+ | 3.3375 | 0.96 | 75000 | 3.2775 |


  ### Framework versions
@@ -56,4 +68,4 @@ The following hyperparameters were used during training:
  - Transformers 4.57.3
  - Pytorch 2.8.0+cu128
  - Datasets 4.4.2
- - Tokenizers 0.22.1
+ - Tokenizers 0.22.2
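The Epoch and Step columns of the model card's training-results table pin down how many optimizer steps make up one epoch in each run. A minimal sketch, with the rows copied from the old and new versions of that table; the helper names are hypothetical, and treating the reported loss as mean per-token cross-entropy in nats (so that `exp(loss)` is a perplexity) is an assumption, since the card does not say.

```python
import math

# Rows copied from the model card: (training loss, epoch, step, validation loss).
OLD_ROWS = [
    (3.8209, 0.32, 5000, 3.7695),
    (3.6028, 0.64, 10000, 3.5545),
    (3.4972, 0.96, 15000, 3.4368),
]
NEW_ROWS = [
    (3.8791, 0.064, 5000, 3.8161),
    (3.6503, 0.128, 10000, 3.6013),
    (3.5604, 0.192, 15000, 3.5130),
    (3.5135, 0.256, 20000, 3.4653),
    (3.4816, 0.32, 25000, 3.4333),
    (3.4655, 0.384, 30000, 3.4093),
    (3.4452, 0.448, 35000, 3.3904),
    (3.4262, 0.512, 40000, 3.3756),
    (3.419, 0.576, 45000, 3.3654),
    (3.4132, 0.64, 50000, 3.3561),
    (3.4032, 0.704, 55000, 3.3452),
    (3.3956, 0.768, 60000, 3.3338),
    (3.3719, 0.832, 65000, 3.3136),
    (3.3482, 0.896, 70000, 3.2922),
    (3.3375, 0.96, 75000, 3.2775),
]


def steps_per_epoch(rows):
    """Hypothetical helper: infer steps per epoch as step / epoch, checking all rows agree."""
    estimates = {round(step / epoch) for _, epoch, step, _ in rows}
    if len(estimates) != 1:
        raise ValueError(f"inconsistent step/epoch ratios: {sorted(estimates)}")
    return estimates.pop()


def perplexity(loss):
    # Assumption: the loss is mean per-token cross-entropy in nats.
    return math.exp(loss)


if __name__ == "__main__":
    print(steps_per_epoch(OLD_ROWS))  # 15625
    print(steps_per_epoch(NEW_ROWS))  # 78125
    final_old, final_new = OLD_ROWS[-1][3], NEW_ROWS[-1][3]
    print(f"validation perplexity: {perplexity(final_old):.2f} -> {perplexity(final_new):.2f}")
```

Both runs stop at epoch 0.96, but an epoch is five times as many steps in the new run (78125 vs 15625). The card does not say whether that comes from a smaller batch, a larger dataset, or both; the sketch only recovers the ratio.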
config.json CHANGED
@@ -19,25 +19,6 @@
  "hidden_size": 512,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
- "layer_types": [
- "linear_attention",
- "linear_attention",
- "linear_attention",
- "full_attention",
- "linear_attention",
- "linear_attention",
- "linear_attention",
- "full_attention",
- "linear_attention",
- "linear_attention",
- "linear_attention",
- "full_attention"
- ],
- "linear_conv_kernel_dim": 4,
- "linear_key_head_dim": 32,
- "linear_num_key_heads": 8,
- "linear_num_value_heads": 16,
- "linear_value_head_dim": 32,
  "max_position_embeddings": 512,
  "model_type": "neollm",
  "num_attention_heads": 8,
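The removed `layer_types` list follows a regular schedule: three `linear_attention` layers, then one `full_attention` layer, repeated across 12 layers. A minimal sketch that regenerates that list and derives the widths implied by the removed `linear_*` head settings; the function `hybrid_layer_types` and its `full_attention_every` parameter are hypothetical, not part of the `neollm` config, and reading heads × head dimension as a total projection width is an assumption.

```python
from collections import Counter

# The removed "layer_types" entry, copied from the old config.json.
REMOVED_LAYER_TYPES = [
    "linear_attention", "linear_attention", "linear_attention", "full_attention",
    "linear_attention", "linear_attention", "linear_attention", "full_attention",
    "linear_attention", "linear_attention", "linear_attention", "full_attention",
]

# The removed linear-attention settings, copied from the old config.json.
REMOVED_LINEAR = {
    "linear_conv_kernel_dim": 4,
    "linear_key_head_dim": 32,
    "linear_num_key_heads": 8,
    "linear_num_value_heads": 16,
    "linear_value_head_dim": 32,
}


def hybrid_layer_types(num_layers, full_attention_every=4):
    """Hypothetical helper: every `full_attention_every`-th layer (1-based) is full attention."""
    return [
        "full_attention" if (i + 1) % full_attention_every == 0 else "linear_attention"
        for i in range(num_layers)
    ]


if __name__ == "__main__":
    # The regenerated schedule matches the list removed in this commit.
    assert hybrid_layer_types(len(REMOVED_LAYER_TYPES)) == REMOVED_LAYER_TYPES
    print(Counter(REMOVED_LAYER_TYPES))
    # Assumption: total width = number of heads * per-head dimension.
    key_width = REMOVED_LINEAR["linear_num_key_heads"] * REMOVED_LINEAR["linear_key_head_dim"]
    value_width = REMOVED_LINEAR["linear_num_value_heads"] * REMOVED_LINEAR["linear_value_head_dim"]
    print(key_width, value_width)  # 256 512
```

The new config lists neither layer types nor linear-attention settings; how `neollm` builds its layers when these keys are absent is not visible in this diff.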
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a0f6ad44388434d4fe9ef84940511ab9a2485efe43a14e57790e76a8c77aa893
- size 253937864
+ oid sha256:afc4d41bac48f2feada9e94c86628da5ad57dd2b598894587f3e670a8829ead9
+ size 251027488
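`model.safetensors` is stored through Git LFS, so the diff shows only the pointer file: a spec `version` line, the SHA-256 `oid` of the real content, and its `size` in bytes. A minimal parsing sketch (the helper name `parse_lfs_pointer` is hypothetical) that reads the old and new pointers and computes how much the checkpoint shrank.

```python
def parse_lfs_pointer(text):
    """Hypothetical helper: parse a Git LFS v1 pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError(f"malformed pointer line: {line!r}")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


# Pointer contents copied from the model.safetensors diff.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a0f6ad44388434d4fe9ef84940511ab9a2485efe43a14e57790e76a8c77aa893
size 253937864
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:afc4d41bac48f2feada9e94c86628da5ad57dd2b598894587f3e670a8829ead9
size 251027488
"""

if __name__ == "__main__":
    old, new = parse_lfs_pointer(OLD_POINTER), parse_lfs_pointer(NEW_POINTER)
    print(old["size"] - new["size"])  # 2910376
```

The checkpoint is 2,910,376 bytes smaller. A safetensors file also carries a JSON header describing its tensors, so this byte difference does not map exactly onto a parameter count.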
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:950fb1b5260277fb5b4fbd5519494da796a7eb3cdae7696248c9d342a80a4151
+ oid sha256:91e9caacc299cd4001e307f245f5aedad5d2d18695dcc7365ced582b4aab68a6
  size 6033
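For `training_args.bin` the size stays at 6033 bytes while the `oid` changes. That is expected: the `oid` is the SHA-256 digest of the file's bytes, so any content change alters it even when the length does not. A minimal sketch of building a v1 pointer from raw bytes with `hashlib`; the helper `make_lfs_pointer` and the demo payloads are illustrative stand-ins, not the real file.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(data: bytes) -> str:
    """Hypothetical helper: build a Git LFS v1 pointer (spec version, SHA-256 oid, byte size)."""
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


if __name__ == "__main__":
    # Two illustrative payloads with the same length but different content.
    a = make_lfs_pointer(b"A" * 6033)
    b = make_lfs_pointer(b"B" * 6033)
    print(a.splitlines()[2] == b.splitlines()[2])  # True: same "size 6033" line
    print(a.splitlines()[1] == b.splitlines()[1])  # False: different oid
```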