ninagroot committed
Commit 463c3b3 · verified · 1 Parent(s): c72e4b5

ninagroot/Llama-360Mtest
README.md CHANGED
@@ -13,7 +13,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 4.1915
+- Loss: 8.3245
 
 ## Model description
 
@@ -33,31 +33,25 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0003
-- train_batch_size: 1
+- train_batch_size: 16
 - eval_batch_size: 8
 - seed: 42
 - gradient_accumulation_steps: 8
-- total_train_batch_size: 8
+- total_train_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 300
-- num_epochs: 10
+- num_epochs: 4
 - mixed_precision_training: Native AMP
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 7.042         | 0.99  | 44   | 6.5393          |
-| 5.6951        | 1.99  | 88   | 5.4631          |
-| 4.7481        | 2.98  | 132  | 4.6474          |
-| 4.1761        | 4.0   | 177  | 4.4405          |
-| 3.6565        | 4.99  | 221  | 4.3291          |
-| 3.5648        | 5.99  | 265  | 4.2850          |
-| 3.3644        | 6.98  | 309  | 4.2276          |
-| 3.1299        | 8.0   | 354  | 4.2050          |
-| 2.5705        | 8.99  | 398  | 4.2062          |
-| 2.1843        | 9.94  | 440  | 4.1915          |
+| No log        | 0.89  | 2    | 8.5737          |
+| No log        | 1.78  | 4    | 8.5252          |
+| No log        | 2.67  | 6    | 8.4412          |
+| No log        | 3.56  | 8    | 8.3245          |
 
 
 ### Framework versions
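The updated hyperparameters are internally consistent: the reported total_train_batch_size is the per-device batch size multiplied by the gradient accumulation steps. A minimal sketch of that arithmetic (the single-device count is an assumption; the commit does not state how many GPUs were used):

```python
# Effective batch size per optimizer step:
# per-device batch size × gradient accumulation steps × number of devices.
train_batch_size = 16            # new value in the README diff
gradient_accumulation_steps = 8  # unchanged in the diff
num_devices = 1                  # assumption: single GPU (not stated in the commit)

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 128, matching the README's total_train_batch_size
```

Note this also explains the step counts in the new results table: with a much larger effective batch, each epoch takes only 2 optimizer steps instead of ~44.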
config.json CHANGED
@@ -10,7 +10,7 @@
   "hidden_size": 1024,
   "initializer_range": 0.02,
   "intermediate_size": 3072,
-  "max_position_embeddings": 200,
+  "max_position_embeddings": 256,
   "model_type": "llama",
   "num_attention_heads": 8,
   "num_hidden_layers": 24,
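The config change only widens the maximum context window from 200 to 256 positions. A quick stand-alone sketch parsing the updated fragment with the standard library (the JSON below is trimmed to just the lines shown in the diff):

```python
import json

# The relevant fragment of the updated config.json, as shown in the diff.
config_text = """
{
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 256,
  "model_type": "llama",
  "num_attention_heads": 8,
  "num_hidden_layers": 24
}
"""

config = json.loads(config_text)
# max_position_embeddings bounds how many positions the model can attend over.
print(config["max_position_embeddings"])  # 256
```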
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:75005945ce406a49db1b8b4b9aef520f33dd864987fe79a782192a532bf8d76b
+oid sha256:c4e7e2fdbdb2c9d26bec5fe70b1e741eb30888fa00a93c82d8fd5fdbeb7c94a1
 size 1344172280
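The weights file is stored via Git LFS, so the repo only holds a three-line pointer whose `oid sha256:` field is the digest of the real contents. A sketch of verifying a download against such a pointer, using placeholder bytes rather than the actual 1.3 GB file:

```python
import hashlib

# Stand-in for the downloaded file contents (not the real weights).
downloaded = b"placeholder weight bytes"

# Build a pointer in the Git LFS spec v1 format for these bytes.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(downloaded).hexdigest()}\n"
    f"size {len(downloaded)}\n"
)

# Verification: recompute the digest and compare to the pointer's oid line.
expected_oid = pointer.splitlines()[1].split("oid sha256:")[1]
assert hashlib.sha256(downloaded).hexdigest() == expected_oid
print("pointer digest matches")
```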
runs/Mar22_11-42-50_gcn28.local.snellius.surf.nl/events.out.tfevents.1711104184.gcn28.local.snellius.surf.nl.3104221.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d9b34146abf395d823ae4cda1a6ec400ec4651ac7209ac60ffa9413f01c06747
+size 5731
runs/Mar22_11-44-51_gcn14.local.snellius.surf.nl/events.out.tfevents.1711104302.gcn14.local.snellius.surf.nl.856670.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:578eb156726fc0e7e7211b9c6e5b6e5a6ba390bfdcc60d66a2fc502d9fc13e17
+size 5731
tokenizer_config.json CHANGED
@@ -37,7 +37,7 @@
   "bos_token": "<s>",
   "clean_up_tokenization_spaces": true,
   "eos_token": "</s>",
-  "model_max_length": 100,
+  "model_max_length": 128,
   "pad_token": "<pad>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
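`model_max_length` is the cap a tokenizer applies when truncation is enabled; this commit raises it from 100 to 128 tokens, in step with the context-window change in config.json. A minimal plain-Python sketch of the truncation behaviour, using placeholder token ids rather than a real tokenizer:

```python
model_max_length = 128  # new value in tokenizer_config.json (was 100)

# Hypothetical encoded sequence that is longer than the limit.
token_ids = list(range(300))

# With truncation enabled, the tokenizer keeps at most model_max_length tokens.
truncated = token_ids[:model_max_length]
print(len(truncated))  # 128
```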
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a8087606d28ebab0c1abebab447a0ae1e9c17fda4ef368b96a83e8f2e6950e66
+oid sha256:e3179a2a3fe68b86253b0ba9c42f796efa0b7ead1164a0e56535abe8e14039e7
 size 4728