HeRksTAn committed
Commit a6daa8c · verified · 1 Parent(s): 093a83c

markat1/mistral-7binstruct-summary-100s

README.md CHANGED
@@ -20,7 +20,12 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the generator dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.5987
+ - eval_loss: 2.0198
+ - eval_runtime: 34.7344
+ - eval_samples_per_second: 2.706
+ - eval_steps_per_second: 0.345
+ - epoch: 0.06
+ - step: 15
 
  ## Model description
 
@@ -39,7 +44,7 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 4e-05
+ - learning_rate: 2e-05
  - train_batch_size: 1
  - eval_batch_size: 8
  - seed: 42
@@ -47,45 +52,9 @@ The following hyperparameters were used during training:
  - total_train_batch_size: 2
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 0.1
+ - lr_scheduler_warmup_steps: 0.03
  - training_steps: 150
 
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | No log | 0.02 | 5 | 2.4606 |
- | 2.4794 | 0.04 | 10 | 2.2401 |
- | 2.4794 | 0.06 | 15 | 2.0792 |
- | 2.1147 | 0.08 | 20 | 1.9539 |
- | 2.1147 | 0.1 | 25 | 1.8629 |
- | 1.888 | 0.12 | 30 | 1.8053 |
- | 1.888 | 0.14 | 35 | 1.7636 |
- | 1.7809 | 0.16 | 40 | 1.7317 |
- | 1.7809 | 0.18 | 45 | 1.7081 |
- | 1.7043 | 0.2 | 50 | 1.6918 |
- | 1.7043 | 0.22 | 55 | 1.6788 |
- | 1.6923 | 0.25 | 60 | 1.6678 |
- | 1.6923 | 0.27 | 65 | 1.6579 |
- | 1.6453 | 0.29 | 70 | 1.6491 |
- | 1.6453 | 0.31 | 75 | 1.6416 |
- | 1.5791 | 0.33 | 80 | 1.6347 |
- | 1.5791 | 0.35 | 85 | 1.6279 |
- | 1.6144 | 0.37 | 90 | 1.6219 |
- | 1.6144 | 0.39 | 95 | 1.6168 |
- | 1.6472 | 0.41 | 100 | 1.6124 |
- | 1.6472 | 0.43 | 105 | 1.6086 |
- | 1.5904 | 0.45 | 110 | 1.6056 |
- | 1.5904 | 0.47 | 115 | 1.6033 |
- | 1.5891 | 0.49 | 120 | 1.6016 |
- | 1.5891 | 0.51 | 125 | 1.6004 |
- | 1.5882 | 0.53 | 130 | 1.5996 |
- | 1.5882 | 0.55 | 135 | 1.5991 |
- | 1.6018 | 0.57 | 140 | 1.5988 |
- | 1.6018 | 0.59 | 145 | 1.5987 |
- | 1.581 | 0.61 | 150 | 1.5987 |
-
-
  ### Framework versions
 
  - PEFT 0.9.0
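For reference, the updated hyperparameters map roughly onto the following `transformers` setup. This is a minimal sketch, not the repository's actual training script: it assumes `gradient_accumulation_steps=2` (per-device batch 1 → the listed total batch 2), treats the listed warmup value of 0.03 as a warmup ratio, and takes the eval/logging cadence from the removed results table.

```python
from transformers import TrainingArguments

# Minimal sketch of the updated run configuration (not the original script).
# Assumptions: gradient_accumulation_steps=2 reproduces total_train_batch_size=2,
# and "lr_scheduler_warmup_steps: 0.03" is interpreted as a warmup ratio.
training_args = TrainingArguments(
    output_dir="mistral-7binstruct-summary-100s",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    seed=42,
    optim="adamw_torch",          # card lists Adam with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_steps=150,
    evaluation_strategy="steps",  # cadence below assumed from the removed results table
    eval_steps=5,
    logging_steps=10,
)
```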
adapter_config.json CHANGED
@@ -9,13 +9,13 @@
 "layers_pattern": null,
 "layers_to_transform": null,
 "loftq_config": {},
- "lora_alpha": 16,
+ "lora_alpha": 64,
 "lora_dropout": 0.1,
 "megatron_config": null,
 "megatron_core": "megatron.core",
 "modules_to_save": null,
 "peft_type": "LORA",
- "r": 8,
+ "r": 32,
 "rank_pattern": {},
 "revision": null,
 "target_modules": [
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:cd820b60b88b9e51818cf5f5d266c456d82a971952320ae8bb106a5b9d2537e3
- size 13648432
+ oid sha256:c60b8684e4fc174c64ee65169402174c73475b89d27b094a13d5b595591bc30c
+ size 54543184
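The adapter roughly quadruples in size (13,648,432 → 54,543,184 bytes), consistent with r going from 8 to 32. To use the committed weights, something along these lines should work; a sketch assuming the adapter is applied on top of the base Mistral-7B-Instruct-v0.2 checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "markat1/mistral-7binstruct-summary-100s"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the LoRA weights stored in adapter_model.safetensors to the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
```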
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:80ded19b3f3184c49ad15ef2837df275e08cac6ca7769bfaa777728dd7d25d8b
+ oid sha256:e328bed096c7f01176f4ccaa21df959a73c03b0f481e67f80ac001bf2d078c94
 size 4920