HeRksTAn committed
Commit 093a83c (verified) · Parent: aae0f9f

ai-maker-space/mistral-7binstruct-summary-100s

README.md CHANGED
@@ -20,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the generator dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.4145
+ - Loss: 1.5987
 
  ## Model description
 
@@ -39,27 +39,57 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 0.0002
+ - learning_rate: 4e-05
  - train_batch_size: 1
  - eval_batch_size: 8
  - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 2
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: constant
- - lr_scheduler_warmup_steps: 0.03
- - training_steps: 50
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 0.1
+ - training_steps: 150
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 1.6514 | 0.21 | 25 | 1.4966 |
- | 1.5057 | 0.41 | 50 | 1.4145 |
+ | No log | 0.02 | 5 | 2.4606 |
+ | 2.4794 | 0.04 | 10 | 2.2401 |
+ | 2.4794 | 0.06 | 15 | 2.0792 |
+ | 2.1147 | 0.08 | 20 | 1.9539 |
+ | 2.1147 | 0.1 | 25 | 1.8629 |
+ | 1.888 | 0.12 | 30 | 1.8053 |
+ | 1.888 | 0.14 | 35 | 1.7636 |
+ | 1.7809 | 0.16 | 40 | 1.7317 |
+ | 1.7809 | 0.18 | 45 | 1.7081 |
+ | 1.7043 | 0.2 | 50 | 1.6918 |
+ | 1.7043 | 0.22 | 55 | 1.6788 |
+ | 1.6923 | 0.25 | 60 | 1.6678 |
+ | 1.6923 | 0.27 | 65 | 1.6579 |
+ | 1.6453 | 0.29 | 70 | 1.6491 |
+ | 1.6453 | 0.31 | 75 | 1.6416 |
+ | 1.5791 | 0.33 | 80 | 1.6347 |
+ | 1.5791 | 0.35 | 85 | 1.6279 |
+ | 1.6144 | 0.37 | 90 | 1.6219 |
+ | 1.6144 | 0.39 | 95 | 1.6168 |
+ | 1.6472 | 0.41 | 100 | 1.6124 |
+ | 1.6472 | 0.43 | 105 | 1.6086 |
+ | 1.5904 | 0.45 | 110 | 1.6056 |
+ | 1.5904 | 0.47 | 115 | 1.6033 |
+ | 1.5891 | 0.49 | 120 | 1.6016 |
+ | 1.5891 | 0.51 | 125 | 1.6004 |
+ | 1.5882 | 0.53 | 130 | 1.5996 |
+ | 1.5882 | 0.55 | 135 | 1.5991 |
+ | 1.6018 | 0.57 | 140 | 1.5988 |
+ | 1.6018 | 0.59 | 145 | 1.5987 |
+ | 1.581 | 0.61 | 150 | 1.5987 |
 
 
  ### Framework versions
 
- - PEFT 0.8.2
- - Transformers 4.38.1
+ - PEFT 0.9.0
+ - Transformers 4.38.2
  - Pytorch 2.2.1+cu121
- - Datasets 2.17.1
+ - Datasets 2.18.0
  - Tokenizers 0.15.2
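The card's update replaces the earlier 50-step constant-LR run with a 150-step cosine-schedule run at a lower learning rate; gradient accumulation of 2 over a per-device batch of 1 gives the listed total train batch size of 1 × 2 = 2. Below is a minimal sketch of how these values might map onto `transformers.TrainingArguments`. The `output_dir` and evaluation settings are assumptions (eval every 5 steps is inferred from the results table), and the card's fractional `lr_scheduler_warmup_steps: 0.1` is expressed here via `warmup_ratio`, which is the usual source of such a value:

```python
from transformers import TrainingArguments

# Sketch only: output_dir and eval cadence are assumptions, not from the commit.
training_args = TrainingArguments(
    output_dir="mistral-7binstruct-summary-100s",  # hypothetical local path
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective train batch size: 1 * 2 = 2
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # the card lists lr_scheduler_warmup_steps: 0.1
    max_steps=150,
    seed=42,
    evaluation_strategy="steps",    # assumption: matches the 5-step eval table
    eval_steps=5,
)
# Adam betas (0.9, 0.999) and epsilon 1e-08 are the Trainer defaults,
# matching the optimizer line in the card.
```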
adapter_config.json CHANGED
@@ -1,7 +1,7 @@
  {
    "alpha_pattern": {},
    "auto_mapping": null,
-   "base_model_name_or_path": null,
+   "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
    "bias": "none",
    "fan_in_fan_out": false,
    "inference_mode": true,
@@ -9,13 +9,13 @@
    "layers_pattern": null,
    "layers_to_transform": null,
    "loftq_config": {},
-   "lora_alpha": 32,
+   "lora_alpha": 16,
    "lora_dropout": 0.1,
    "megatron_config": null,
    "megatron_core": "megatron.core",
    "modules_to_save": null,
    "peft_type": "LORA",
-   "r": 16,
+   "r": 8,
    "rank_pattern": {},
    "revision": null,
    "target_modules": [
@@ -23,5 +23,6 @@
      "v_proj"
    ],
    "task_type": "CAUSAL_LM",
+   "use_dora": false,
    "use_rslora": false
  }
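This change halves the LoRA rank from 16 to 8 and drops `lora_alpha` from 32 to 16, which is consistent with the adapter file below shrinking from 27,284,504 to 13,648,432 bytes; it also pins `base_model_name_or_path` and adds the `use_dora` flag introduced alongside PEFT 0.9.0. A hedged sketch of the equivalent `peft.LoraConfig` follows. Note the hunk truncates `target_modules`, so only `"v_proj"` is visible in the diff; any earlier entries are elided in the source and left out here:

```python
from peft import LoraConfig

# Sketch of the updated adapter config; target_modules is incomplete
# because the diff only shows the tail of the array.
lora_config = LoraConfig(
    r=8,                        # rank halved from 16 in this commit
    lora_alpha=16,              # halved from 32
    lora_dropout=0.1,
    target_modules=["v_proj"],  # assumption: other entries elided in the diff
    bias="none",
    task_type="CAUSAL_LM",
    use_rslora=False,
    use_dora=False,             # new field written by PEFT 0.9.0
)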
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9aaf56be6b8c6923f099315605fc3895422b49938bfbe2bcd6ce140b44c0b324
- size 27284504
+ oid sha256:cd820b60b88b9e51818cf5f5d266c456d82a971952320ae8bb106a5b9d2537e3
+ size 13648432
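With `base_model_name_or_path` now resolvable, the updated adapter weights in this file can be attached to the base checkpoint in the standard PEFT way. A usage sketch; the dtype, device placement, and prompt text are assumptions rather than anything specified in this repo:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "ai-maker-space/mistral-7binstruct-summary-100s"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,  # assumption: half precision for a single GPU
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)  # attaches this repo's LoRA weights
model.eval()

# Mistral-Instruct uses the [INST] chat format; the prompt is illustrative.
prompt = "[INST] Summarize the following text: ... [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```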
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1eb377bb9ce00d974f92d96f4cfe3a5910e03b51aaad67b06158b9e8060e4ca7
+ oid sha256:80ded19b3f3184c49ad15ef2837df275e08cac6ca7769bfaa777728dd7d25d8b
  size 4920
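Both binary files in this commit are stored as Git LFS pointers: the repo itself tracks only a `version` line, a `sha256` oid, and a byte size, while the actual blob lives in LFS storage. A small sketch for checking a downloaded file against its pointer's oid (the local file path is an assumption):

```python
import hashlib

def lfs_oid(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the sha256 digest that a Git LFS pointer records as its oid."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Example: verify the adapter weights against the pointer committed here.
expected = "cd820b60b88b9e51818cf5f5d266c456d82a971952320ae8bb106a5b9d2537e3"
assert lfs_oid("adapter_model.safetensors") == expected  # assumed local path
```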