Dev-SriramB committed on
Commit acc19fe · verified · 1 Parent(s): 66fff01

Dev-SriramB/qa_bot2

README.md CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.4925
+- Loss: 2.0784
 
 ## Model description
 
@@ -35,13 +35,13 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 1
-- eval_batch_size: 1
+- learning_rate: 2e-05
+- train_batch_size: 3
+- eval_batch_size: 3
 - seed: 42
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 4
-- optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- total_train_batch_size: 12
+- optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
 - num_epochs: 2
@@ -51,14 +51,14 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.8787        | 1.0   | 75   | 1.5135          |
-| 1.489         | 2.0   | 150  | 1.4925          |
+| 10.1116       | 1.0   | 25   | 2.2149          |
+| 8.6814        | 2.0   | 50   | 2.0784          |
 
 
 ### Framework versions
 
-- PEFT 0.13.2
-- Transformers 4.46.2
-- Pytorch 2.5.1+cu121
-- Datasets 3.1.0
-- Tokenizers 0.20.3
+- PEFT 0.14.0
+- Transformers 4.47.1
+- Pytorch 2.5.1+cu124
+- Datasets 3.2.0
+- Tokenizers 0.21.0
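The new hyperparameters and the training log above are internally consistent, which is worth checking when batch sizes change between runs. A quick arithmetic sketch (assuming, as the Trainer normally does, that steps per epoch ≈ dataset size / effective batch size, with no dropped remainder):

```python
# Effective (total) train batch size = per-device batch size x accumulation steps.
train_batch_size = 3
gradient_accumulation_steps = 4
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 12, matching the README

# New run: 50 optimizer steps over 2 epochs -> 25 steps per epoch.
new_examples = 25 * total_train_batch_size  # ~300 training examples
# Old run: 75 steps per epoch at total batch size 4 -> same dataset size.
old_examples = 75 * 4                       # ~300 training examples
print(new_examples, old_examples)  # 300 300
```

Both runs imply roughly the same ~300-example dataset, so the halved step count per epoch is explained entirely by the tripled effective batch size.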
adapter_config.json CHANGED
@@ -1,8 +1,10 @@
 {
   "alpha_pattern": {},
   "auto_mapping": null,
-  "base_model_name_or_path": null,
+  "base_model_name_or_path": "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ",
   "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
   "fan_in_fan_out": false,
   "inference_mode": true,
   "init_lora_weights": true,
@@ -11,15 +13,17 @@
   "layers_to_transform": null,
   "loftq_config": {},
   "lora_alpha": 32,
+  "lora_bias": false,
   "lora_dropout": 0.05,
   "megatron_config": null,
   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
-  "r": 8,
+  "r": 16,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "v_proj",
     "q_proj"
   ],
   "task_type": "CAUSAL_LM",
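Beyond the new PEFT 0.14.0 bookkeeping fields (`eva_config`, `exclude_modules`, `lora_bias`), the substantive adapter changes are three: the base model is now pinned, the LoRA rank doubles, and `v_proj` joins the target modules. A minimal sketch of that delta, using plain dicts that mirror the JSON (not any PEFT API):

```python
# Old LoRA settings as they appeared in adapter_config.json.
old_cfg = {
    "base_model_name_or_path": None,
    "r": 8,
    "lora_alpha": 32,
    "target_modules": ["q_proj"],
}

# Apply the three changes from this commit.
new_cfg = dict(old_cfg)
new_cfg.update({
    "base_model_name_or_path": "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ",
    "r": 16,
    "target_modules": ["v_proj", "q_proj"],
})

# lora_alpha stays at 32, so the effective LoRA scaling alpha / r
# halves from 4.0 to 2.0 now that r has doubled.
scaling = new_cfg["lora_alpha"] / new_cfg["r"]
print(new_cfg["r"], scaling)  # 16 2.0
```

One side effect worth noting: since `lora_alpha` was left at 32, doubling `r` implicitly halves the adapter's output scaling, which interacts with the much lower learning rate chosen for this run.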
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:983641d24bf39235041c15c2003b9f1a2ff0ecb741f371d4cde3086a1b1fb0c8
-size 8398144
+oid sha256:23ce61ead5d7b31bf2178da17ab221af8d36af442e1ae5a91364249e137dd0b2
+size 27280152
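The adapter file roughly tripling in size (8.4 MB to 27.3 MB) follows directly from the config change. A back-of-the-envelope check, assuming Mistral-7B's published dimensions (hidden size 4096, 32 layers, `v_proj` out_features 1024 under grouped-query attention) and float32 adapter weights; only the r values, target modules, and file sizes come from this diff:

```python
# LoRA adds two matrices per target module per layer:
# A (r x in_features) and B (out_features x r),
# i.e. r * (in_features + out_features) parameters per module per layer.
hidden = 4096   # Mistral-7B hidden size (assumed from the model card)
layers = 32     # transformer layers (assumed)
v_out = 1024    # v_proj out_features under grouped-query attention (assumed)

def lora_params(r, modules):
    """modules: list of (in_features, out_features) pairs per layer."""
    return layers * sum(r * (i + o) for i, o in modules)

old_params = lora_params(8, [(hidden, hidden)])                    # q_proj only
new_params = lora_params(16, [(hidden, hidden), (hidden, v_out)])  # q_proj + v_proj

# float32 -> 4 bytes per parameter; the safetensors header adds a few KB.
print(old_params * 4)  # 8388608  vs the reported 8398144
print(new_params * 4)  # 27262976 vs the reported 27280152
```

Both estimates land within ~20 KB of the actual file sizes, with the remainder accounted for by the safetensors header and tensor metadata.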
runs/Jan31_16-11-51_d64bec8c4c1b/events.out.tfevents.1738339915.d64bec8c4c1b.449.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:be26f2d25fdf4b45820ebf4edbcb6235c3423c24e790c388b3bacb6535416726
+size 7019
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:024f5ab57e6d0309ea3dd026e2712ac7228b110d23ad844814b1121588355385
+oid sha256:13789db8eeb74358645966e07c8b906286149896a893d3eb67472ade542abdc5
 size 5304