Kquant03 committed · verified
Commit 175c28c · 1 Parent(s): 0168618

End of training

Files changed (1): README.md (+31 -23)
README.md CHANGED
@@ -1,10 +1,12 @@
  ---
  library_name: transformers
- license: llama3
- base_model: meta-llama/Meta-Llama-3-8B
+ license: llama3.1
+ base_model: meta-llama/Llama-3.1-8B-Instruct
  tags:
  - axolotl
  - generated_from_trainer
+ datasets:
+ - Sandevistan_cleaned.jsonl
  model-index:
  - name: L3-Pneuma-8B
    results: []
@@ -16,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>
 
- axolotl version: `0.4.1`
+ axolotl version: `0.8.0`
  ```yaml
- base_model: meta-llama/Meta-Llama-3-8B
+ base_model: meta-llama/Llama-3.1-8B-Instruct
 
  load_in_8bit: false
  load_in_4bit: false
@@ -29,12 +31,11 @@ load_in_4bit: false
  strict: false
 
  datasets:
-   - path: Kquant03/Sandevistan_Reformat
+   - path: Sandevistan_cleaned.jsonl
      type: customllama3_stan
  dataset_prepared_path: last_run_prepared
  val_set_size: 0.05
  output_dir: ./outputs/out
- max_steps: 80000
 
  fix_untrained_tokens: true
 
@@ -50,10 +51,10 @@ wandb_log_model:
 
  gradient_accumulation_steps: 16
  micro_batch_size: 8
- num_epochs: 1
+ num_epochs: 2
  optimizer: paged_adamw_8bit
  lr_scheduler: cosine
- learning_rate: 0.00001
+ learning_rate: 0.000075
  max_grad_norm: 1
 
  train_on_inputs: false
@@ -94,54 +95,61 @@ special_tokens:
    eos_token: "<|end_of_text|>"
    pad_token: "<|end_of_text|>"
  tokens:
+
  ```
 
  </details><br>
 
  # L3-Pneuma-8B
 
- This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the [Sandevistan](https://huggingface.co/datasets/Replete-AI/Sandevistan) dataset.
+ This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the Sandevistan_cleaned.jsonl dataset.
  It achieves the following results on the evaluation set:
- - Loss: 2.7381
+ - Loss: 0.7796
 
  ## Model description
 
- This model is designed to challenge common paradigms in training Large Language Models, giving them a focus on user experience over profitability. These are highly experimental, and need preference training in order to increase their effectiveness.
+ More information needed
 
  ## Intended uses & limitations
 
- Chatting, conversation, and assistance in small downstream tasks.
-
- Large Language Models work incredibly differently from humans, so while we are capable of training and rewarding them to act just like us in many ways, you should treat it as a simulation and use the Socratic method when engaging with them. You, as an end-user should always remain in control of your own thoughts and decisions, and use AI as a way to improve yourself rather than becoming dependent on it.
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
 
  ## Training procedure
 
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 1e-05
+ - learning_rate: 7.5e-05
  - train_batch_size: 8
  - eval_batch_size: 8
  - seed: 42
  - gradient_accumulation_steps: 16
  - total_train_batch_size: 128
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - optimizer: paged_adamw_8bit (PagedAdamW8bit) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
- - training_steps: 743
+ - num_epochs: 2.0
 
  ### Training results
 
  | Training Loss | Epoch  | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 1.0378        | 0.0013 | 1    | 3.0437          |
- | 0.6816        | 0.3334 | 248  | 2.7341          |
- | 0.6543        | 0.6667 | 496  | 2.7381          |
+ | 1.3399        | 0.0023 | 1    | 1.3175          |
+ | 0.846         | 0.3332 | 143  | 0.8312          |
+ | 0.8103        | 0.6665 | 286  | 0.8021          |
+ | 0.7617        | 0.9997 | 429  | 0.7737          |
+ | 0.5824        | 1.3309 | 572  | 0.7851          |
+ | 0.5651        | 1.6641 | 715  | 0.7798          |
+ | 0.5738        | 1.9974 | 858  | 0.7796          |
 
 
  ### Framework versions
 
- - Transformers 4.45.1
- - Pytorch 2.3.1+cu121
- - Datasets 2.21.0
- - Tokenizers 0.20.1
+ - Transformers 4.51.3
+ - Pytorch 2.6.0+cu124
+ - Datasets 3.5.0
+ - Tokenizers 0.21.1
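A few sanity checks on the updated card, for readers comparing the two versions. First, the batch-size and step arithmetic is internally consistent; the sketch below (plain Python; the training-set size is inferred from the step counts in the results table, since the commit never states it) shows how the reported numbers relate:

```python
# Effective batch size: per-device micro-batch x gradient accumulation.
# Both values come from the axolotl config in the diff above.
micro_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the card; implies a single training device

# The results table evaluates every 143 steps (one third of an epoch) and
# ends at step 858 after num_epochs: 2, so one epoch is ~429 optimizer steps.
steps_per_epoch = 858 // 2  # 429, matching the step-429 row at epoch ~1.0

# Implied training-set size after the 5% validation split (an estimate):
approx_train_examples = steps_per_epoch * total_train_batch_size
print(approx_train_examples)  # ~54,912 examples
```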
 
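Second, the config pairs `optimizer: paged_adamw_8bit` with `lr_scheduler: cosine` and 10 warmup steps. Outside axolotl, an equivalent setup would look roughly like this; a minimal sketch assuming `bitsandbytes` and `transformers` are installed (a `torch.nn.Linear` stand-in replaces the real 8B model so the snippet does not need the actual weights), not a reproduction of axolotl's internals:

```python
import torch
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

# Stand-in module; in the real run this is the loaded Llama-3.1-8B-Instruct.
model = torch.nn.Linear(8, 8)

optimizer = bnb.optim.PagedAdamW8bit(
    model.parameters(),
    lr=7.5e-5,           # learning_rate: 0.000075 in the config
    betas=(0.9, 0.999),  # as listed under "Training hyperparameters"
    eps=1e-8,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,     # lr_scheduler_warmup_steps: 10
    num_training_steps=858,  # final step in the results table (2 epochs)
)

# max_grad_norm: 1 corresponds to clipping before each optimizer step:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```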
 
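Third, `val_set_size: 0.05` means the evaluation set is carved out of the single training file rather than loaded separately, which is why the card reports one loss per evaluation instead of scores on a named benchmark. A rough equivalent of the split using the `datasets` library (the filename comes from the config; the project-specific `customllama3_stan` prompt formatting is not reproduced here):

```python
from datasets import load_dataset

# Load the local JSONL file named in the config's `datasets.path`.
raw = load_dataset("json", data_files="Sandevistan_cleaned.jsonl", split="train")

# val_set_size: 0.05 -> hold out 5% for evaluation (seed: 42 per the card).
split = raw.train_test_split(test_size=0.05, seed=42)
train_ds, eval_ds = split["train"], split["test"]
print(len(train_ds), len(eval_ds))
```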
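Finally, a usage sketch for the resulting checkpoint. The repository id below is an assumption pieced together from the committer and model name (the commit never spells it out), and `device_map="auto"` additionally requires the `accelerate` package; the config pins both `eos_token` and `pad_token` to `<|end_of_text|>`, so no pad-token surgery should be needed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Kquant03/L3-Pneuma-8B"  # assumed repo id; not stated in the commit

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `accelerate`
)

prompt = "Write a short note on using AI as a thinking partner."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```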