turtle170 committed on
Commit
15bdfbc
·
verified ·
1 Parent(s): 8bd2e83

Update README.md

Files changed (1)
  1. README.md +54 -119
README.md CHANGED
@@ -17,36 +17,55 @@ language:
17
  # Model Card for Model ID
18
 
19
  Phi-3-Mini-OpenHermes-Magpie-V1 is a general purpose model trained on both the teknium/OpenHermes-2.5 dataset and the Magpie-Align/Phi3-Pro-300K-Filtered dataset
20
- and designed to provide speed, efficiency, and intelligence.
 
21
 
22
 
23
 
24
  ## Model Details
25
  OpenHermes dataset:
 
26
  1 Epoch
 
27
  8 Batch Size
 
28
  1 Gradient Accumulation
 
29
  5e-5 LR
 
30
  16 LoRa r
 
31
  32 LoRa Alpha
 
32
  300 Warmup steps
 
33
  500 Eval steps
 
34
  Trained only on Attention layers.
35
 
36
  Magpie dataset:
 
37
  1 Epoch
 
38
  16 Batch Size
 
39
  1 Gradient Accumulation
 
40
  1e-4 LR
 
41
  16 LoRa r
 
42
  32 LoRa Alpha
 
43
  150 Warmup steps
 
44
  500 Eval steps
45
- Trained with Gate, Up, and Down layers.
 
46
 
47
  ### Model Description
48
 
49
- This model excels at creating bullet point formatting, while still mantaining
50
 
51
 
52
 
@@ -55,178 +74,94 @@ This model excels at creating bullet point formatting, while still mantaining
55
  - **License:** apache-2.0
56
  - **Finetuned from model :** Phi-3-Mini-4k-Instruct with turtle170/Phi-3-Mini-OpenHermes-V1 adapters
57
 
58
- ### Model Sources [optional]
59
-
60
- <!-- Provide the basic
61
 
62
- - **Repository:** [More Information Needed]
63
- - **Paper [optional]:** [More Information Needed]
64
- - **Demo [optional]:** [More Information Needed]
65
-
66
- ## Uses
67
-
68
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
69
 
70
  ### Direct Use
71
 
72
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
73
-
74
- [More Information Needed]
75
-
76
- ### Downstream Use [optional]
77
 
78
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
79
-
80
- [More Information Needed]
81
 
82
  ### Out-of-Scope Use
83
 
84
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
85
-
86
- [More Information Needed]
87
 
88
  ## Bias, Risks, and Limitations
89
 
90
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
91
-
92
- [More Information Needed]
93
 
94
  ### Recommendations
95
 
96
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
97
 
98
  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
99
 
100
- ## How to Get Started with the Model
101
-
102
- Use the code below to get started with the model.
103
 
104
- [More Information Needed]
105
-
106
- ## Training Details
107
 
108
  ### Training Data
 
109
 
110
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
111
-
112
- [More Information Needed]
113
 
114
  ### Training Procedure
115
 
116
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
117
 
118
- #### Preprocessing [optional]
119
-
120
- [More Information Needed]
121
-
122
 
123
  #### Training Hyperparameters
124
 
125
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
126
-
127
- #### Speeds, Sizes, Times [optional]
128
 
129
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
130
 
131
- [More Information Needed]
132
 
133
  ## Evaluation
134
 
135
- <!-- This section describes the evaluation protocols and provides the results. -->
136
 
137
- ### Testing Data, Factors & Metrics
138
 
139
- #### Testing Data
 
140
 
141
- <!-- This should link to a Dataset Card if possible. -->
142
 
143
- [More Information Needed]
144
 
145
- #### Factors
146
 
147
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
148
 
149
- [More Information Needed]
150
 
151
- #### Metrics
152
 
153
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
154
 
155
- [More Information Needed]
156
 
157
  ### Results
158
 
159
- [More Information Needed]
160
-
161
- #### Summary
162
-
163
-
164
-
165
- ## Model Examination [optional]
166
 
167
- <!-- Relevant interpretability work for the model goes here -->
168
-
169
- [More Information Needed]
170
 
171
  ## Environmental Impact
172
 
173
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
174
-
175
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
176
 
177
- - **Hardware Type:** [More Information Needed]
178
- - **Hours used:** [More Information Needed]
179
- - **Cloud Provider:** [More Information Needed]
180
- - **Compute Region:** [More Information Needed]
181
- - **Carbon Emitted:** [More Information Needed]
182
 
183
- ## Technical Specifications [optional]
184
 
185
  ### Model Architecture and Objective
 
186
 
187
- [More Information Needed]
188
-
189
- ### Compute Infrastructure
190
-
191
- [More Information Needed]
192
-
193
- #### Hardware
194
-
195
- [More Information Needed]
196
-
197
- #### Software
198
-
199
- [More Information Needed]
200
-
201
- ## Citation [optional]
202
-
203
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
204
-
205
- **BibTeX:**
206
-
207
- [More Information Needed]
208
-
209
- **APA:**
210
-
211
- [More Information Needed]
212
-
213
- ## Glossary [optional]
214
-
215
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
216
-
217
- [More Information Needed]
218
-
219
- ## More Information [optional]
220
-
221
- [More Information Needed]
222
-
223
- ## Model Card Authors [optional]
224
-
225
- [More Information Needed]
226
-
227
- ## Model Card Contact
228
-
229
- [More Information Needed]
230
  ### Framework versions
231
 
232
  - PEFT 0.17.1
 
17
  # Model Card for Model ID
18
 
19
  Phi-3-Mini-OpenHermes-Magpie-V1 is a general purpose model trained on both the teknium/OpenHermes-2.5 dataset and the Magpie-Align/Phi3-Pro-300K-Filtered dataset
20
+ and designed to provide speed, efficiency, and intelligence while still being relatively small.
21
+
22
 
23
 
24
 
25
  ## Model Details
26
  OpenHermes dataset:
27
+
28
  1 Epoch
29
+
30
  8 Batch Size
31
+
32
  1 Gradient Accumulation
33
+
34
  5e-5 LR
35
+
36
  16 LoRA r
37
+
38
  32 LoRA Alpha
39
+
40
  300 Warmup steps
41
+
42
  500 Eval steps
43
+
44
  Trained only on Attention layers.
45
 
46
  Magpie dataset:
47
+
48
  1 Epoch
49
+
50
  16 Batch Size
51
+
52
  1 Gradient Accumulation
53
+
54
  1e-4 LR
55
+
56
  16 LoRA r
57
+
58
  32 LoRA Alpha
59
+
60
  150 Warmup steps
61
+
62
  500 Eval steps
63
+
64
+ Trained with Gate, Up, and Down layers.
65
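For illustration, the two adapter setups above could be expressed with PEFT roughly as follows. This is a minimal sketch, not the actual training script; the Phi-3-Mini module names (`qkv_proj`, `o_proj`, `gate_up_proj`, `down_proj`) are assumptions about its fused projections and should be checked against `model.named_modules()`.

```python
# Minimal sketch of the two LoRA configurations described above.
# Module names are assumed Phi-3-Mini fused projections; verify them
# on the loaded model before training.
from peft import LoraConfig

# OpenHermes run: attention layers only.
openhermes_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],  # assumed attention modules
    task_type="CAUSAL_LM",
)

# Magpie run: Gate, Up, and Down (MLP) layers.
magpie_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["gate_up_proj", "down_proj"],  # assumed MLP modules
    task_type="CAUSAL_LM",
)
```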
 
66
  ### Model Description
67
 
68
+ This model excels at producing bullet-point formatting.
69
 
70
 
71
 
 
74
  - **License:** apache-2.0
75
  - **Finetuned from model:** Phi-3-Mini-4k-Instruct with turtle170/Phi-3-Mini-OpenHermes-V1 adapters
76
77
78
 
79
  ### Direct Use
80
 
81
+ For direct use, the easiest method is to download the .gguf file and load it into llama.cpp or Ollama.
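As a minimal sketch with the llama-cpp-python bindings (the .gguf filename below is hypothetical; substitute whichever file you download):

```python
# Sketch: loading a quantized .gguf export with llama-cpp-python.
from llama_cpp import Llama

# Hypothetical filename; replace with the actual .gguf you downloaded.
llm = Llama(model_path="phi-3-mini-openhermes-magpie-v1.Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize LoRA in three bullet points."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

With Ollama, the equivalent is a Modelfile whose FROM line points at the downloaded .gguf, followed by `ollama create` and `ollama run`.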
82
83
 
84
  ### Out-of-Scope Use
85
 
86
+ Users of this model need only adhere to the **Microsoft Phi-3** terms of use;
87
+ you are solely responsible for any misuse of this model, per Sections 7 and 8 of
88
+ the Apache-2.0 license.
89
 
90
  ## Bias, Risks, and Limitations
91
 
92
+ As this model was trained from a small base model and exposed to only two 50k-example datasets,
93
+ you should not expect too much from it.
94
+ However, it is smart for its size.
95
 
96
  ### Recommendations
97
 
98
99
 
100
  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
101
102
 
103
+ ## Training Details
104
+ See the hyperparameter settings listed under Model Details above.
 
105
 
106
  ### Training Data
107
+ The teknium/OpenHermes-2.5 dataset and the Magpie-Align/Phi3-Pro-300K-Filtered dataset.
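As a sketch, both datasets can be pulled from the Hugging Face Hub with the `datasets` library (the `train` split name is an assumption):

```python
# Sketch: loading the two training datasets used for this model.
from datasets import load_dataset

openhermes = load_dataset("teknium/OpenHermes-2.5", split="train")
magpie = load_dataset("Magpie-Align/Phi3-Pro-300K-Filtered", split="train")

# Inspect sizes before any subsampling (the card mentions ~50k examples each).
print(len(openhermes), len(magpie))
```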
108
109
 
110
  ### Training Procedure
111
 
112
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
113
 
114
 
115
  #### Training Hyperparameters
116
 
117
+ - **Training regime:** The OpenHermes run used fp16 mixed precision, while the Magpie run used full fp32 precision.
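As an illustration, the OpenHermes run's settings map onto Hugging Face `TrainingArguments` roughly as below. This is a sketch assuming the standard `Trainer` API and that the listed batch size is per device; on older transformers versions the flag is `evaluation_strategy` rather than `eval_strategy`.

```python
# Sketch: the OpenHermes run's hyperparameters as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="phi3-openhermes-lora",   # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=8,       # assumed per-device
    gradient_accumulation_steps=1,
    learning_rate=5e-5,
    warmup_steps=300,
    eval_strategy="steps",
    eval_steps=500,
    fp16=True,  # fp16 mixed precision, per the regime above
)
```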
 
 
118
 
119
+ #### Speeds, Sizes, Times
120
+ The Magpie adapter is about 100-200 MB.
121
 
 
122
 
123
  ## Evaluation
124
 
125
+ The evaluation strategy was epoch-based, and the final evaluation loss was 0.4203.
126
 
 
127
 
128
+ #### Metrics
129
+ 1 Epoch --> fast, while preventing overfitting.
130
 
131
+ 16 Batch Size --> helps squeeze out every bit of intelligence.
132
 
133
+ 1 Gradient Accumulation --> fast, while keeping training stable.
134
 
135
+ 1e-4 LR --> helps avoid overwriting the knowledge learned during the Hermes run.
136
 
137
+ 16 LoRA r --> helps the model understand the harder examples in the Magpie run.
138
 
139
+ 32 LoRA Alpha --> follows the common convention of Alpha = LoRA r x 2.
140
 
141
+ 150 Warmup steps --> fast; since the starting loss was already around 0.4, little warmup was needed.
142
 
143
+ 1500 Eval steps --> the loss had fluctuated between 0.4 and 0.6, and evaluation takes time, so I chose to evaluate only twice per run.
144
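One derived number worth noting: if the listed batch size is per device and both T4s (see Environmental Impact below) were used in data parallel, the effective batch size for the Magpie run works out as in this small sketch; the per-device interpretation is an assumption.

```python
# Effective batch size for the Magpie run, assuming the listed batch
# size is per device and data-parallel training across both GPUs.
per_device_batch = 16  # from the hyperparameters above
grad_accum = 1         # from the hyperparameters above
num_gpus = 2           # 2x Tesla T4, per Environmental Impact
print(per_device_batch * grad_accum * num_gpus)  # -> 32
```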
 
 
145
 
146
  ### Results
147
 
148
+ Eval loss: 0.4
149
+ Avg. train loss: 0.4
150
151
 
152
  ## Environmental Impact
153
154
 
155
+ - **Hardware Type:** 2x NVIDIA Tesla T4s
156
+ - **Hours used:** 12
157
+ - **Cloud Provider:** Kaggle
158
+ - **Compute Region:** asia-east1
159
+ - **Carbon Emitted:** 0.47 kg
160
 
 
161
 
162
  ### Model Architecture and Objective
163
+ The architecture is Phi-3-Mini-4k-Instruct with LoRA adapters; the objective was to provide a smart model while keeping the size small.
164
165
  ### Framework versions
166
 
167
  - PEFT 0.17.1