DavidAU committed
Commit 9a210b0 · verified · 1 Parent(s): 83fc0d3

Update README.md

Files changed (1):
  1. README.md +31 -15

README.md CHANGED
@@ -21,7 +21,9 @@ These settings can also fix a number of model issues such as:
  - "Gibberish"
  - letter, word, phrase, paragraph repeats
  - coherence
  - creativeness, or lack thereof, or too much (purple prose).
 
  Likewise, settings can also improve model generation and/or general overall "smoothness" / "quality" of model operation.
@@ -87,13 +89,13 @@ Note that https://github.com/LostRuins/koboldcpp also allows access to all LLAMA
 
  Other programs like https://www.LMStudio.ai allow access to most STANDARD samplers, whereas with others (llama_cpp only here) you may need to add them to the JSON file(s) for a model and/or template preset.
 
- In most cases all llama_cpp settings are available when using API / headless / server mode in "text-generation-webui", "koboldcpp" and "lmstudio" (as well as other apps too).
 
  You can also use llama_cpp directly (IE: llama-server.exe); see:
 
  https://github.com/ggerganov/llama.cpp
 
- (scroll down on the main page for more apps/programs to use GGUFs too)
 
  ---
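As a hedged illustration of the server/API point above: llama-server accepts sampler settings per request in the JSON body of a `/completion` call. The field names below follow the llama.cpp server documentation; the values are placeholders for illustration, not recommended defaults.

```python
import json

# Per-request sampler settings for llama.cpp's llama-server
# (POST /completion). Field names per the llama.cpp server docs;
# the values here are illustrative only.
payload = {
    "prompt": "Continue the story:",
    "temperature": 0.8,
    "top_k": 40,
    "repeat_last_n": 64,       # window of tokens considered for penalties
    "repeat_penalty": 1.05,    # small increments (1.01, 1.02 ...) are best
    "presence_penalty": 0.0,   # leave at 0.0 unless repeat_last_n is large
    "frequency_penalty": 0.0,
    "mirostat": 0,             # 0 = off, 1/2 = mirostat v1/v2
    "mirostat_tau": 5.0,
    "mirostat_eta": 0.1,
}

body = json.dumps(payload)
# send with any HTTP client, e.g.:
# curl http://localhost:8080/completion -d "$body"
```

The same keys work from any app that talks to llama-server in headless mode.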
@@ -112,7 +114,7 @@ for chat / role play and/or other use case(s). Generally speaking, this helps th
  Class 4 are balanced on the very edge of stability. These models are generally highly creative, for very narrow use case(s), and closer to "human prose" than other models. With these models, advanced samplers
  are used to bring these "bad boys" in line, which is especially important for chat and/or role play type use cases AND/OR use case(s) these models were not designed for.
 
- The goal here is to use parameters to raise/lower the power of the model and samplers to "prune" (or in some cases enhance) operation.
 
  With that being said, generation "examples" (at my repo) are created using the "Primary Testing Parameters" (top of this document) settings regardless of the "class" of the model AND NO advanced settings, or samplers.
@@ -131,6 +133,8 @@ Generally it is recommended to run the highest quant(s) you can on your machine
 
  The smaller the model, the greater the contrast between the smallest quant and largest quant in terms of operation, quality, nuance and general overall function.
 
  Imatrix quants generally improve all quants, and also allow you to use smaller quants (less memory, more context space) and retain quality of operation.
 
  IE: Instead of using a Q4_K_M, you might be able to run an IQ3_M and get close to Q4_K_M's quality, but at a higher tokens-per-second speed and have more VRAM free for context.
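The memory trade-off behind that quant choice is simple arithmetic: file size is roughly parameters times bits-per-weight. The bpw figures below are rough averages assumed for illustration; exact sizes vary per model and quant implementation.

```python
# Rough size arithmetic behind picking a quant.
# Bits-per-weight values are APPROXIMATE assumptions for illustration,
# not exact llama.cpp figures.
APPROX_BPW = {"Q4_K_M": 4.8, "IQ3_M": 3.7, "Q2_K": 2.6, "F16": 16.0}

def approx_size_gb(params_billion: float, quant: str) -> float:
    """Approximate quantized model size in GB: params * bpw / 8."""
    return round(params_billion * APPROX_BPW[quant] / 8, 2)

# For an 8B model, IQ3_M saves roughly a gigabyte over Q4_K_M --
# memory that can instead hold more context.
q4 = approx_size_gb(8, "Q4_K_M")
iq3 = approx_size_gb(8, "IQ3_M")
```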
@@ -142,8 +146,9 @@ PRIMARY PARAMETERS:
 
  These parameters will have a SIGNIFICANT effect on prose, generation, length and content; with temp being the most powerful.
 
- Keep in mind the biggest parameter / random "unknown" is your prompt. A word change, rephrasing, punctation , even a comma, or semi-colon can drastically alter the
- output, even at min temp settings. CAPS also affect generation too.
 
  <B>temp / temperature</B>
@@ -183,7 +188,10 @@ Bring this up to 80-120 for a lot more word choice, and below 40 for simpler wor
 
  NOTES:
 
- For an interesting test, set "temp" to 0 ; this will give you the SAME generation for a given prompt each time. Then adjust a word, phrase, sentence etc - to see the differences.
 
  Keep in mind this will show model operation at its LEAST powerful/creative level and should NOT be used to determine if the model works for your use case(s).
 
  Then test "at temp" to see the model in action. (5-10 generations recommended)
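The temp = 0 behaviour described here follows from how temperature rescales the token distribution. A minimal sketch with toy logits and a standard softmax (not the actual llama.cpp code):

```python
import math

def softmax_with_temp(logits, temp):
    """Scale logits by 1/temp, then softmax. As temp -> 0 the
    distribution collapses onto the top token (greedy / deterministic),
    which is why temp 0 gives the SAME generation every time."""
    if temp == 0:  # greedy: all probability on the argmax token
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temp for l in logits]
    m = max(scaled)  # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                   # toy scores for three tokens
greedy = softmax_with_temp(logits, 0)      # deterministic pick
creative = softmax_with_temp(logits, 1.5)  # flatter -> more varied word choice
```

Higher temp flattens the distribution (more creative, less stable); lower temp sharpens it toward the greedy pick.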
@@ -199,7 +207,7 @@ Then test "at temp" to see the MODELS in action. (5-10 generations recommended)
  PENALTY SAMPLERS:
  ------------------------------------------------------------------------------
 
- These samplers "trim" or "prune" output.
 
  PRIMARY:
@@ -208,7 +216,11 @@ PRIMARY:
  last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
  ("repetition_penalty_range" in oobabooga/text-generation-webui , "rp_range" in kobold)
 
- THIS IS CRITICAL. Too high you can get all kinds of issues (repeat words, sentences, paragraphs or "gibberish"), especially with class 3 or 4 models.
 
  This setting also works in conjunction with all other "rep pens" below.
 
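A minimal sketch of what the repeat-last-n window and the repeat-penalty (next section) do to token scores, modeled on llama.cpp's penalty rule (positive logits divided by the penalty, negative ones multiplied). This is a toy re-implementation for illustration, not the library's code:

```python
def apply_repeat_penalty(logits, history, repeat_last_n, penalty):
    """Penalize tokens seen in the last `repeat_last_n` tokens of history.
    Mirrors llama.cpp's rule: logit > 0 is divided by the penalty,
    logit < 0 is multiplied (both push the token's probability down)."""
    window = history[-repeat_last_n:] if repeat_last_n > 0 else []
    out = list(logits)
    for tok in set(window):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, 1.0, -0.5, 0.8]  # toy scores for a 4-token vocabulary
history = [0, 2, 0, 3]          # recently generated token ids
penalized = apply_repeat_penalty(logits, history, repeat_last_n=2, penalty=1.1)
# only tokens 0 and 3 fall inside the 2-token window
```

Widening `repeat_last_n` penalizes more of the history at once, which is why an overly large window can push the model into gibberish.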
@@ -223,18 +235,18 @@ penalize repeat sequence of tokens (default: 1.0, 1.0 = disabled)
 
  Generally this is set from 1.0 to 1.15; smallest increments are best, IE: 1.01, 1.02 or even 1.001, 1.002.
 
- This affects creativity of the model over all , not just how words are penalized.
 
 
  <B>presence-penalty</B>
 
  repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
 
- Generally leave this at zero IF repeat-last-n is 256 or less. You may want to use this for higher repeat-last-n settings.
 
- CLASS 3: 0.05 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
 
- CLASS 4: 0.1 to 0.25 may assist generation BUT SET "repeat-last-n" to 64
 
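presence-penalty and frequency-penalty (next section) apply a flat and a count-scaled subtraction respectively, in the style of the OpenAI-API formula that llama.cpp also implements. A toy sketch for illustration:

```python
from collections import Counter

def apply_presence_frequency(logits, history, presence, frequency):
    """Subtract a flat `presence` penalty from any token already seen,
    plus `frequency` times its occurrence count (OpenAI-style formula).
    Presence punishes reuse at all; frequency punishes heavy reuse."""
    counts = Counter(history)
    return [
        logit - (presence + frequency * counts[tok]) if counts[tok] else logit
        for tok, logit in enumerate(logits)
    ]

logits = [2.0, 1.0, 0.5]
history = [0, 0, 0, 1]  # token 0 used three times, token 1 once
adjusted = apply_presence_frequency(logits, history,
                                    presence=0.05, frequency=0.25)
# token 0: 2.0 - (0.05 + 0.25 * 3) = 1.2 ; unseen token 2 untouched
```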
  <B>frequency-penalty</B>
@@ -245,7 +257,7 @@ Generally leave this at zero IF repeat-last-n is 512 or less. You may want to us
 
  CLASS 3: 0.25 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
 
- CLASS 4: 0.7 to 0.8 may assist generation BUT SET "repeat-last-n" to 64.
 
 
  <B>penalize-nl</B>
@@ -295,14 +307,17 @@ mirostat_tau: 5-8 is a good value.
  mirostat_eta: 0.1 is a good value.
 
 
- This is the big one ; activating this will help with creative generation. It can also help with stability.
 
  This is both a sampler (and pruner) and enhancement all in one.
 
  For Class 3 models it is suggested to use this to assist with generation (min settings).
 
- For Class 4 models it is highly recommended with Microstat 1 or 2 + mirostat-lr @ 6 to 8 and mirostat_eta at .1 to .5
 
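A toy sketch of the mirostat v2 idea as implemented in llama.cpp: tokens whose "surprise" (-log2 p) exceeds a running budget mu are dropped, and after each pick mu is nudged toward the target mirostat_tau at learning rate mirostat_eta. For reproducibility this sketch greedily picks the most probable surviving token; the real sampler draws randomly from the survivors.

```python
import math

def mirostat_v2_step(probs, mu, tau, eta):
    """One mirostat-v2-style step (toy version of llama.cpp's sampler).
    Drop tokens whose surprise (-log2 p) exceeds mu, pick one, then
    move mu toward the target surprise tau at learning rate eta."""
    surprises = [-math.log2(p) for p in probs]
    allowed = [i for i, s in enumerate(surprises) if s <= mu]
    # toy determinism: take the most probable allowed token
    # (the real sampler samples randomly from the allowed set)
    choice = max(allowed, key=lambda i: probs[i])
    mu -= eta * (surprises[choice] - tau)  # error-feedback update
    return choice, mu

probs = [0.6, 0.3, 0.09, 0.01]  # toy next-token distribution
tau, eta = 5.0, 0.1
mu = 2 * tau                    # llama.cpp initializes mu at 2 * tau
choice, mu = mirostat_v2_step(probs, mu, tau, eta)
```

Because mu adapts each step, mirostat holds output "surprise" near tau regardless of how the raw distribution shifts, which is what stabilizes the edgier class 3/4 models.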
  <B>dynatemp-range</B>
@@ -466,3 +481,4 @@ Smaller quants may require STRONGER settings (all classes of models) due to comp
  This is also influenced by the parameter size of the model in relation to the quant size.
 
  IE: an 8B model at Q2_K will be far more unstable relative to a 20B model at Q2_K, and as a result require stronger settings.
 
  - "Gibberish"
  - letter, word, phrase, paragraph repeats
  - coherence
+ - instruction following
  - creativeness, or lack thereof, or too much (purple prose).
+ - low quant (IE: Q2_K, IQ1_S, IQ2_S) issues.
 
  Likewise, settings can also improve model generation and/or general overall "smoothness" / "quality" of model operation.
 
 
  Other programs like https://www.LMStudio.ai allow access to most STANDARD samplers, whereas with others (llama_cpp only here) you may need to add them to the JSON file(s) for a model and/or template preset.
 
+ In most cases all llama_cpp settings are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "Ollama" and "lmstudio" (as well as other apps too).
 
  You can also use llama_cpp directly (IE: llama-server.exe); see:
 
  https://github.com/ggerganov/llama.cpp
 
+ (scroll down on the main page for more apps/programs that connect to / use the LLAMA-CPP package and can use GGUFs too.)
 
  ---
 
  Class 4 are balanced on the very edge of stability. These models are generally highly creative, for very narrow use case(s), and closer to "human prose" than other models. With these models, advanced samplers
  are used to bring these "bad boys" in line, which is especially important for chat and/or role play type use cases AND/OR use case(s) these models were not designed for.
 
+ The goal here is to use parameters to raise/lower the power of the model and samplers to "prune" (and/or in some cases enhance) operation.
 
  With that being said, generation "examples" (at my repo) are created using the "Primary Testing Parameters" (top of this document) settings regardless of the "class" of the model AND NO advanced settings, or samplers.
 
 
 
  The smaller the model, the greater the contrast between the smallest quant and largest quant in terms of operation, quality, nuance and general overall function.
 
+ IMATRIX:
+
  Imatrix quants generally improve all quants, and also allow you to use smaller quants (less memory, more context space) and retain quality of operation.
 
  IE: Instead of using a Q4_K_M, you might be able to run an IQ3_M and get close to Q4_K_M's quality, but at a higher tokens-per-second speed and have more VRAM free for context.
 
 
  These parameters will have a SIGNIFICANT effect on prose, generation, length and content; with temp being the most powerful.
 
+ Keep in mind the biggest parameter / random "unknown" is your prompt.
+
+ A word change, rephrasing, or punctuation change, even a comma or semi-colon, can drastically alter the output, even at min temp settings. CAPS also affect generation too.
 
  <B>temp / temperature</B>
 
 
 
  NOTES:
 
+ For an interesting test, set "temp" to 0; this will give you the SAME generation for a given prompt each time.
+
+ Then adjust a word, phrase, sentence, etc. to see the differences.
+
  Keep in mind this will show model operation at its LEAST powerful/creative level and should NOT be used to determine if the model works for your use case(s).
 
  Then test "at temp" to see the model in action. (5-10 generations recommended)
 
  PENALTY SAMPLERS:
  ------------------------------------------------------------------------------
 
+ These samplers "trim" or "prune" output in real time. The longer the generation, the stronger the overall effect.
 
  PRIMARY:
 
 
  last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
  ("repetition_penalty_range" in oobabooga/text-generation-webui , "rp_range" in kobold)
 
+ THIS IS CRITICAL.
+
+ Set this too high and you can get all kinds of issues (repeat words, sentences, paragraphs or "gibberish"), especially with class 3 or 4 models.
+
+ Likewise, if you change this parameter it will drastically alter the output.
 
  This setting also works in conjunction with all other "rep pens" below.
 
 
 
  Generally this is set from 1.0 to 1.15; smallest increments are best, IE: 1.01, 1.02 or even 1.001, 1.002.
 
+ This affects creativity of the model overall, not just how words are penalized.
 
 
  <B>presence-penalty</B>
 
  repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
 
+ Generally leave this at zero IF repeat-last-n is 512-1024 or less. You may want to use this for higher repeat-last-n settings.
 
+ CLASS 3: 0.05 to 0.2 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
 
+ CLASS 4: 0.1 to 0.35 may assist generation BUT SET "repeat-last-n" to 64.
 
 
  <B>frequency-penalty</B>
 
 
  CLASS 3: 0.25 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
 
+ CLASS 4: 0.4 to 0.8 may assist generation BUT SET "repeat-last-n" to 64.
 
 
  <B>penalize-nl</B>
 
  mirostat_eta: 0.1 is a good value.
 
 
+ This is the big one; activating this will help with creative generation. It can also help with stability. Also note which
+ samplers are disabled/ignored here, and that "mirostat_eta" is a learning rate.
 
  This is both a sampler (and pruner) and enhancement all in one.
 
+ It also has two modes of generation, "1" and "2" - test both with 5-10 generations of the same prompt. Make adjustments, and repeat.
+
 
  For Class 3 models it is suggested to use this to assist with generation (min settings).
 
+ For Class 4 models it is highly recommended with Mirostat 1 or 2 + mirostat_tau @ 6 to 8 and mirostat_eta at .1 to .5
 
 
  <B>dynatemp-range</B>
 
  This is also influenced by the parameter size of the model in relation to the quant size.
 
  IE: an 8B model at Q2_K will be far more unstable relative to a 20B model at Q2_K, and as a result require stronger settings.
+