<b>The setup for local documents described below is almost the same; GPT4All has only one embedding model (nomic), and koboldcpp and JAN (Menlo) are not built in right now, but support is in development.</b><br>

(Sometimes the results are more truthful if the “chat with document only” option is used.)<br>
BTW, the embedder model is only one part of a good RAG (Retrieval-Augmented Generation) setup.<br>
<b>⇨</b> give me a ❤️, if you like ;)<br>
<br>
<b>My short impression:</b>
-> OK, what does that mean?<br>
Your document will be embedded as x chunks (snippets) of 1500 characters each.<br>
You can receive 10 snippets of 1500 chars = 15,000 chars (~4000 tokens) from your document, all in all ~2500 words (~2.5 pages). That leaves ~4000 tokens (of an 8000-token context) for the answer, again ~2500 words (~2.5 pages). One question, one answer: with every setup I suggest querying only once.<br>
If you change your settings so that you receive more than 8000 tokens in the response, you should use a good LLM to get a detailed and meaningful answer out of such a large amount of content. It doesn't matter whether the model can accept 100'000 tokens; that doesn't mean it can process 100'000 of them meaningfully.
<br>
You can play with the settings to suit your needs, e.g. 5 snippets of 5000 chars, or 20 snippets of 500 chars ... (every time you change the chunk length, the document must be embedded again). With the settings above, everything fits best for ONE answer.
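As a rough sanity check, here is a minimal sketch of that budget arithmetic. The chars-per-token and chars-per-word ratios are rule-of-thumb assumptions derived from the numbers above, not exact tokenizer values:

```python
# Back-of-the-envelope RAG context budget, using the rough ratios above:
# ~3.75 chars per token and ~6 chars per word (English; German needs more
# tokens per word, so treat all of this as an estimate only).
CHARS_PER_TOKEN = 3.75
CHARS_PER_WORD = 6
CONTEXT_TOKENS = 8000  # assumed total context window

def snippet_budget(num_snippets: int, chunk_chars: int) -> None:
    chars = num_snippets * chunk_chars
    tokens = chars / CHARS_PER_TOKEN
    words = chars / CHARS_PER_WORD
    left = CONTEXT_TOKENS - tokens
    print(f"{num_snippets} x {chunk_chars} chars = {chars} chars "
          f"~ {tokens:.0f} tokens ~ {words:.0f} words; "
          f"~{left:.0f} tokens left for the answer")

snippet_budget(10, 1500)  # the default above: ~4000 tokens in, ~4000 left
snippet_budget(5, 5000)   # fewer, longer snippets
snippet_budget(20, 500)   # many short snippets
```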
<ul style="line-height: 1.05;">
English and German differ by about 50% in tokens per word.<br>
But ~5000 characters is one page of a book (no matter whether German or English). If you calculate with words instead: German words are longer, which means more tokens per word.<br>
# How embedding and search works:

Embedding is not a database: you cannot ask how many times the word "xyz" appears, and you cannot ask for a summary of an entire book.<br>
Example:<br>
You have a txt/pdf file, say a book of 90,000 words (~300 pages, ~1,000,000 chars). You ask the model something like "what is described in the chapter called XYZ in relation to person ZYX?".
Now it searches for keywords or semantically similar terms in the document. If it finds them, say words and meanings around “XYZ” and “ZYX”,
a piece of text of ~2000 chars around “XYZ/ZYX” is cut out at that point. (In reality this is all done with encoded numbers per chunk, which is why you cannot search for single numbers or words; but that doesn't matter for the principle.)<br>
This text snippet is then used for your answer.<br>
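The principle can be sketched in a few lines with the sentence-transformers library; the model name, chunk size, and top_k here are only illustrative, and real tools add more ranking tricks on top:

```python
# Toy illustration of the chunk -> embed -> rank principle described above.
# pip install sentence-transformers; the model choice is illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 2000) -> list[str]:
    # Cut the document into fixed-size character chunks (snippets).
    return [text[i:i + size] for i in range(0, len(text), size)]

book = open("book.txt", encoding="utf-8").read()  # hypothetical input file
snippets = chunk(book)

# Every snippet becomes a vector of numbers; the words themselves are gone,
# which is why you cannot grep the store for a single word afterwards.
snippet_vecs = model.encode(snippets, convert_to_tensor=True)

query = "What is described in the chapter XYZ in relation to person ZYX?"
query_vec = model.encode(query, convert_to_tensor=True)

# Rank all snippets by semantic similarity and keep only the top few;
# the other 40+ occurrences of "XYZ" are simply never sent to the LLM.
hits = util.semantic_search(query_vec, snippet_vecs, top_k=4)[0]
for hit in hits:
    print(f"score {hit['score']:.2f}: {snippets[hit['corpus_id']][:80]}...")
```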
<ul style="line-height: 1.05;">
<li>If, for example, the word/meaning “XYZ” occurs 50 times in one txt, not all 50 occurrences are used for the answer; only the snippets with the best ranking are used.</li>
<li>If only one snippet matches your question, all the other snippets can negatively influence your answer because they do not fit the topic (usually 4 to 20 snippets are fine).</li>
<li>If you expect multiple search hits in your docs, try 15 snippets or more; if you expect only 2, don't use more!</li>
<li>If you use a chunk length of ~2000 chars you receive more content; with ~500 chars you receive more facts. BUT a lower chunk length means more chunks and a much longer embedding time (see the sketch after this list).</li>
<li>A question like "give me a summary of the document" is usually not useful; if the document has an introduction or summaries, the search lands there if you are lucky.</li>
<li>If a book has a table of contents or a bibliography, I would delete these pages, as they often contain relevant search terms but do not help answer your question.</li>
<li>If the document is small, like 10-20 pages, it is better to copy the whole text into the chat; some tools call this option "pin".</li>
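To see why short chunks slow down embedding, compare the chunk counts for the ~1,000,000-character book from the example above (pure arithmetic, no assumptions beyond the numbers already given):

```python
# The number of chunks to embed grows as the chunk length shrinks; every
# chunk costs one embedding pass, so halving the chunk length roughly
# doubles the embedding time for the same document.
doc_chars = 1_000_000  # the ~300-page book from the example above

for chunk_chars in (5000, 2000, 1500, 500):
    n_chunks = -(-doc_chars // chunk_chars)  # ceiling division
    print(f"chunk length {chunk_chars:>4} chars -> {n_chunks:>5} chunks to embed")
```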
granite-3.2-8b (2B version also) - https://huggingface.co/ibm-research/granite-3.2-8b-instruct-GGUF<br>
Chocolatine-2-14B (other versions also) - https://huggingface.co/mradermacher/Chocolatine-2-14B-Instruct-DPO-v2.0b11-GGUF<br>
QwQ-LCoT (7/14B) - https://huggingface.co/mradermacher/QwQ-LCoT-14B-Conversational-GGUF<br>
gemma-3 (4/12/27B) - https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF<br>
qwen3 (4/8B) - https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct-GGUF <br><br>

...
# Important -> The system prompt (some examples):
<li>The system prompt is weighted with a certain amount of influence around your question; it does not change your embedding/retrieval result. You can easily test this once with no system prompt, or with a nonsensical one.</li>

"You are a helpful assistant who provides an overview of ... under the aspects of ... .
You use attached excerpts from the collection to generate your answers!
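To make that separation concrete, here is a hedged sketch of where the system prompt and the retrieved snippets end up in a request to an OpenAI-compatible local server (LM Studio exposes one on port 1234 by default; the model name, snippets, and question are placeholders):

```python
# The system prompt steers the wording of the answer; the snippets were
# already chosen by the embedding search and arrive unchanged either way.
import requests

system_prompt = (
    "You are a helpful assistant who provides an overview of ... "
    "under the aspects of ... . You use attached excerpts from the "
    "collection to generate your answers!"
)
top_snippets = ["...snippet 1...", "...snippet 2..."]  # from the retrieval step

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",  # LM Studio's default server
    json={
        "model": "local-model",  # placeholder; use the model you loaded
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Context:\n" + "\n---\n".join(top_snippets)
                + "\n\nQuestion: what is described in chapter XYZ in relation to ZYX?"},
        ],
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```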