Update README.md
README.md
CHANGED
@@ -54,15 +54,15 @@ With the same setting, these embedders found same 6-7 snippets out of 10 from a
 ...
 
 # Short hints for using (Example for a large context with many expected hits):
-Set your (Max Tokens)context-lenght 16000t main-model, set your embedder-model (Max Embedding Chunk Length) 1024t,set (Max Context Snippets) 14,
+Set your (Max Tokens)context-lenght 16000t main-LLM-model, set your embedder-model (Max Embedding Chunk Length) 1024t,set (Max Context Snippets) 14,
 in ALLM set also (Text splitting & Chunking Preferences - Text Chunk Size) 1024 character parts and (Search Preference) "accuracy".
 <br>
 
 -> Ok what that mean!<br>
 Your document will be embedd in x times 1024t chunks(snippets),<br>
-You can receive 14-snippets a 1024t (~14000t) from your document ~10000words and ~2000t left (from 16000t) for the answer ~1000words (2 pages)
+You can receive 14-snippets a 1024t (~14000t) from your document ~10000words(10pages) and ~2000t left (from 16000t) for the answer ~1000words (2 pages)
 <br>
-You can play and set for your needs, eg 8-snippets a 2048t, or 28-snippets a 512t ... (every time you change the chunk-length the document must be embedd again)
+You can play and set for your needs, eg 8-snippets a 2048t, or 28-snippets a 512t ... (every time you change the chunk-length the document must be embedd again). With these settings everything fits best for an answer, if you need more for a conversation, you should set lower and/or disable the document.
 <ul style="line-height: 1.05;"><br>
 english vs german differ 50%<br>
 ~5000 character is one page of a book (no matter ger/en) but words in german are longer, that means per word more token<br>
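
The token arithmetic in the changed lines (snippets times chunk length, with the remainder of the context window left for the answer) can be checked with a small sketch. The function name `context_budget` and its parameters are hypothetical, chosen only for illustration; they are not part of ALLM or of any embedder. The sketch also assumes every retrieved snippet uses its full chunk length, which is the worst case.

```python
def context_budget(context_tokens: int, chunk_tokens: int, max_snippets: int) -> tuple[int, int]:
    """Return (tokens used by snippets, tokens left for the answer).

    Assumption: each snippet fills its whole chunk length. A negative
    second value means the snippets alone overflow the context window.
    """
    snippet_tokens = chunk_tokens * max_snippets
    answer_tokens = context_tokens - snippet_tokens
    return snippet_tokens, answer_tokens


# Settings mentioned in the README hints, all with a 16000-token context.
for chunk, snippets in [(1024, 14), (512, 28), (2048, 8)]:
    used, left = context_budget(16000, chunk, snippets)
    print(f"{snippets} snippets x {chunk}t -> {used}t used, {left}t left")
```

Under these assumptions, 14 x 1024t and 28 x 512t both use 14336t and leave 1664t, which matches the README's rounded "~14000t" and "~2000t". The 8 x 2048t variant comes to 16384t, slightly more than 16000t, so it would presumably need a somewhat larger context window or one snippet fewer if every snippet were full length.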