kalle07 committed on
Commit
87787b1
·
verified ·
1 Parent(s): bf821af

Update README.md

  1. README.md +3 -3
README.md CHANGED
@@ -54,15 +54,15 @@ With the same setting, these embedders found same 6-7 snippets out of 10 from a
  ...
 
  # Short hints for using (Example for a large context with many expected hits):
- Set your (Max Tokens)context-lenght 16000t main-model, set your embedder-model (Max Embedding Chunk Length) 1024t,set (Max Context Snippets) 14,
+ Set your (Max Tokens)context-lenght 16000t main-LLM-model, set your embedder-model (Max Embedding Chunk Length) 1024t,set (Max Context Snippets) 14,
  in ALLM set also (Text splitting & Chunking Preferences - Text Chunk Size) 1024 character parts and (Search Preference) "accuracy".
  <br>
 
  -> Ok what that mean!<br>
  Your document will be embedd in x times 1024t chunks(snippets),<br>
- You can receive 14-snippets a 1024t (~14000t) from your document ~10000words and ~2000t left (from 16000t) for the answer ~1000words (2 pages)
+ You can receive 14-snippets a 1024t (~14000t) from your document ~10000words(10pages) and ~2000t left (from 16000t) for the answer ~1000words (2 pages)
  <br>
- You can play and set for your needs, eg 8-snippets a 2048t, or 28-snippets a 512t ... (every time you change the chunk-length the document must be embedd again)
+ You can play and set for your needs, eg 8-snippets a 2048t, or 28-snippets a 512t ... (every time you change the chunk-length the document must be embedd again). With these settings everything fits best for an answer, if you need more for a conversation, you should set lower and/or disable the document.
  <ul style="line-height: 1.05;"><br>
  english vs german differ 50%<br>
  ~5000 character is one page of a book (no matter ger/en) but words in german are longer, that means per word more token<br>
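
The token budget described in the changed lines can be checked with a little arithmetic. The following is only a minimal sketch: `context_budget` and its parameter names are hypothetical, and it assumes every snippet is exactly as long as the chunk length and that nothing else (system prompt, question, chat history) uses the context window.

```python
def context_budget(context_tokens, snippet_count, chunk_tokens):
    """Return (tokens used by snippets, tokens left for the answer).

    Hypothetical helper: assumes every snippet is a full-length chunk
    and that no other text occupies the context window.
    """
    used = snippet_count * chunk_tokens
    left = context_tokens - used
    return used, left


# The three combinations mentioned in the README text, with a 16000-token context.
for count, size in [(14, 1024), (8, 2048), (28, 512)]:
    used, left = context_budget(16000, count, size)
    print(f"{count} snippets x {size}t: {used}t used, {left}t left")
```

Under these assumptions, 14 x 1024t and 28 x 512t both use 14336t and leave 1664t, which is in line with the "~14000t" and "~2000t" figures in the README. Taken literally, 8 x 2048t comes to 16384t, slightly more than 16000t, so that combination would only fit if the snippets are shorter than the maximum or the context is set a little higher.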