Update README.md

README.md CHANGED

@@ -63,10 +63,14 @@ Your document will be embedded in x × 1024t chunks (snippets),<br>
 You can receive 14 snippets of 1024t each (~14,000t) from your ~10,000-word document, leaving ~2,000t (of 16,000t) for the answer (~1,000 words, about 2 pages)
 <br>
 You can adjust this to your needs, e.g. 8 snippets of 2048t, or 28 snippets of 512t ... (every time you change the chunk length, the document must be embedded again)
-<ul style="line-height: 1;">
-
-
-
+<ul style="line-height: 1;"><br>
+English vs. German differs by about 50%<br>
+~5,000 characters is one page of a book (regardless of German/English), but German words are longer, which means more tokens per word<br>
+the example is English; for German you can add approx. 50% more tokens (1,000 words ≈ 1,800t)<br>
+<li>1200t (~1,000 words, ~5,000 characters) ~0.1GB, approx. one page in a small font</li>
+<li>8000t (~6,000 words) ~0.8GB VRAM usage</li>
+<li>16000t (~12,000 words) ~1.5GB VRAM usage</li>
+<li>32000t (~24,000 words) ~3GB VRAM usage</li>
 </ul>
 <br>
 here is a tokenizer calculator<br>
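The snippet/answer budget in the README text above is simple arithmetic, and can be sketched as a small calculator. Only the 16,000t context window, 1024t chunk size, and 14 snippets come from the README's example; the function name and structure are illustrative.

```python
def snippet_budget(context_tokens=16000, chunk_tokens=1024, num_snippets=14):
    """Return (tokens consumed by snippets, tokens left for the answer)."""
    used = chunk_tokens * num_snippets  # 14 * 1024 = 14336 (~14,000t in the README)
    left = context_tokens - used        # 16000 - 14336 = 1664 (~2,000t in the README)
    return used, left

print(snippet_budget())  # (14336, 1664)
```

The same function shows why the alternative settings trade off against each other: 28 snippets of 512t use the same budget as 14 of 1024t, while 8 snippets of 2048t (16,384t) would slightly exceed a 16,000t window.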
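The rules of thumb in the added list (the English example's ~1.2 tokens per word, approx. 50% more tokens for German, and roughly 0.1GB VRAM per 1,000 tokens) can be collected into a rough converter. All constants are the README's own estimates, not measured values, and the function names are illustrative.

```python
def words_to_tokens(words, german=False):
    """Rough token estimate from a word count (README rule of thumb)."""
    tokens = words * 1.2  # English example: ~1,000 words ~ 1,200t
    if german:
        tokens *= 1.5     # German: approx. 50% more tokens per word count
    return round(tokens)

def vram_gb(tokens):
    """Rough VRAM estimate: ~0.1GB per 1,000 tokens, per the listed points."""
    # Matches 8000t ~0.8GB and 16000t ~1.5-1.6GB; gives 3.2GB for 32000t
    # where the README rounds to ~3GB.
    return round(tokens * 0.1 / 1000, 2)

print(words_to_tokens(1000))               # 1200
print(words_to_tokens(1000, german=True))  # 1800
print(vram_gb(8000))                       # 0.8
```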