kalle07 committed
Commit aee8285 · verified · 1 Parent(s): 0fb7a38

Update README.md

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -63,10 +63,14 @@ Your document will be embedded in x times 1024t chunks (snippets),<br>
  You can receive 14 snippets of 1024t (~14000t) from your document (~10000 words), leaving ~2000t (of 16000t) for the answer (~1000 words, 2 pages)
  <br>
  You can tune this to your needs, e.g. 8 snippets of 2048t, or 28 snippets of 512t ... (every time you change the chunk length, the document must be embedded again)
- <ul style="line-height: 1;">
- <li>8000t (~6000 words) ~0.8GB VRAM usage</li>
- <li>16000t (~12000 words) ~1.5GB VRAM usage</li>
- <li>32000t (~24000 words) ~3GB VRAM usage</li>
+ <ul style="line-height: 1;"><br>
+ English vs. German differs by ~50%<br>
+ ~5000 characters make one page of a book (German or English alike), but German words are longer, i.e. more tokens per word<br>
+ the example is English; for German add approx. 50% more tokens (1000 words ≈ 1800t)<br>
+ <li>1200t (~1000 words, ~5000 characters) ~0.1GB VRAM usage, approx. one page in a small font</li>
+ <li>8000t (~6000 words) ~0.8GB VRAM usage</li>
+ <li>16000t (~12000 words) ~1.5GB VRAM usage</li>
+ <li>32000t (~24000 words) ~3GB VRAM usage</li>
  </ul>
  <br>
  here is a tokenizer calculator<br>
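The budget arithmetic behind the changed README lines can be sketched in a few lines of Python. This is a rough illustration, not part of the commit: the tokens-per-word ratios (~1.2 for English, ~1.8 for German) and the linear VRAM fit (~0.094 GB per 1000t, eyeballed from the table above) are assumptions, and `snippet_budget`, `words_to_tokens`, and `vram_gb` are hypothetical helper names.

```python
# Sketch of the README's snippet/context arithmetic.
# Assumed ratios: ~1.2 tokens/word (English), ~1.8 tokens/word (German);
# VRAM fit ~0.094 GB per 1000 tokens is a rough linear fit to the table.

def snippet_budget(context_tokens, chunk_tokens, n_snippets):
    """Tokens consumed by retrieved snippets and tokens left for the answer."""
    used = chunk_tokens * n_snippets
    if used > context_tokens:
        raise ValueError("snippets exceed the context window")
    return used, context_tokens - used

def words_to_tokens(words, language="en"):
    """Rough token estimate; German runs ~50% higher than English."""
    per_word = {"en": 1.2, "de": 1.8}
    return int(words * per_word[language])

def vram_gb(context_tokens):
    """Rough linear fit to the README's VRAM table."""
    return round(context_tokens * 0.094 / 1000, 1)

# The README's example: 14 snippets of 1024t in a 16000t context window.
used, left = snippet_budget(16000, 1024, 14)
print(used, left)  # 14336 1664 — i.e. ~14000t retrieved, ~2000t for the answer
```

Changing the chunk length (e.g. 8 × 2048t or 28 × 512t) keeps `used` near 14336t, which is why the README notes only that re-embedding is required, not a different context budget.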