kalle07 committed on
Commit
3a41b7f
·
verified ·
1 Parent(s): 78b1760

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -87,14 +87,14 @@ The example is english, for german you can add apox 50% more token/word (1000 wo
  <br>
  Vector Size (Dimensions- you can not change)
 
- The vector size, or dimensionality, is the number of numbers in each embedding vector.
+ The vector size, or dimensionality (embedding_length: xxx), is the number of numbers in each embedding vector.
  Common embedding models produce vectors ranging from 384 dimensions (e.g., all-MiniLM-L6-v2) to 3072 dimensions (text-embedding-3-large).
  Higher dimensions capture more semantic details but require more storage and computational resources for database indexing and search.
  Some models allow you to shorten vectors (e.g., use only 256 out of 3072 dimensions) to save space while retaining high performance.<br>
  <br>
  Vector count refers to the total number of vectors stored, which usually corresponds to the number of content chunks indexed +the overlap chracters space.<br>
  <br>
- More vectors mean more granularity for search and retrieval but also increase database size and operational overhead sometimes 5times teh size and also need more time for response.
+ More vectors mean more granularity for search and retrieval but also increase database size and operational overhead sometimes 5times the size and also need more time for response.
  Chunk Length<br>
  <br>
  Chunk length is the size (usually measured in words, tokens, or characters) of the text split for embedding (ALLM chunk length/ chunk size -> in characters).
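
The relationship between chunk length, overlap, vector count and storage described in the README text can be made concrete with a rough estimate. The sketch below is only an illustration, not part of the README or of any tool it mentions: the function names are hypothetical, it assumes a simple sliding-window splitter where every chunk except the last has exactly `chunk_size` units, and it assumes each vector value is stored as a 4-byte float. Real vector databases add index and metadata overhead on top of this raw figure.

```python
import math


def estimate_chunk_count(text_length: int, chunk_size: int, overlap: int = 0) -> int:
    """Estimate how many chunks (and therefore vectors) a text produces.

    Assumes a sliding window: each new chunk starts (chunk_size - overlap)
    units after the previous one. All arguments use the same unit
    (characters, tokens, or words).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - overlap
    return math.ceil((text_length - overlap) / step)


def estimate_raw_vector_bytes(vector_count: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the stored vectors only (assumed float32, no index overhead)."""
    return vector_count * dimensions * bytes_per_value


if __name__ == "__main__":
    # Hypothetical document: 1000 characters, 200-character chunks.
    for overlap in (0, 50):
        count = estimate_chunk_count(1000, 200, overlap)
        size = estimate_raw_vector_bytes(count, 384)  # 384 dimensions, e.g. all-MiniLM-L6-v2
        print(f"overlap={overlap}: {count} vectors, {size} bytes of raw vector data")
```

With these assumptions, adding overlap raises the vector count for the same text, and a larger vector size multiplies the raw storage per vector, which matches the trade-offs the README describes.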