Update README.md
README.md
CHANGED
|
@@ -87,14 +87,14 @@ The example is english, for german you can add apox 50% more token/word (1000 wo
|
|
| 87 |
<br>
|
| 88 |
Vector Size (Dimensions; you cannot change this)
|
| 89 |
|
| 90 |
-
The vector size, or dimensionality, is the number of numbers in each embedding vector.
|
| 91 |
Common embedding models produce vectors ranging from 384 dimensions (e.g., all-MiniLM-L6-v2) to 3072 dimensions (text-embedding-3-large).
|
| 92 |
Higher dimensions capture more semantic details but require more storage and computational resources for database indexing and search.
|
| 93 |
Some models allow you to shorten vectors (e.g., use only 256 out of 3072 dimensions) to save space while retaining high performance.<br>
|
| 94 |
<br>
|
| 95 |
Vector count refers to the total number of vectors stored, which usually corresponds to the number of content chunks indexed, including the extra chunks caused by the character overlap between chunks.<br>
|
| 96 |
<br>
|
| 97 |
-
More vectors mean more granularity for search and retrieval but also increase database size and operational overhead, sometimes 5 times
|
| 98 |
Chunk Length<br>
|
| 99 |
<br>
|
| 100 |
Chunk length is the size (usually measured in words, tokens, or characters) of each piece of text split off for embedding (in ALLM, chunk length/chunk size is given in characters).
|
|
|
|
| 87 |
<br>
|
| 88 |
Vector Size (Dimensions; you cannot change this)
|
| 89 |
|
| 90 |
+
The vector size, or dimensionality (embedding_length: xxx), is the number of numbers in each embedding vector.
|
| 91 |
Common embedding models produce vectors ranging from 384 dimensions (e.g., all-MiniLM-L6-v2) to 3072 dimensions (text-embedding-3-large).
|
| 92 |
Higher dimensions capture more semantic details but require more storage and computational resources for database indexing and search.
|
| 93 |
Some models allow you to shorten vectors (e.g., use only 256 out of 3072 dimensions) to save space while retaining high performance.<br>
|
| 94 |
<br>
|
| 95 |
Vector count refers to the total number of vectors stored, which usually corresponds to the number of content chunks indexed, including the extra chunks caused by the character overlap between chunks.<br>
|
| 96 |
<br>
|
| 97 |
+
More vectors mean more granularity for search and retrieval but also increase database size and operational overhead, sometimes to 5 times the size, and responses take longer.
|
| 98 |
Chunk Length<br>
|
| 99 |
<br>
|
| 100 |
Chunk length is the size (usually measured in words, tokens, or characters) of each piece of text split off for embedding (in ALLM, chunk length/chunk size is given in characters).
|
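To make the relationship between chunk length, overlap, vector count and vector size more concrete, here is a rough sketch in Python. It is not taken from any tool mentioned in this README: the function names (`chunk_count`, `raw_vector_bytes`, `shorten`) are made up for illustration, the chunker is assumed to be a simple sliding window measured in characters, and each vector value is assumed to be stored as a 4-byte float. Real databases add index and metadata overhead on top of the raw size, so treat the numbers as a lower bound.

```python
import math


def chunk_count(text_length: int, chunk_size: int, overlap: int) -> int:
    """Estimate how many chunks (= vectors) a text produces.

    Assumes a sliding window of `chunk_size` characters that advances by
    `chunk_size - overlap` characters each step (an assumption, real
    splitters may also break on sentences or paragraphs).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - overlap
    return 1 + math.ceil((text_length - chunk_size) / step)


def raw_vector_bytes(vector_count: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (assumes 4-byte floats by default)."""
    return vector_count * dimensions * bytes_per_value


def shorten(vector: list[float], keep: int) -> list[float]:
    """Keep the first `keep` dimensions and rescale to unit length.

    Only meaningful for models that were trained to support shortened
    vectors; for other models this will usually hurt search quality.
    """
    head = vector[:keep]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


if __name__ == "__main__":
    # 10,000 characters, chunk size 1,000, with and without 200 characters overlap
    without_overlap = chunk_count(10_000, 1_000, 0)
    with_overlap = chunk_count(10_000, 1_000, 200)
    print(without_overlap, with_overlap)

    # Same chunks, small model (384 dimensions) vs. large model (3072 dimensions)
    print(raw_vector_bytes(with_overlap, 384), raw_vector_bytes(with_overlap, 3072))
```

With these assumptions, adding overlap raises the vector count, and moving from 384 to 3072 dimensions multiplies the raw vector storage by 8 for the same number of chunks.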