Update README.md
<b>The setup for local documents described below is almost the same. GPT4All has only one model (nomic), and koboldcpp and Jan (Menlo) are not built in right now but in development.</b><br>

(Sometimes the results are more truthful if the “chat with document only” option is used.)<br>
By the way, the embedding model is only one part of a good RAG (Retrieval-Augmented Generation) pipeline; 512 tokens are roughly 2000 characters, which is enough in most cases.<br>
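The 512-token limit can be handled by chunking documents before embedding. A minimal sketch, assuming the common rough heuristic of ~4 characters per token (so ~2000 characters per 512-token chunk); the function name and overlap value are illustrative, not from any particular library:

```python
# Rough character-based chunker: ~4 characters per token is a common
# heuristic, so a 512-token embedder fits roughly 2000 characters.
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into overlapping chunks sized for a 512-token embedder."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap so sentences are not cut in half
    return chunks

pieces = chunk_text("x" * 5000)
print(len(pieces))  # 3 chunks for 5000 characters
```

A real pipeline would split on sentence or paragraph boundaries instead of raw character offsets, but the size budget is the same.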
<b>⇨</b> Give me a ❤️ if you like ;)<br>
<br>
<b>My short impression:</b>
<ul style="line-height: 1.05;">
<li>nomic-embed-text-v2-moe (up to 512t context length)</li>
<li>mxbai-embed-large (small and fast model)</li>
<li>mug-b-1.6</li>
<li>snowflake-arctic-embed-l-v2.0 (up to 8192t context length)</li>
<li>bge-m3 (up to 8192t context length)</li>
</ul>
These work well; all others are up to you! Some models are very similar! (Jina- and Qwen-based models can be added manually to LM Studio; in the model's "gear wheel" settings, set "override domain type".)<br>
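Once a model is loaded in LM Studio, its local server exposes an OpenAI-compatible embeddings endpoint. A minimal sketch, assuming the default server address (http://localhost:1234/v1) and using an example model name — substitute whatever embedder you actually loaded:

```python
import json
import urllib.request

# Build a request against LM Studio's OpenAI-compatible /v1/embeddings
# endpoint. The model name is an example; pass the one you loaded.
def embedding_request(texts, model="nomic-embed-text-v2-moe",
                      url="http://localhost:1234/v1/embeddings"):
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})

# With the local server running, the vectors come back like this:
# req = embedding_request(["What is RAG?"])
# with urllib.request.urlopen(req) as resp:
#     vectors = [d["embedding"] for d in json.load(resp)["data"]]
```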
Further tests have shown that the following models are suitable for complex tasks:
<ul style="line-height: 1.05;">
<li>GTE large</li>
<li>cross-en-de-es-roberta</li>
<li>ger-RAG-bge-M3-merg-snowf-artic-hessian-AI (very good for German, up to 8192t context length)</li>
<li>German-RAG-BGE-M3-TRIPLES-HESSIAN-AI (very good for German, up to 8192t context length)</li>
<li>bge-m3 (good for German, up to 8192t context length)</li>
<li>jina-embeddings-v3 (good for German, up to 8192t context length)</li>
</ul>
There are two embedders for finding toxic content (toxic-prompt-roberta and minilmv2-toxic-jigsaw); I don't know how well they work. IBM instead offers a whole LLM for this (granite-guardian).
<br>
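Whatever embedder you pick, retrieval (and similarity-based filtering, as with the toxic-content embedders above) usually boils down to cosine similarity between vectors. A self-contained sketch with toy two-dimensional vectors in place of real embedding output:

```python
import math

# Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
# In RAG, the chunks whose embeddings score highest against the query
# embedding are the ones handed to the LLM.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```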