# German Semantic V3

Finally, a new version! The successor of German_Semantic_STS_V2 is here and comes with loads of cool new features!

## Major updates and USPs:
- **Sequence length:** 8192 tokens, 16 times more than V2 and most other models, thanks to the ALiBi implementation by the Jina team!
- **Matryoshka Embeddings:** The model is trained for embedding sizes from 1024 down to 64, allowing you to store much smaller embeddings with little quality loss.
- **German only:** This model is German-only, which lets it learn more efficiently thanks to its tokenizer, deal better with shorter queries, and generally be more nuanced.
- **Updated knowledge and quality data:** The backbone of this model is gbert-large by deepset. Stage-2 pretraining on German fineweb by occiglot (newest data only) ensures up-to-date knowledge.
- **Flexibility:** Trained with flexible sequence lengths and embedding truncation, flexibility is a core feature of the model, while it still improves on V2 performance.
- **License:** Apache 2.0
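
To give an intuition for the Matryoshka idea above, here is a minimal NumPy sketch (not this model's API): a full-size embedding is truncated to its first `k` dimensions and re-normalized before computing cosine similarity. The 1024-dimensional vectors below are random stand-ins for real sentence embeddings, chosen so that `full_b` is a slightly perturbed, "similar" version of `full_a`.

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Keep the first `dim` Matryoshka dimensions and re-normalize to unit length."""
    head = np.asarray(vec, dtype=np.float64)[:dim]
    return head / np.linalg.norm(head)

def cosine(a, b):
    """Cosine similarity of two already unit-length vectors."""
    return float(np.dot(a, b))

rng = np.random.default_rng(0)
full_a = rng.normal(size=1024)                  # stand-in for a 1024-dim embedding
full_b = full_a + 0.1 * rng.normal(size=1024)   # a slightly perturbed "similar" text

# Similarity at full size vs. after truncating to the first 64 dimensions:
sim_full = cosine(truncate_embedding(full_a, 1024), truncate_embedding(full_b, 1024))
sim_64 = cosine(truncate_embedding(full_a, 64), truncate_embedding(full_b, 64))
print(sim_full, sim_64)
```

Because the model is trained so that the leading dimensions carry most of the semantic signal, the truncated similarity stays close to the full-size one while the stored vectors shrink by a factor of 16.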
## Usage: