# German Semantic V3

Finally, a new version! The successor of German_Semantic_STS_V2 is here and comes with loads of cool new features!

## Major updates and USPs:
- **Sequence length:** 8192 tokens, 16 times more than V2 and most other models, thanks to the ALiBi implementation by the Jina team!
- **Matryoshka Embeddings:** The model is trained for embedding sizes from 1024 down to 64, allowing you to store much smaller embeddings with little quality loss.
- **German only:** This model is German-only, which lets it learn more efficiently thanks to its tokenizer, deal better with shorter queries, and generally be more nuanced.
- **Updated knowledge and quality data:** The backbone of this model is gbert-large by deepset. Stage-2 pretraining on German fineweb by occiglot (newest data only) ensures up-to-date knowledge.
- **Flexibility:** Trained with flexible sequence lengths and embedding truncation, flexibility is a core feature of the model, while it still improves on V2 performance.
- **License:** Apache 2.0
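
To give an intuition for the Matryoshka idea above, here is a minimal NumPy sketch (not this model's API): a full-size embedding is truncated to its first `k` dimensions and re-normalized before computing cosine similarity. The 1024-dimensional vectors below are random stand-ins for real sentence embeddings, chosen so that `full_b` is a slightly perturbed, "similar" version of `full_a`.

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Keep the first `dim` Matryoshka dimensions and re-normalize to unit length."""
    head = np.asarray(vec, dtype=np.float64)[:dim]
    return head / np.linalg.norm(head)

def cosine(a, b):
    """Cosine similarity of two already unit-length vectors."""
    return float(np.dot(a, b))

rng = np.random.default_rng(0)
full_a = rng.normal(size=1024)                  # stand-in for a 1024-dim embedding
full_b = full_a + 0.1 * rng.normal(size=1024)   # a slightly perturbed "similar" text

# Similarity at full size vs. after truncating to the first 64 dimensions:
sim_full = cosine(truncate_embedding(full_a, 1024), truncate_embedding(full_b, 1024))
sim_64 = cosine(truncate_embedding(full_a, 64), truncate_embedding(full_b, 64))
print(sim_full, sim_64)
```

Because the model is trained so that the leading dimensions carry most of the semantic signal, the truncated similarity stays close to the full-size one while the stored vectors shrink by a factor of 16.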
## Usage: