Update README.md

README.md CHANGED
@@ -31,7 +31,7 @@ The successors of [German_Semantic_STS_V2](https://huggingface.co/aari1995/Germa
 **Note:** To run this model properly, see "Usage".
 
-
 
 - **Flexibility:** Trained with flexible sequence lengths and embedding truncation, the model makes flexibility a core feature. Smaller dimensions bring only a minor trade-off in quality.
 - **Sequence length:** Embed up to 8192 tokens (16 times more than V2 and other models)
@@ -42,7 +42,7 @@ The successors of [German_Semantic_STS_V2](https://huggingface.co/aari1995/Germa
 - **License:** Apache 2.0
 
 
-
 
 This model has some built-in functionality that is rather hidden. To profit from it, use the code below:
 
@@ -74,7 +74,7 @@ similarities = model.similarity(embeddings, embeddings)
 
 ```
 
-
 
 ```
 SentenceTransformer(
@@ -84,7 +84,7 @@ SentenceTransformer(
 ```
 
 
-
 
 **Q: Is this Model better than V2?**
 
@@ -111,17 +111,17 @@ Another noticeable difference is that V3 has a broader cosine_similarity spectrum
 **A:** Broadly speaking, when going from 1024 to 512 dimensions, there is very little trade-off (about 1 percent). When going down to 64 dimensions, you may face a decrease of up to 3 percent.
 
 
-
 
 Storage comparison:
 
 
 Benchmarks: soon.
 
-
-German_Semantic_V3_Instruct: Guiding your embeddings towards self-selected aspects
 
-
 
 - To [jinaAI](https://huggingface.co/jinaai) for their BERT implementation that is used, especially ALiBi
 - To [deepset](https://huggingface.co/deepset) for the gbert-large, which is a really great model
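The dimension trade-off quoted in the FAQ answer above comes from embedding truncation. A minimal sketch of the mechanics, assuming Matryoshka-style training makes a truncated-and-renormalized prefix a valid embedding (the NumPy random vectors stand in for real model output):

```python
# Sketch: truncate full-size embeddings to a smaller dimension and
# re-normalize, as Matryoshka-style training allows. Random vectors
# stand in for real 1024-dim model output.
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

rng = np.random.default_rng(0)
full = rng.normal(size=(4, 1024))          # stand-in for full-size embeddings
small = truncate_and_normalize(full, 64)   # 64-dim variant (up to ~3% quality loss)
```

Because each truncated vector is unit length, plain dot products between them are cosine similarities.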
 **Note:** To run this model properly, see "Usage".
 
+# Major updates and USPs:
 
 - **Flexibility:** Trained with flexible sequence lengths and embedding truncation, the model makes flexibility a core feature. Smaller dimensions bring only a minor trade-off in quality.
 - **Sequence length:** Embed up to 8192 tokens (16 times more than V2 and other models)
 - **License:** Apache 2.0
 
 
+# Usage:
 
 This model has some built-in functionality that is rather hidden. To profit from it, use the code below:
 
 
 ```
 
+## Full Model Architecture
 
 ```
 SentenceTransformer(
 ```
 
 
+# FAQ
 
 **Q: Is this Model better than V2?**
 
 **A:** Broadly speaking, when going from 1024 to 512 dimensions, there is very little trade-off (about 1 percent). When going down to 64 dimensions, you may face a decrease of up to 3 percent.
 
 
+# Evaluation
 
 Storage comparison:
 
 
 Benchmarks: soon.
 
+# Up next:
+German_Semantic_V3_Instruct: Guiding your embeddings towards self-selected aspects (planned: 2024).
 
+# Thank You and Credits
 
 - To [jinaAI](https://huggingface.co/jinaai) for their BERT implementation that is used, especially ALiBi
 - To [deepset](https://huggingface.co/deepset) for the gbert-large, which is a really great model
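The storage comparison above is only an image link; the underlying arithmetic is straightforward. A sketch at float32 precision — the corpus size is an illustrative assumption, not a figure from the model card:

```python
# Storage needed for a corpus of embeddings at float32 (4 bytes per dimension).
BYTES_PER_FLOAT32 = 4

def storage_bytes(num_embeddings: int, dim: int) -> int:
    """Raw size of `num_embeddings` dense vectors of `dim` float32 components."""
    return num_embeddings * dim * BYTES_PER_FLOAT32

corpus = 1_000_000  # illustrative corpus size
for dim in (1024, 512, 256, 64):
    gb = storage_bytes(corpus, dim) / 1e9
    print(f"{dim:>4} dims: {gb:.3f} GB")
```

Halving the dimension halves the index size, which is why the ~1 percent quality trade-off at 512 dimensions can be attractive.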