Upload README.md
README.md
CHANGED
@@ -1062,6 +1062,12 @@ model-index:

## acge model

acge is a general-purpose text encoding model that produces variable-dimension embeddings. It uses [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147), as illustrated in the figure:

![matryoshka-small](./img/matryoshka-small.gif)
@@ -1179,7 +1185,7 @@ print(similarity)
Usage with the sentence-transformers library, selecting a different embedding dimension:

```python
-import
from sentence_transformers import SentenceTransformer

sentences = ["数据1", "数据2"]
@@ -1187,8 +1193,11 @@ model = SentenceTransformer('acge_text_embedding')
embeddings = model.encode(sentences, normalize_embeddings=False)
matryoshka_dim = 1024
embeddings = embeddings[..., :matryoshka_dim]  # Shrink the embedding dimensions
-embeddings =
print(embeddings.shape)
# => (2, 1024)
```
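
## acge model

+![logo](./img/logo.png)
+
+The acge model comes from the technical team at [INTSIG (合合信息)](https://www.intsig.com/), whose public technology trial platform is [TextIn](https://www.textin.com/). INTSIG is an industry-leading artificial intelligence and big data technology company. Through its core technologies in intelligent text recognition and commercial big data, its consumer (C-end) and business (B-end) products, and its industry solutions, it provides innovative digital and intelligent services to enterprises and individual users worldwide.
+
+For technical discussion, contact [yanhui](yanhui_he@intsig.net); for business cooperation, contact [simon](simon_liu@intsig.net). You can also [click the image](https://huggingface.co/aspire/acge_text_embedding/img/wx.jpg) and scan the QR code to join our WeChat community.
+
acge is a general-purpose text encoding model that produces variable-dimension embeddings. It uses [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147), as illustrated in the figure:

![matryoshka-small](./img/matryoshka-small.gif)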

Usage with the sentence-transformers library, selecting a different embedding dimension:

```python
from sklearn.preprocessing import normalize  # added in this commit
from sentence_transformers import SentenceTransformer

sentences = ["数据1", "数据2"]  # placeholder inputs ("data 1", "data 2")
model = SentenceTransformer('acge_text_embedding')
embeddings = model.encode(sentences, normalize_embeddings=False)
matryoshka_dim = 1024
embeddings = embeddings[..., :matryoshka_dim]  # Shrink the embedding dimensions
embeddings = normalize(embeddings, norm="l2", axis=1)  # added in this commit: re-normalize after truncation
print(embeddings.shape)
# => (2, 1024)
```
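
The `normalize` call matters because slicing a vector shortens it: a prefix of a unit-length vector has norm below 1 whenever the dropped coordinates are nonzero, so dot products between sliced vectors no longer equal cosine similarities. Below is a minimal NumPy-only sketch of the same truncate-then-renormalize step. It uses random stand-in vectors rather than real model output, and the name `truncate_and_normalize` and the full size of 2048 are illustrative assumptions.

```python
import numpy as np


def truncate_and_normalize(embeddings, dim):
    """Keep the first `dim` columns, then rescale each row to unit L2 norm."""
    truncated = np.asarray(embeddings, dtype=np.float64)[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    # Clip the norms so that an all-zero row does not cause a division by zero.
    return truncated / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)
full = rng.normal(size=(2, 2048))                     # stand-in for model.encode(...)
full /= np.linalg.norm(full, axis=1, keepdims=True)   # unit norm at the full size

sliced_only = full[..., :1024]
shrunk = truncate_and_normalize(full, 1024)

print(np.linalg.norm(sliced_only, axis=1))  # each value is below 1: slicing alone loses norm
print(np.linalg.norm(shrunk, axis=1))       # each value is approximately 1
cosine = float(shrunk[0] @ shrunk[1])       # dot product of unit vectors is their cosine similarity
print(cosine)
```

After renormalizing, a plain dot product can be used as the similarity score at the reduced dimension, which is what most vector indexes configured for inner-product search expect.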
+
+
+