Update README.md
README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
---
|
| 2 |
model-index:
|
| 3 |
-
- name:
|
| 4 |
results:
|
| 5 |
- dataset:
|
| 6 |
config: default
|
|
@@ -1259,3 +1259,41 @@ model-index:
|
|
| 1259 |
tags:
|
| 1260 |
- mteb
|
| 1261 |
---
|
| 1 |
---
|
| 2 |
model-index:
|
| 3 |
+
- name: PLACEHOLDER
|
| 4 |
results:
|
| 5 |
- dataset:
|
| 6 |
config: default
|
|
|
|
| 1259 |
tags:
|
| 1260 |
- mteb
|
| 1261 |
---
|
| 1262 |
+
## Yuan-embedding-1.0
|
| 1263 |
+
|
| 1264 |
+
Yuan-embedding-1.0 is an embedding model designed specifically for Chinese text retrieval tasks. It is based on xiaobu-embedding-v2 [1], with the following main changes:
|
| 1265 |
+
|
| 1266 |
+
- In hard negative sampling, a rerank model (bge-reranker-large [2]) is used to rank and filter the data
|
| 1267 |
+
|
| 1268 |
+
- New queries are generated iteratively with an LLM (llama3.1 [3])
|
| 1269 |
+
|
| 1270 |
+
- Training is based on piccolo-embedding [4]
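
The hard negative sampling step listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' pipeline: `select_hard_negatives`, `score_fn`, `max_score`, and `top_k` are hypothetical names, and `score_fn` stands in for a call to a reranker such as bge-reranker-large [2]. Discarding very high-scoring candidates as likely false negatives is also an assumption; the README only states that the reranker is used to rank and filter the data.

```python
from typing import Callable, List, Sequence, Tuple


def select_hard_negatives(
    query: str,
    candidates: Sequence[str],
    positives: Sequence[str],
    score_fn: Callable[[str, str], float],  # hypothetical reranker wrapper
    max_score: float = 0.8,  # assumed cutoff, not taken from the README
    top_k: int = 2,  # assumed number of negatives to keep
) -> List[Tuple[str, float]]:
    """Rank candidates by reranker score and keep the hardest plausible negatives."""
    positive_set = set(positives)
    # Score every candidate that is not a known positive for this query.
    scored = [
        (doc, score_fn(query, doc))
        for doc in candidates
        if doc not in positive_set
    ]
    # Drop candidates the reranker rates as too relevant (possible false negatives).
    filtered = [(doc, score) for doc, score in scored if score < max_score]
    # The highest remaining scores are the hardest negatives.
    filtered.sort(key=lambda pair: pair[1], reverse=True)
    return filtered[:top_k]
```

In practice `score_fn` would wrap the reranker's relevance score for a (query, passage) pair. Any callable with that shape works, which keeps the selection logic testable without loading a model.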
|
| 1271 |
+
|
| 1272 |
+
|
| 1273 |
+
## Usage
|
| 1274 |
+
|
| 1275 |
+
```bash
|
| 1276 |
+
pip install -U sentence-transformers
|
| 1277 |
+
```
|
| 1278 |
+
|
| 1279 |
+
Usage example:
|
| 1280 |
+
|
| 1281 |
+
```python
|
| 1282 |
+
from sentence_transformers import SentenceTransformer
|
| 1283 |
+
|
| 1284 |
+
model = SentenceTransformer("IEIYuan/Yuan-embedding-1.0")  # fetched from the Hugging Face Hub on first use
|
| 1285 |
+
sentences = [
|
| 1286 |
+
"这是一个样例-1",  # "This is an example-1"
|
| 1287 |
+
"这是一个样例-2",  # "This is an example-2"
|
| 1288 |
+
]
|
| 1289 |
+
embeddings = model.encode(sentences)  # one embedding vector per sentence
|
| 1290 |
+
similarities = model.similarity(embeddings, embeddings)  # pairwise similarity matrix
|
| 1291 |
+
print(similarities)
|
| 1292 |
+
```
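
For reference, here is a small sketch of what a pairwise cosine similarity matrix contains, assuming the model uses cosine similarity (the sentence-transformers default for `model.similarity` unless the model configuration says otherwise). The helper name `cosine_similarity_matrix` is hypothetical, and the code uses plain Python lists rather than the tensors the library returns.

```python
import math
from typing import List, Sequence


def cosine_similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the matrix of cosine similarities between every pair of vectors."""
    # Precompute the Euclidean norm of each vector.
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            # Dot product divided by the product of norms; 0.0 for zero vectors.
            sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0
            for v, nv in zip(vectors, norms)
        ]
        for u, nu in zip(vectors, norms)
    ]
```

Entry `[i][j]` compares sentence `i` with sentence `j`, so the diagonal is 1 for any non-zero vector and the matrix is symmetric.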
|
| 1293 |
+
|
| 1294 |
+
## Reference
|
| 1295 |
+
|
| 1296 |
+
1. https://huggingface.co/lier007/xiaobu-embedding-v2
|
| 1297 |
+
2. https://huggingface.co/BAAI/bge-reranker-large
|
| 1298 |
+
3. https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
|
| 1299 |
+
4. https://github.com/hjq133/piccolo-embedding
|