xiaobu-embedding

Usage (Transformers)

# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("feature-extraction", model="lier007/xiaobu-embedding")

# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("lier007/xiaobu-embedding")
model = AutoModel.from_pretrained("lier007/xiaobu-embedding")
Model: multi-task fine-tuned from the GTE model [1].
Data: chit-chat Query-Query pairs, knowledge-style Query-Doc pairs, and the open-source BGE Query-Doc data [2]; positives were cleaned and medium-difficulty negatives were mined, for roughly 6M pairs in total (quality matters more than quantity).
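The card does not describe how the medium-difficulty negatives were mined. One common approach, sketched here purely as an illustration, is to keep candidate documents whose similarity to the query falls in a middle band: too similar risks false negatives, too dissimilar is trivially easy. The band `(lo, hi)` and the count `k` are made-up parameters, not taken from the card.

```python
import numpy as np

def mine_medium_negatives(query_emb, doc_embs, positive_idx, lo=0.3, hi=0.7, k=4):
    """Pick docs whose cosine similarity to the query falls in a middle band.

    Hypothetical illustration of "medium-difficulty negative" mining; the
    thresholds lo/hi and the cutoff k are illustrative, not from the card.
    """
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of every candidate doc to the query
    # Hardest-first order, skipping the known positive and anything
    # outside the medium-difficulty band.
    candidates = [int(i) for i in np.argsort(-sims)
                  if i != positive_idx and lo <= sims[i] <= hi]
    return candidates[:k]
```

In practice the embeddings used for mining would come from an earlier checkpoint of the model itself or from a baseline retriever.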
Usage (Sentence-Transformers)
pip install -U sentence-transformers
Similarity computation:
from sentence_transformers import SentenceTransformer
sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
model = SentenceTransformer('lier007/xiaobu-embedding')
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
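When the model is loaded with plain Transformers (AutoModel) instead of sentence-transformers, the per-token hidden states still have to be pooled into one sentence vector. A minimal NumPy sketch of masked mean pooling is below; note the pooling strategy is an assumption here, since the card does not state which pooling the model uses.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden) array, e.g. the model's last_hidden_state.
    attention_mask:   (batch, seq_len) array of 0/1 padding indicators.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid divide-by-zero
    embeddings = summed / counts
    # L2-normalize so a plain dot product equals cosine similarity,
    # mirroring normalize_embeddings=True in the snippet above.
    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
```

With the AutoModel snippet, this would be applied to `last_hidden_state` together with the tokenizer's `attention_mask` (after converting the tensors to arrays).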
Evaluation
See the Chinese CMTEB evaluation in BGE [2].
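The STS-style scores reported below (e.g. cos_sim_spearman) are Spearman correlations between the model's pairwise cosine similarities and the gold similarity labels. A self-contained sketch of that computation on toy data (ties are not rank-averaged here, unlike a full Spearman implementation):

```python
import numpy as np

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    # Spearman correlation = Pearson correlation of rank-transformed values.
    # Toy version: ties are not averaged, which real implementations do.
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / (np.linalg.norm(rx) * np.linalg.norm(ry)))

def cos_sim_spearman(emb_a: np.ndarray, emb_b: np.ndarray, gold: np.ndarray) -> float:
    # Cosine similarity of each sentence pair, compared to gold labels by rank.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return spearman((a * b).sum(axis=1), gold)
```

The euclidean_* and manhattan_* variants in the table swap the cosine similarity for (negated) L2 and L1 distances, respectively.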
Finetune
See the BGE fine-tuning module [2].
Reference
Evaluation results
All scores are self-reported.

| Dataset (MTEB) | Split | Metric | Score |
|---|---|---|---|
| AFQMC | validation | cos_sim_pearson | 49.379 |
| AFQMC | validation | cos_sim_spearman | 54.847 |
| AFQMC | validation | euclidean_pearson | 53.050 |
| AFQMC | validation | euclidean_spearman | 54.848 |
| AFQMC | validation | manhattan_pearson | 53.063 |
| AFQMC | validation | manhattan_spearman | 54.874 |
| ATEC | test | cos_sim_pearson | 48.160 |
| ATEC | test | cos_sim_spearman | 55.132 |
| ATEC | test | euclidean_pearson | 55.436 |
| ATEC | test | euclidean_spearman | 55.132 |
| ATEC | test | manhattan_pearson | 55.414 |
| ATEC | test | manhattan_spearman | 55.134 |
| AmazonReviewsClassification (zh) | test | accuracy | 46.722 |
| AmazonReviewsClassification (zh) | test | f1 | 45.039 |
| BQ | test | cos_sim_pearson | 63.518 |
| BQ | test | cos_sim_spearman | 65.570 |