Instructions to use lier007/xiaobu-embedding-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - sentence-transformers

How to use lier007/xiaobu-embedding-v2 with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lier007/xiaobu-embedding-v2")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```

- Notebooks
  - Google Colab
  - Kaggle
How do I get the token count from encode?
#2
by jamesljl - opened
I'd like output similar to the following:

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        0.0023064255,
        -0.009327292,
        ......
        -0.0028842222
      ],
      "index": 0
    }
  ],
  "model": "xiaobu-embedding-v2",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

i.e., the values of the prompt_tokens and total_tokens fields.
jamesljl changed discussion title from "How do I get the token count from embedding?" to "How do I get the token count from encode?"
SentenceTransformer wraps the tokenization step inside encode, so if you need the token count, there are two options:
1. Tokenize again yourself (simple, but the text gets tokenized twice).
2. Subclass SentenceTransformer and override encode to expose the intermediate results you need.