The model supports a maximum text length of 512 tokens. How many English letters or Chinese characters does one token correspond to?
#9
by kyonyan - opened
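As the title says. Thanks.
Hello, one token can correspond to several letters or Chinese characters; there is no fixed ratio.
You can compute the length after tokenization as follows: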
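from transformers import AutoTokenizer
# Load the tokenizer that ships with the model
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-zh')
# Count the token ids; by default the count includes the special [CLS] and [SEP] tokens
length = len(tokenizer("hello world")['input_ids'])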
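Since there is no fixed ratio, it may be easier to measure it on your own text. The following is a minimal sketch, not part of the original reply: the helper names `token_length`, `chars_per_token`, `fits` and `demo` are hypothetical, and the helpers assume a tokenizer that is callable in the Hugging Face style, accepts `add_special_tokens`, and returns a mapping with an `input_ids` list.

```python
def token_length(tokenizer, text, add_special_tokens=True):
    """Number of token ids produced for `text`.

    With add_special_tokens=True the count includes the special tokens
    the tokenizer adds around the text (e.g. [CLS] and [SEP]).
    """
    encoded = tokenizer(text, add_special_tokens=add_special_tokens)
    return len(encoded["input_ids"])


def chars_per_token(tokenizer, text):
    """Average number of characters per token for this text, excluding special tokens."""
    n_tokens = token_length(tokenizer, text, add_special_tokens=False)
    return len(text) / n_tokens if n_tokens else float("nan")


def fits(tokenizer, text, max_length=512):
    """True if `text`, including special tokens, is within `max_length` tokens."""
    return token_length(tokenizer, text) <= max_length


def demo():
    """Hypothetical usage; requires `transformers` and downloads the tokenizer files."""
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-zh")
    for sample in ["hello world", "你好,世界"]:
        print(sample, token_length(tokenizer, sample), chars_per_token(tokenizer, sample))
```

Calling `demo()` prints the token count and the characters-per-token ratio for one English and one Chinese sample. The ratio depends on the text, so it is worth checking on data representative of your use case. If an input might exceed the limit, passing `truncation=True, max_length=512` to the tokenizer call cuts it to 512 tokens.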