Tags: Sentence Similarity · sentence-transformers · Safetensors · Transformers · Chinese · English · qwen2 · feature-extraction · text-embeddings-inference
Instructions for using BAAI/bge-code-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use BAAI/bge-code-v1 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-code-v1")

sentences = [
    "那是個快樂的人",      # "That is a happy person"
    "那是條快樂的狗",      # "That is a happy dog"
    "那是個非常幸福的人",  # "That is a very happy person"
    "今天是晴天",          # "It is sunny today"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```

- Transformers

How to use BAAI/bge-code-v1 with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-code-v1")
model = AutoModel.from_pretrained("BAAI/bge-code-v1")
```

- Notebooks
- Google Colab
- Kaggle
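The `similarity` call in the sentence-transformers snippet returns pairwise cosine similarities between the embeddings. A minimal numpy sketch of that computation, using small random vectors in place of real bge-code-v1 outputs (illustrative only, not the library's implementation):

```python
import numpy as np

# Stand-in for model.encode(): 4 random 8-dim vectors instead of
# real bge-code-v1 embeddings (illustrative only).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))

# Cosine similarity = dot product of L2-normalized rows.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarities = normed @ normed.T

print(similarities.shape)  # (4, 4)
# Diagonal entries are 1.0: each vector is maximally similar to itself.
```

Note that the raw `AutoModel` from the Transformers snippet returns per-token hidden states, so you would first pool them into one vector per sentence before computing similarities like this.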
Something wrong with tokenizer??
#4
by ctranslate2-4you - opened
Hello,
When I try to run this model through LangChain's HuggingFaceEmbeddings (which wraps sentence-transformers), searches return no results and I get this error:
There was a bug in Trie algorithm in tokenization. Attempting to recover. Please report it anyway.
This particular error originates from tokenization_utils.py within the transformers library and I've tried setting trust_remote_code among other troubleshooting steps.
Could @tomaarsen or the repository owners take a look? You're the experts here.
ctranslate2-4you changed discussion title from Something wrong with tokenizer to Something wrong with tokenizer??