Tags: Sentence Similarity · sentence-transformers · Safetensors · Transformers · Chinese · English · qwen2 · feature-extraction · text-embeddings-inference
Instructions for using BAAI/bge-code-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use BAAI/bge-code-v1 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-code-v1")

sentences = [
    "那是個快樂的人",      # "That is a happy person"
    "那是條快樂的狗",      # "That is a happy dog"
    "那是個非常幸福的人",  # "That is a very happy person"
    "今天是晴天",          # "It is sunny today"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```

- Transformers

How to use BAAI/bge-code-v1 with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-code-v1")
model = AutoModel.from_pretrained("BAAI/bge-code-v1")
```

- Notebooks
- Google Colab
- Kaggle
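The `similarity` call in the sentence-transformers snippet returns pairwise cosine similarities between the embeddings. A minimal numpy sketch of that computation, using small random vectors in place of real bge-code-v1 outputs (illustrative only, not the library's implementation):

```python
import numpy as np

# Stand-in for model.encode(): 4 random 8-dim vectors instead of
# real bge-code-v1 embeddings (illustrative only).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))

# Cosine similarity = dot product of L2-normalized rows.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarities = normed @ normed.T

print(similarities.shape)  # (4, 4)
# Diagonal entries are 1.0: each vector is maximally similar to itself.
```

Note that the raw `AutoModel` from the Transformers snippet returns per-token hidden states, so you would first pool them into one vector per sentence before computing similarities like this.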
Something wrong with tokenizer??
#4
by ctranslate2-4you - opened
Hello,
When I try to run this model through LangChain's HuggingFaceEmbeddings (which wraps sentence-transformers), searches return no results and I get this error:
There was a bug in Trie algorithm in tokenization. Attempting to recover. Please report it anyway.
This particular error originates from tokenization_utils.py within the transformers library and I've tried setting trust_remote_code among other troubleshooting steps.
Could @tomaarsen or the repository owners take a look? You're the experts here.
ctranslate2-4you changed discussion title from Something wrong with tokenizer to Something wrong with tokenizer??