RAG needs the embedding model and chunker to use the same tokenizer. Which one is it?

#3
by dawgctor-air - opened

The Voyage AI documentation lists a specific tokenizer for every other embedding model, but not for Voyage Context 3.

Which tokenizer does this model use? I can see that it is included in the model files, but I don't know how to use it in my docling workflow!

I think it's in tokenizer_config.json. The tokenizer is based on Qwen2Tokenizer, so a tokenizer built on the same base model should work, because the tokens will match.
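To check this yourself, here is a minimal sketch that reads the `tokenizer_class` field from a tokenizer_config.json file. It assumes you have downloaded that file from the model repository; the function names and the local path are hypothetical, not part of any library.

```python
import json


def tokenizer_class_from_text(config_text):
    """Return the `tokenizer_class` value from the JSON text of a
    tokenizer_config.json file, or None if the field is missing."""
    config = json.loads(config_text)
    return config.get("tokenizer_class")


def tokenizer_class_from_file(config_path):
    """Same as above, but reads the JSON from a local file.

    `config_path` is a hypothetical local path to a downloaded
    tokenizer_config.json; adjust it to wherever you saved the file.
    """
    with open(config_path, encoding="utf-8") as fh:
        return tokenizer_class_from_text(fh.read())
```

If the field reads `Qwen2Tokenizer`, one option is to load the tokenizer straight from the model repository with `transformers.AutoTokenizer.from_pretrained(...)` and hand it to the docling chunker. How the chunker accepts a tokenizer depends on your docling version, so treat that step as an assumption to verify against its documentation.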
