RAG needs the embedding model and the chunker to use the same tokenizer. Which one is it?
#3
by dawgctor-air - opened
According to the Voyage AI documentation, every other embedding model has a specific tokenizer listed, but Voyage Context 3 does not.
Which tokenizer does this model use? I can see tokenizer files in the repository, but I don't know how to include them in my Docling workflow.
I think it's in tokenizer_config.json: the tokenizer class listed there is Qwen2Tokenizer, so you can use any tokenizer based on it, because the tokens will match.
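A minimal sketch of both steps, assuming the model's repository ships the tokenizer files mentioned above. It first reads `tokenizer_class` from `tokenizer_config.json`, then passes the tokenizer to Docling's `HybridChunker` through `HuggingFaceTokenizer`, so chunk sizes are counted with the same tokens the embedding model sees. The helpers `tokenizer_class_from_json`, `read_tokenizer_class`, and `build_chunker` are hypothetical, and the Docling import paths assume a recent `docling`/`docling-core` release.

```python
import json
from pathlib import Path


def tokenizer_class_from_json(config_text: str) -> str:
    """Return the "tokenizer_class" entry from tokenizer_config.json text."""
    config = json.loads(config_text)
    # Hugging Face tokenizer configs store the class name under this key;
    # a KeyError here means the config does not declare one.
    return config["tokenizer_class"]


def read_tokenizer_class(repo_dir: str) -> str:
    """Read tokenizer_config.json from a locally downloaded model repository.

    Assumption: the repository has already been downloaded to repo_dir.
    """
    config_path = Path(repo_dir) / "tokenizer_config.json"
    return tokenizer_class_from_json(config_path.read_text(encoding="utf-8"))


def build_chunker(repo_id_or_dir: str, max_tokens: int):
    """Hypothetical helper: a Docling HybridChunker that counts tokens with
    the repository's own tokenizer.

    Assumes docling, docling-core (with HuggingFaceTokenizer) and
    transformers are installed. The imports are deferred so the config
    helpers above work without these packages.
    """
    from docling.chunking import HybridChunker
    from docling_core.transforms.chunker.tokenizer.huggingface import (
        HuggingFaceTokenizer,
    )
    from transformers import AutoTokenizer

    tokenizer = HuggingFaceTokenizer(
        # Loads whatever tokenizer class tokenizer_config.json declares.
        tokenizer=AutoTokenizer.from_pretrained(repo_id_or_dir),
        # Upper bound on tokens per chunk, counted with that tokenizer.
        max_tokens=max_tokens,
    )
    return HybridChunker(tokenizer=tokenizer)
```

For example, `chunker = build_chunker("voyageai/voyage-context-3", max_tokens=512)` followed by `chunker.chunk(dl_doc=doc)`, where `doc` is the document returned by `DocumentConverter().convert(...)`. The repository ID and the 512 limit are assumptions for illustration; set `max_tokens` from the model's documented context limit.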