How was the `tokenizer.json` created?
Hi Xenova team,
I'm trying to understand how you generated the tokenizer.json file used in your models. Was it exported directly from a SentencePiece model, converted via Hugging Face's transformers tooling, or created through a custom process?
Specifically, I'm interested in reproducing the same structure for a custom SentencePiece model so it works with your ONNX/transformers.js pipelines.
Could you please share how you built or converted it, and which tools or scripts you used?
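For reference, here is my current guess at the target layout, a minimal stdlib-only sketch assuming the Hugging Face tokenizers JSON schema for a Unigram (SentencePiece-style) model. All vocab entries, scores, and field values below are illustrative placeholders, not taken from your actual files:

```python
import json

# Hypothetical minimal tokenizer.json skeleton for a SentencePiece-style
# Unigram model, following the general layout of the Hugging Face
# `tokenizers` JSON format. Vocab pieces and scores are made-up examples.
tokenizer_json = {
    "version": "1.0",
    "truncation": None,
    "padding": None,
    "added_tokens": [
        {"id": 0, "content": "<unk>", "special": True,
         "single_word": False, "lstrip": False, "rstrip": False,
         "normalized": False},
    ],
    # SentencePiece conversions typically carry a normalizer and a
    # Metaspace pre-tokenizer/decoder (U+2581 marks word boundaries).
    "normalizer": None,
    "pre_tokenizer": {"type": "Metaspace", "replacement": "\u2581",
                      "add_prefix_space": True},
    "post_processor": None,
    "decoder": {"type": "Metaspace", "replacement": "\u2581",
                "add_prefix_space": True},
    "model": {
        "type": "Unigram",
        "unk_id": 0,
        # [piece, log-probability] pairs, as SentencePiece stores them
        "vocab": [["<unk>", 0.0],
                  ["\u2581hello", -8.0],
                  ["\u2581world", -9.0]],
    },
}

with open("tokenizer.json", "w", encoding="utf-8") as f:
    json.dump(tokenizer_json, f, ensure_ascii=False, indent=2)
```

Is this roughly the structure your conversion produces, or does your process emit additional components (e.g. a precompiled normalizer charsmap)?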
Thanks in advance!
Hi Xenova team,
I hope you’re doing well. I sent the email below on October 22 regarding how the tokenizer.json file was generated for your models, but I haven’t heard back yet.
I'm still very interested in whether it was exported directly from a SentencePiece model, converted via Hugging Face tools, or created through a custom process, as well as any guidance on reproducing the same structure for a custom SentencePiece model so it works with your ONNX/transformers.js pipelines.
I’d greatly appreciate any insight or pointers whenever you have a chance.
Thank you very much!