How was the `tokenizer.json` created?

#3
by VaishalBusiness - opened

Hi Xenova team,

I'm trying to understand how you generated the `tokenizer.json` file used in your models. Was it directly exported from a SentencePiece model, converted via Hugging Face's transformers tools, or created through a custom process?

Specifically, I'm interested in reproducing the same structure for a custom SentencePiece model so it works with your ONNX/transformers.js pipelines.
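For context on what I'm trying to reproduce: my understanding (an assumption on my part, not a description of your actual pipeline) is that a `tokenizer.json` for a SentencePiece Unigram model typically has a fixed top-level layout once serialized by the fast-tokenizer machinery. The sketch below writes out that layout with placeholder values so it's clear which structure I mean; all field values (the tiny vocab, the charsmap, etc.) are illustrative, not taken from any real model:

```python
import json

# Hypothetical sketch of the top-level layout of a tokenizer.json file
# for a SentencePiece Unigram model. All values below are illustrative
# placeholders, not copied from any actual Xenova model.
tokenizer_json = {
    "version": "1.0",
    "truncation": None,
    "padding": None,
    "added_tokens": [
        {"id": 0, "content": "<unk>", "single_word": False,
         "lstrip": False, "rstrip": False,
         "normalized": False, "special": True},
    ],
    # SentencePiece normalization is usually carried over as a
    # "Precompiled" normalizer with the model's character map.
    "normalizer": {"type": "Precompiled", "precompiled_charsmap": ""},
    # "Metaspace" replaces spaces with U+2581, as SentencePiece does.
    "pre_tokenizer": {"type": "Metaspace", "replacement": "\u2581"},
    "post_processor": None,
    "decoder": {"type": "Metaspace", "replacement": "\u2581"},
    "model": {
        "type": "Unigram",
        "unk_id": 0,
        # vocab is a list of [piece, log-probability] pairs, mirroring
        # the pieces stored in the SentencePiece .model file.
        "vocab": [["<unk>", 0.0], ["\u2581hello", -8.1], ["\u2581world", -9.3]],
    },
}

with open("tokenizer.json", "w", encoding="utf-8") as f:
    json.dump(tokenizer_json, f, ensure_ascii=False, indent=2)
```

Is this roughly the shape your files follow, and if so, did you generate them by hand-converting the SentencePiece pieces like this, or by loading the slow tokenizer in transformers and saving the fast version?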

Could you please share how you built or converted it — and which tools or scripts were used?

Thanks in advance!

VaishalBusiness changed discussion status to closed
VaishalBusiness changed discussion status to open

Hi Xenova team,

I hope you're doing well. I posted the message above on October 22 regarding how the `tokenizer.json` file was generated for your models, but I haven't heard back yet.

I’m still very interested in understanding whether it was exported directly from a SentencePiece model, converted via Hugging Face tools, or created through a custom process — and any guidance for reproducing the same structure for a custom SentencePiece model to work with your ONNX/transformers.js pipelines.

I’d greatly appreciate any insight or pointers whenever you have a chance.

Thank you very much!
