How to use monsoon-nlp/hindi-bert with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("feature-extraction", model="monsoon-nlp/hindi-bert")

# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/hindi-bert")
model = AutoModel.from_pretrained("monsoon-nlp/hindi-bert")
Trained tokenizer file
#5
by siba07 - opened
Hi, can the trained tokenizer model be published?
I'm not 100% sure what you mean. The tokenizer's vocab.txt is in the repo files, but I don't have the original BertWordPieceTokenizer object from training.
If you are redoing it, I would use a smaller vocabulary size. There have been a lot of changes since 2020, so I don't think we could rebuild it with the same random state, corpus, and code.
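For what it's worth, a vocab.txt on its own is usually enough to reproduce WordPiece segmentation, since the algorithm is a greedy longest-match lookup against the vocabulary. Below is a minimal standard-library sketch of that idea. It is not the code used to train or run this model. The toy vocabulary and the helper names (`load_vocab`, `wordpiece`, `tokenize`) are made up for illustration, and it assumes the usual BERT conventions: one token per line, `##` for word-internal pieces, `[UNK]` for unknown words. Real BERT tokenizers also normalize text and split punctuation first, and those settings are not stored in vocab.txt.

```python
def load_vocab(lines):
    """Map each token to its id: one token per line, id = line index."""
    return {tok.rstrip("\n"): i for i, tok in enumerate(lines)}


def wordpiece(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word by greedy longest-match-first lookup in vocab."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece  # word-internal pieces carry "##"
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Whitespace-split the text, then apply WordPiece to each word."""
    out = []
    for word in text.split():
        out.extend(wordpiece(word, vocab))
    return out


# Hypothetical toy vocabulary; with a real file you would pass open("vocab.txt", encoding="utf-8").
toy_vocab = load_vocab(["[PAD]", "[UNK]", "नम", "##स्ते", "दुनिया"])
print(tokenize("नमस्ते दुनिया", toy_vocab))
```

With a real vocab.txt the same lookup applies, but the output will only match the published tokenizer if the pre-tokenization and normalization steps are configured the same way.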