obaidtambo/hinglish_bert_tokenizer

Text Generation


obaidtambo committed on Feb 10, 2024

Commit 79cff9a (verified) · 1 parent: 9c7d08b

Updated Usage code

Files changed (1)

README.md +5 -0

README.md CHANGED

@@ -24,6 +24,11 @@ This repository contains a BERT tokenizer that has been trained on more than 200
 The tokenizer is capable of accurately tokenizing Hinglish text, splitting it into individual tokens that can be used as input to a BERT model. Here is an example of how the tokenizer works:
 ```python
+# Load model directly
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("obaidtambo/hinglish_bert_tokenizer")
 example = "aap se kuch keha tha kehte kehte reh gaye"
 tokens = tokenizer.tokenize(example)
 print(tokens)
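BERT tokenizers commonly use WordPiece, which splits each whitespace-separated word greedily into the longest subword units present in the vocabulary and marks word-internal pieces with a `##` prefix. The sketch below assumes this tokenizer follows that scheme (the README does not state it explicitly). The function `wordpiece_tokenize` and the vocabulary `toy_vocab` are hypothetical illustrations, not part of the repository, so the output will not match what `tokenizer.tokenize(example)` prints for the real model. It also skips the text normalization and punctuation splitting a real BERT tokenizer performs.

```python
# Minimal WordPiece-style sketch (greedy longest-match-first).
# ASSUMPTION: the real tokenizer uses WordPiece; the vocabulary below is a
# tiny hypothetical example, NOT the vocabulary of obaidtambo/hinglish_bert_tokenizer.

def wordpiece_tokenize(text, vocab, unk_token="[UNK]", prefix="##"):
    """Split whitespace-separated words into subword tokens found in `vocab`."""
    tokens = []
    for word in text.split():  # no lowercasing or punctuation splitting here
        start = 0
        pieces = []
        while start < len(word):
            end = len(word)
            match = None
            # Try the longest remaining substring first, shrinking until a vocab hit.
            while start < end:
                piece = word[start:end]
                if start > 0:
                    piece = prefix + piece  # continuation pieces carry the "##" prefix
                if piece in vocab:
                    match = piece
                    break
                end -= 1
            if match is None:
                # No subword matched: the whole word becomes the unknown token.
                pieces = [unk_token]
                break
            pieces.append(match)
            start = end
        tokens.extend(pieces)
    return tokens


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"aap", "se", "kuch", "keh", "##a", "##te", "tha", "reh", "gaye"}

example = "aap se kuch keha tha kehte kehte reh gaye"
print(wordpiece_tokenize(example, toy_vocab))
# ['aap', 'se', 'kuch', 'keh', '##a', 'tha', 'keh', '##te', 'keh', '##te', 'reh', 'gaye']
```

With this toy vocabulary, "keha" and "kehte" share the stem piece `keh`, which is how a subword vocabulary lets spelling variants common in romanized Hinglish reuse the same tokens.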