Updated Usage code
README.md
CHANGED
@@ -24,6 +24,11 @@ This repository contains a BERT tokenizer that has been trained on more than 200
 The tokenizer is capable of accurately tokenizing Hinglish text, splitting it into individual tokens that can be used as input to a BERT model. Here is an example of how the tokenizer works:
 
 ```python
+# Load model directly
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("obaidtambo/hinglish_bert_tokenizer")
+
 example = "aap se kuch keha tha kehte kehte reh gaye"
 tokens = tokenizer.tokenize(example)
 print(tokens)