fathan/indojave-codemixed-bert-base

Generated from Trainer


fathan committed on Feb 24, 2023

Commit c2fe415 · 1 Parent(s): e164338

Update README.md

Files changed (1)

README.md +26 -1


@@ -47,7 +47,8 @@ In the second stage pre-processing, we do the following pre-processing tasks:
 - convert ‘@username’ to ‘@USER’,
 - convert URL to HTTPURL.
-Finally, we have 28,121,693 sentences for the training process.
 ## Model
 | Model name           | Architecture    | Size of training data      | Size of validation data |
@@ -62,6 +63,30 @@ The following are the results obtained from the training:
 |------------|------------|------------|
 |   3.5057   |   3.0559   |  21.2398   |
 ### Training hyperparameters
 The following hyperparameters were used during training:

 - convert ‘@username’ to ‘@USER’,
 - convert URL to HTTPURL.
+Finally, we have 28,121,693 sentences for the training process.
+This pretraining data will not be released publicly due to Twitter policy.
 ## Model
 | Model name           | Architecture    | Size of training data      | Size of validation data |
 |------------|------------|------------|
 |   3.5057   |   3.0559   |  21.2398   |
+## How to use
+### Load model and tokenizer
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained("fathan/code_mixed_ijebert")
+model = AutoModel.from_pretrained("fathan/code_mixed_ijebert")
+```
+### Masked language model
+```python
+from transformers import pipeline
+pretrained_model = "fathan/code_mixed_ijebert"
+fill_mask = pipeline(
+    "fill-mask",
+    model=pretrained_model,
+    tokenizer=pretrained_model
+)
+```
 ### Training hyperparameters
 The following hyperparameters were used during training:
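
The two second-stage pre-processing steps visible in the context lines of this diff (converting `@username` to `@USER` and URLs to `HTTPURL`) can be approximated as below. This is a minimal sketch only: the README does not show the authors' actual code, so the function name `normalize_tweet` and both regular expressions are assumptions made for illustration.

```python
import re

# Assumed patterns: the README does not show the regexes the authors used.
# A mention is '@' followed by word characters, not preceded by a word
# character (so e-mail addresses such as a@b.com are left alone).
USER_PATTERN = re.compile(r"(?<!\w)@\w+")
# A URL is anything starting with http://, https:// or www. up to whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def normalize_tweet(text: str) -> str:
    """Replace URLs with HTTPURL and user mentions with @USER (hypothetical helper)."""
    # URLs are replaced first so that an '@' inside a URL is not rewritten.
    text = URL_PATTERN.sub("HTTPURL", text)
    text = USER_PATTERN.sub("@USER", text)
    return text


print(normalize_tweet("@budi cek https://t.co/abc ya"))  # @USER cek HTTPURL ya
```

Applying the same normalization to input text before passing it to the tokenizer would likely keep inference inputs closer to the pre-training distribution, although the README does not state this as a requirement.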