docs: add kaggle conversion code
README.md
# Documentation

## Prerequisite

Since we use subword-nmt BPE encoding, the input needs to be pre-tokenized using the [BEST](https://huggingface.co/datasets/best2009) standard before it is fed into HoogBERTa.
```
...
with torch.no_grad():
    features = model(token_ids) # where token_ids is a tensor with type "long".
```
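As a minimal runnable sketch of the calling convention above: the snippet below uses a toy embedding layer as a stand-in for the HoogBERTa encoder (the real model is loaded separately), and only illustrates that `token_ids` must be a long (int64) tensor passed under `torch.no_grad()`.

```python
import torch

# Stand-in for the HoogBERTa encoder: a toy embedding layer, used here only
# to illustrate the calling convention, not the real model.
model = torch.nn.Embedding(num_embeddings=100, embedding_dim=8)

# token_ids must be a "long" (int64) tensor, as noted above.
token_ids = torch.tensor([[5, 17, 42]], dtype=torch.long)

with torch.no_grad():
    features = model(token_ids)  # one feature vector per input token

print(features.shape)
```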
## Conversion Code

If you are interested in how the Fairseq and subword-nmt RoBERTa model was converted to the Hugging Face Hub format, here is the code used to do the conversion and to test for a parity match:

https://www.kaggle.com/norapatbuppodom/hoogberta-conversion
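For intuition, here is a minimal sketch of the kind of parity check such a conversion needs: compare the two models' feature tensors on the same pre-tokenized input. The `parity_match` helper, the tolerance, and the tensor shapes are illustrative assumptions, not taken from the notebook.

```python
import torch

def parity_match(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-5) -> bool:
    """Check that two feature tensors agree within a small tolerance."""
    max_diff = (a - b).abs().max().item()
    print(f"max abs diff: {max_diff:.2e}")
    return torch.allclose(a, b, atol=atol)

# Illustration with stand-in tensors; real usage would compare the Fairseq
# model's output against the converted Hugging Face model's output.
fairseq_out = torch.randn(1, 4, 768)
hf_out = fairseq_out + 1e-7  # tiny numerical difference, within tolerance

assert parity_match(fairseq_out, hf_out)
```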
# Citation

Please cite as: