docs: add kaggle conversion code
README.md
# Documentation

## Prerequisite

Since we use subword-nmt BPE encoding, the input needs to be pre-tokenized using the [BEST](https://huggingface.co/datasets/best2009) standard before it is fed into HoogBERTa.
```
...
with torch.no_grad():
    features = model(token_ids) # where token_ids is a tensor with type "long".
```
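As a minimal runnable sketch of the calling convention above: the snippet below uses a toy embedding layer as a stand-in for the HoogBERTa encoder (the real model is loaded separately), and only illustrates that `token_ids` must be a long (int64) tensor passed under `torch.no_grad()`.

```python
import torch

# Stand-in for the HoogBERTa encoder: a toy embedding layer, used here only
# to illustrate the calling convention, not the real model.
model = torch.nn.Embedding(num_embeddings=100, embedding_dim=8)

# token_ids must be a "long" (int64) tensor, as noted above.
token_ids = torch.tensor([[5, 17, 42]], dtype=torch.long)

with torch.no_grad():
    features = model(token_ids)  # one feature vector per input token

print(features.shape)
```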
## Conversion Code

If you are interested in how the Fairseq and subword-nmt RoBERTa model was converted to the Hugging Face Hub format, here is the code used to do the conversion and to test for a parity match:

https://www.kaggle.com/norapatbuppodom/hoogberta-conversion
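For intuition, here is a minimal sketch of the kind of parity check such a conversion needs: compare the two models' feature tensors on the same pre-tokenized input. The `parity_match` helper, the tolerance, and the tensor shapes are illustrative assumptions, not taken from the notebook.

```python
import torch

def parity_match(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-5) -> bool:
    """Check that two feature tensors agree within a small tolerance."""
    max_diff = (a - b).abs().max().item()
    print(f"max abs diff: {max_diff:.2e}")
    return torch.allclose(a, b, atol=atol)

# Illustration with stand-in tensors; real usage would compare the Fairseq
# model's output against the converted Hugging Face model's output.
fairseq_out = torch.randn(1, 4, 768)
hf_out = fairseq_out + 1e-7  # tiny numerical difference, within tolerance

assert parity_match(fairseq_out, hf_out)
```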
# Citation

Please cite as: