---
license: apache-2.0
---
# InformBERT

## Introduction

InformBERT is a language model pretrained with a variable masking strategy, in which informative tokens are masked more frequently than other tokens. It outperforms pretrained models that use random masking on the factual recall benchmark LAMA and on the extractive question answering benchmark SQuAD.
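
For intuition, the variable masking idea can be sketched in a few lines of Python. This is a minimal illustration, not the InformBERT training code. The informativeness scores are supplied by hand (the paper cited below computes them without supervision, which this sketch does not attempt). The 15% masking rate is borrowed from standard BERT pretraining as an assumption, and `informative_mask` is a hypothetical helper. Positions are drawn by weighted sampling without replacement, so higher-scoring tokens are masked more often, and every selected token is replaced with `[MASK]` (BERT's 80/10/10 replacement scheme is omitted).

```python
import random

MASK = "[MASK]"


def informative_mask(tokens, scores, mask_rate=0.15, seed=0):
    """Mask a fixed fraction of tokens, favoring high-scoring ones.

    Hypothetical helper: `scores` are assumed to be non-negative
    informativeness values, one per token. A token with score 0 is
    only picked if there are not enough positive-score tokens.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    rng = random.Random(seed)
    # Number of positions to mask (assumed 15% by default, at least one).
    budget = max(1, round(mask_rate * len(tokens)))
    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # each token gets the key u ** (1 / w) with u uniform in [0, 1),
    # and the `budget` largest keys are selected.
    keys = []
    for i, w in enumerate(scores):
        u = rng.random()
        key = u ** (1.0 / w) if w > 0 else 0.0
        keys.append((key, i))
    chosen = {i for _, i in sorted(keys, reverse=True)[:budget]}
    masked = [MASK if i in chosen else t for i, t in enumerate(tokens)]
    return masked, sorted(chosen)


if __name__ == "__main__":
    tokens = "the capital of france is paris".split()
    # Made-up scores: content words are treated as more informative.
    scores = [0.1, 1.0, 0.1, 3.0, 0.1, 3.0]
    masked, positions = informative_mask(tokens, scores, mask_rate=0.3)
    print(masked, positions)
```

With uniform scores the sampling reduces to ordinary random masking, which is the baseline the Introduction compares against.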
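
## How to use

```python
from transformers import AutoTokenizer, AutoModel

# Download the tokenizer and the pretrained encoder from the Hugging Face Hub.
# AutoModel returns the base encoder without a task-specific head.
tokenizer = AutoTokenizer.from_pretrained("nsadeq/InformBERT")
model = AutoModel.from_pretrained("nsadeq/InformBERT")
```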

## Citation

```bibtex
@misc{https://doi.org/10.48550/arxiv.2210.11771,
  doi = {10.48550/ARXIV.2210.11771},
  url = {https://arxiv.org/abs/2210.11771},
  author = {Sadeq, Nafis and Xu, Canwen and McAuley, Julian},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {InforMask: Unsupervised Informative Masking for Language Model Pretraining},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```