---
language:
- en
datasets:
- pubmed
- chemical patent
- recipe
---

## Proc-RoBERTa
Proc-RoBERTa is a pre-trained language model for procedural text. It was built by fine-tuning the RoBERTa-base model on a 1.05B-token procedural corpus of PubMed articles, chemical patents, and cooking recipes. More details can be found in the following [paper](https://arxiv.org/abs/2109.04711):

```
@article{Bai2021PretrainOA,
  title={Pre-train or Annotate? Domain Adaptation with a Constrained Budget},
  author={Fan Bai and Alan Ritter and Wei Xu},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.04711}
}
```
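
Since the model was pre-trained with RoBERTa's masked-language-modeling objective, a quick way to probe what it learned from procedural text is the `fill-mask` pipeline. The sketch below is illustrative: it assumes the uploaded checkpoint retains the masked-LM head from pre-training (otherwise `transformers` will warn that the head is newly initialized), and the example sentence is not from the original card.

```
from transformers import pipeline

# Assumes the checkpoint keeps its masked-LM head from pre-training;
# if it does not, transformers warns that the head is newly initialized.
fill_mask = pipeline("fill-mask", model="fbaigt/proc_roberta")

# RoBERTa-style tokenizers use "<mask>" as the mask token.
for prediction in fill_mask("Preheat the <mask> to 180 degrees."):
    print(prediction["token_str"], prediction["score"])
```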

## Usage
```
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the Proc-RoBERTa tokenizer and encoder with a token-classification head
# (the head is newly initialized until fine-tuned on labeled data).
tokenizer = AutoTokenizer.from_pretrained("fbaigt/proc_roberta")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/proc_roberta")
```
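
Continuing from the snippet above, the sketch below runs a forward pass on a short procedural sentence. Note that `AutoModelForTokenClassification` places a randomly initialized classification head on top of the pre-trained encoder, so the predicted label indices are placeholders until the model is fine-tuned on labeled data; the example sentence is an illustrative assumption, not part of the original card.

```
import torch

# Illustrative procedural sentence (an assumption, not from the original card).
sentence = "Centrifuge the lysate at 12,000 rpm for 10 minutes."

# Tokenize and run a forward pass without gradient tracking.
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, seq_len, num_labels]

# Take the highest-scoring label index per sub-token; these are meaningless
# until the classification head has been fine-tuned.
predicted_ids = logits.argmax(dim=-1).squeeze(0).tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze(0).tolist())
for token, label_id in zip(tokens, predicted_ids):
    print(token, label_id)
```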

More usage details can be found in the [ProcBERT repository](https://github.com/bflashcp3f/ProcBERT).