DAMO-NLP-SG/PMR-base

xww033 committed on May 22, 2023

Commit 13a5a6c

1 Parent(s): 3cf215e

Update README.md

Files changed (1)

README.md +10 -44

README.md CHANGED

@@ -2,8 +2,7 @@
 license: mit
 ---
 # From Clozing to Comprehending: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader
-Pre-trained Machine Reading Comprehension (MRC) model trained with Wikipedia Hyperlinks.
 It was introduced in the paper From Clozing to Comprehending: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader by
 Weiwen Xu, Xin Li, Wenxuan Zhang, Meng Zhou, Wai Lam, Luo Si, Lidong Bing
 and first released in [this repository](https://github.com/DAMO-NLP-SG/PMR).
@@ -12,7 +11,6 @@ The model is initialized with roberta-base and further continued pre-trained wit
 ## Model description
 The model is pre-trained with distantly labeled data using a learning objective called Wiki Anchor Extraction (WAE).
 Specifically, we constructed a large volume of general-purpose and high-quality MRC-style training data based on Wikipedia anchors (i.e., hyperlinked texts).
 For each Wikipedia anchor, we composed a pair of correlated articles.
 One side of the pair is the Wikipedia article that contains detailed descriptions of the hyperlinked entity, which we defined as the definition article.
@@ -28,60 +26,28 @@ During fine-tuning, we unified downstream NLU tasks in our MRC formulation, whic
 (2) span extraction with natural questions (e.g., EQA) in which the question is treated as the query for answer extraction from the given passage (context);
 (3) sequence classification with pre-defined task labels, such as sentiment analysis. Each task label is used as a query for the input text (context); and
 (4) sequence classification with natural questions on multiple choices, such as multi-choice QA (MCQA). We treated the concatenation of the question and one choice as the query for the given passage (context).
 Then, in the output space, we tackle span extraction problems by predicting the probability of context span being the answer.
 We tackle sequence classification problems by conducting relevance classification on [CLS] (extracting [CLS] if relevant).
 ## Model variations
-There are three versions of models released. The details are:
-| Model | Backbone | #params | accuracy | Speed | #Training data
 |------------|-----------|----------|-------|-------|----|
-|   [zero-shot-classify-SSTuning-base](https://huggingface.co/DAMO-NLP-SG/zero-shot-classify-SSTuning-base)    |  [roberta-base](https://huggingface.co/roberta-base)      |  125M    |  Low    |  High    | 20.48M |
-|   [zero-shot-classify-SSTuning-large](https://huggingface.co/DAMO-NLP-SG/zero-shot-classify-SSTuning-large)    |    [roberta-large](https://huggingface.co/roberta-large)      | 355M     |   Medium   | Medium | 5.12M |
-|   [zero-shot-classify-SSTuning-ALBERT](https://huggingface.co/DAMO-NLP-SG/zero-shot-classify-SSTuning-ALBERT)   |  [albert-xxlarge-v2](https://huggingface.co/albert-xxlarge-v2)      |  235M   |    High  | Low| 5.12M |
-Please note that zero-shot-classify-SSTuning-base is trained with more data (20.48M) than the paper, as this will increase the accuracy.
 ## Intended uses & limitations
-The model can be used for zero-shot text classification such as sentiment analysis and topic classification. No further finetuning is needed.
-The number of labels should be 2 ~ 20.
 ### How to use
-You can try the model with the Colab [Notebook](https://colab.research.google.com/drive/17bqc8cXFF-wDmZ0o8j7sbrQB9Cq7Gowr?usp=sharing).
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-import torch, string, random
-tokenizer = AutoTokenizer.from_pretrained("DAMO-NLP-SG/zero-shot-classify-SSTuning-base")
-model = AutoModelForSequenceClassification.from_pretrained("DAMO-NLP-SG/zero-shot-classify-SSTuning-base")
-text = "I love this place! The food is always so fresh and delicious."
-list_label = ["negative", "positive"]
-list_ABC = [x for x in string.ascii_uppercase]
-def add_prefix(text, list_label, shuffle = False):
-    list_label = [x+'.' if x[-1] != '.' else x for x in list_label]
-    list_label_new = list_label + [tokenizer.pad_token]* (20 - len(list_label))
-    if shuffle:
-        random.shuffle(list_label_new)
-    s_option = ' '.join(['('+list_ABC[i]+') '+list_label_new[i] for i in range(len(list_label_new))])
-    return f'{s_option} {tokenizer.sep_token} {text}', list_label_new
-text_new, list_label_new = add_prefix(text,list_label,shuffle=False)
-encoding = tokenizer([text_new],truncation=True, padding='max_length',max_length=512, return_tensors='pt')
-with torch.no_grad():
-    logits = model(**encoding).logits
-    probs = torch.nn.functional.softmax(logits, dim = -1).tolist()
-    predictions = torch.argmax(logits, dim=-1)
-print(probs)
-print(predictions)
-```
 ### BibTeX entry and citation info

 license: mit
 ---
 # From Clozing to Comprehending: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader
+Pre-trained Machine Reader (PMR) is pre-trained with 18 million Machine Reading Comprehension (MRC) examples constructed with Wikipedia Hyperlinks.
 It was introduced in the paper From Clozing to Comprehending: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader by
 Weiwen Xu, Xin Li, Wenxuan Zhang, Meng Zhou, Wai Lam, Luo Si, Lidong Bing
 and first released in [this repository](https://github.com/DAMO-NLP-SG/PMR).
 ## Model description
 The model is pre-trained with distantly labeled data using a learning objective called Wiki Anchor Extraction (WAE).
 Specifically, we constructed a large volume of general-purpose and high-quality MRC-style training data based on Wikipedia anchors (i.e., hyperlinked texts).
 For each Wikipedia anchor, we composed a pair of correlated articles.
 One side of the pair is the Wikipedia article that contains detailed descriptions of the hyperlinked entity, which we defined as the definition article.
 (2) span extraction with natural questions (e.g., EQA) in which the question is treated as the query for answer extraction from the given passage (context);
 (3) sequence classification with pre-defined task labels, such as sentiment analysis. Each task label is used as a query for the input text (context); and
 (4) sequence classification with natural questions on multiple choices, such as multi-choice QA (MCQA). We treated the concatenation of the question and one choice as the query for the given passage (context).
 Then, in the output space, we tackle span extraction problems by predicting the probability of context span being the answer.
 We tackle sequence classification problems by conducting relevance classification on [CLS] (extracting [CLS] if relevant).
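As a rough illustration of the MRC formulation described above, the sketch below builds (query, context) pairs for the two sequence classification cases and picks the most relevant query from a list of scores. The function names, the single-space concatenation of question and choice, and the idea of taking the highest [CLS] relevance score are assumptions made for this example; the actual input formatting and decoding are defined by the scripts in the PMR repository.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (query, context)


def classification_pairs(text: str, labels: List[str]) -> List[Pair]:
    """Case (3): each task label is used as a query for the input text."""
    return [(label, text) for label in labels]


def mcqa_pairs(passage: str, question: str, choices: List[str]) -> List[Pair]:
    """Case (4): the question concatenated with one choice is the query.

    Joining with a single space is an assumption for illustration only.
    """
    return [(f"{question} {choice}", passage) for choice in choices]


def pick_relevant(scores: List[float]) -> int:
    """Return the index of the query with the highest relevance score.

    `scores` stands in for per-pair [CLS] relevance scores that a
    fine-tuned model would produce; they are hypothetical inputs here.
    """
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    pairs = classification_pairs(
        "I love this place! The food is always so fresh and delicious.",
        ["negative", "positive"],
    )
    for query, context in pairs:
        print(f"query={query!r} context={context!r}")
```

Each pair would then be encoded as a two-segment input and scored by the fine-tuned model, so the same model head serves every label or choice without a task-specific layer.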
 ## Model variations
+There are five (including two multilingual variations) versions of models released. The details are:
+| Model | Backbone | #params |
 |------------|-----------|----------|-------|-------|----|
+|   [PMR-base](https://huggingface.co/DAMO-NLP-SG/PMR-base)    |  [roberta-base](https://huggingface.co/roberta-base)      |  125M    |
+|   [PMR-large](https://huggingface.co/DAMO-NLP-SG/PMR-large)    |    [roberta-large](https://huggingface.co/roberta-large)      | 355M     |
+|   [PMR-xxlarge](https://huggingface.co/DAMO-NLP-SG/PMR-xxlarge)   |  [albert-xxlarge-v2](https://huggingface.co/albert-xxlarge-v2)      |  235M   |
+|   [mPMR-base](https://huggingface.co/DAMO-NLP-SG/mPMR-base)   |  [xlm-roberta-base](https://huggingface.co/xlm-roberta-base)      |  270M   |
+|   [mPMR-large](https://huggingface.co/DAMO-NLP-SG/mPMR-large)   |  [xlm-roberta-large](https://huggingface.co/xlm-roberta-large)      |  550M   |
 ## Intended uses & limitations
+The models need to be fine-tuned on the data of downstream tasks. During fine-tuning, no task-specific layer is required.
 ### How to use
+You can try the scripts from [this repo](https://github.com/DAMO-NLP-SG/PMR).
 ### BibTeX entry and citation info