---
license: mit
---

# mPMR: A Multilingual Pre-trained Machine Reader at Scale

Multilingual Pre-trained Machine Reader (mPMR) is a multilingual extension of PMR.
mPMR is pre-trained on 18 million Machine Reading Comprehension (MRC) examples constructed from Wikipedia hyperlinks.
It was introduced in the paper *mPMR: A Multilingual Pre-trained Machine Reader at Scale* by
Weiwen Xu, Xin Li, Wai Lam, and Lidong Bing,
and first released in [this repository](https://github.com/DAMO-NLP-SG/PMR).

This model is initialized from xlm-roberta-base and continually pre-trained with an MRC objective.

## Model description

The model is pre-trained on distantly labeled data using a learning objective called Wiki Anchor Extraction (WAE).
Specifically, we constructed a large volume of general-purpose, high-quality MRC-style training data based on Wikipedia anchors (i.e., hyperlinked texts).
For each Wikipedia anchor, we composed a pair of correlated articles.
One side of the pair is the Wikipedia article that contains a detailed description of the hyperlinked entity, which we define as the definition article.
The other side of the pair is the article that mentions the specific anchor text, which we define as the mention article.
We composed an MRC-style training instance in which the anchor is the answer,
the passage surrounding the anchor in the mention article is the context, and the definition of the anchor entity in the definition article is the query.
Based on these data, we then introduced a novel WAE problem as the pre-training task of mPMR.
In this task, mPMR determines whether the context and the query are relevant.
If so, mPMR extracts the answer from the context that satisfies the query description.
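
The construction above can be sketched in Python. The anchor, passage, and definition strings below are hypothetical illustrations, not real pre-training data:

```python
# Sketch of composing one WAE-style MRC training instance from a
# hypothetical Wikipedia anchor (illustrative strings, not real data).

def make_wae_instance(anchor, mention_passage, definition):
    """The anchor is the answer, the passage around the anchor in the
    mention article is the context, and the entity definition from the
    definition article is the query."""
    assert anchor in mention_passage, "anchor must occur in its passage"
    start = mention_passage.index(anchor)
    return {
        "query": definition,          # from the definition article
        "context": mention_passage,   # from the mention article
        "answer": anchor,
        "answer_span": (start, start + len(anchor)),  # character offsets
    }

instance = make_wae_instance(
    anchor="Apple Inc.",
    mention_passage="Steve Jobs co-founded Apple Inc. in 1976.",
    definition="Apple Inc. is an American multinational technology company.",
)
```

A negative (irrelevant) query/context pair for the same instance would carry no answer span, which is what the relevance-classification part of WAE learns to detect.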

During fine-tuning, we unify downstream NLU tasks under our MRC formulation; they typically fall into four categories:
(1) span extraction with pre-defined labels (e.g., NER), in which each task label is treated as a query to search for the corresponding answers in the input text (context);
(2) span extraction with natural questions (e.g., EQA), in which the question is treated as the query for answer extraction from the given passage (context);
(3) sequence classification with pre-defined task labels, such as sentiment analysis, in which each task label is used as a query for the input text (context); and
(4) sequence classification with natural questions over multiple choices, such as multi-choice QA (MCQA), in which the concatenation of the question and one choice is treated as the query for the given passage (context).
In the output space, we tackle span extraction problems by predicting the probability of each context span being the answer,
and sequence classification problems by conducting relevance classification on [CLS] (extracting [CLS] if relevant).
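
As an illustration, the (query, context) pairs for the four categories can be sketched as below; the label names, question, choice, and texts are hypothetical:

```python
# Hypothetical examples of casting each task category as a (query, context) pair.

def label_query(label, text):
    # Categories (1) and (3): a pre-defined task label serves as the query.
    return (label, text)

def question_query(question, passage):
    # Category (2): a natural question serves as the query.
    return (question, passage)

def mcqa_query(question, choice, passage):
    # Category (4): the question concatenated with one choice is the query;
    # one pair is built per choice, and relevance on [CLS] picks the answer.
    return (question + " " + choice, passage)

pairs = [
    label_query("person", "Marie Curie won two Nobel Prizes."),
    question_query("Who won two Nobel Prizes?", "Marie Curie won two Nobel Prizes."),
    mcqa_query("Who won two Nobel Prizes?", "Marie Curie",
               "Marie Curie won two Nobel Prizes."),
]
```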

## Model variations

Two versions of the model are released:

| Model | Backbone | #params |
|------------|-----------|----------|
| [mPMR-base](https://huggingface.co/DAMO-NLP-SG/mPMR-base) | [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) | 270M |
| [mPMR-large](https://huggingface.co/DAMO-NLP-SG/mPMR-large) | [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) | 550M |

## Intended uses & limitations

The models need to be fine-tuned on downstream task data; during fine-tuning, no task-specific layer is required.

### How to use

You can try the code from [this repo](https://github.com/DAMO-NLP-SG/mPMR).
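
A minimal loading sketch is shown below, assuming the checkpoint loads with the standard `transformers` auto classes (it follows the XLM-R architecture). The query/context strings are hypothetical, and the MRC span-scoring head lives in the fine-tuning code of the repo above, so this only produces encoder representations:

```python
from transformers import AutoModel, AutoTokenizer

# mPMR checkpoints follow the XLM-R architecture, so the auto classes apply.
tokenizer = AutoTokenizer.from_pretrained("DAMO-NLP-SG/mPMR-base")
model = AutoModel.from_pretrained("DAMO-NLP-SG/mPMR-base")

# Hypothetical MRC-style input: a label definition as query, text as context.
query = "person. A person is a human being."
context = "Marie Curie won two Nobel Prizes."
inputs = tokenizer(query, context, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```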

### BibTeX entry and citation info

```bibtex
@article{xu2022clozing,
  title={From Clozing to Comprehending: Retrofitting Pre-trained Language Model to Pre-trained Machine Reader},
  author={Xu, Weiwen and Li, Xin and Zhang, Wenxuan and Zhou, Meng and Bing, Lidong and Lam, Wai and Si, Luo},
  journal={arXiv preprint arXiv:2212.04755},
  year={2022}
}
@inproceedings{xu2023mpmr,
  title = "mPMR: A Multilingual Pre-trained Machine Reader at Scale",
  author = "Xu, Weiwen and
    Li, Xin and
    Lam, Wai and
    Bing, Lidong",
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
  year = "2023"
}
```