---
language: en
datasets:
- squad_v2
license: cc-by-4.0
model-index:
- name: deepset/roberta-base-squad2-covid
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_v2
      type: squad_v2
      config: squad_v2
      split: validation
    metrics:
    - name: Exact Match
      type: exact_match
      value: 53.8209
      verified: true
    - name: F1
      type: f1
      value: 60.1806
      verified: true
---
# roberta-base-squad2 for QA on COVID-19
## Overview
**Language model:** deepset/roberta-base-squad2
**Language:** English
**Downstream-task:** Extractive QA
**Training data:** [SQuAD-style CORD-19 annotations from 23rd April](https://github.com/deepset-ai/COVID-QA/blob/master/data/question-answering/200423_covidQA.json)
**Code:** See [example](https://github.com/deepset-ai/FARM/blob/master/examples/question_answering_crossvalidation.py) in [FARM](https://github.com/deepset-ai/FARM)
**Infrastructure:** Tesla V100
## Hyperparameters
```
batch_size = 24
n_epochs = 3
base_LM_model = "deepset/roberta-base-squad2"
max_seq_len = 384
learning_rate = 3e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.1
doc_stride = 128
xval_folds = 5
dev_split = 0
no_ans_boost = -100
```
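To make `lr_schedule` and `warmup_proportion` concrete, here is a minimal sketch of a linear warmup schedule. It assumes the common variant in which the rate rises linearly to `learning_rate` over the first 10% of steps and then decays linearly to zero; the card itself only names `LinearWarmup`, so the decay phase is an assumption. The function name and `total_steps` are hypothetical.

```python
def linear_warmup_lr(step, total_steps, peak_lr=3e-5, warmup_proportion=0.1):
    """Learning rate at `step`: linear warmup, then linear decay (assumed)."""
    warmup_steps = int(total_steps * warmup_proportion)
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup phase: ramp from 0 up to peak_lr
        return peak_lr * step / warmup_steps
    # Decay phase: ramp from peak_lr down to 0 at total_steps
    remaining = max(total_steps - warmup_steps, 1)
    return peak_lr * max(0.0, (total_steps - step) / remaining)


# Hypothetical run length of 1000 optimizer steps
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps=1000))
```

With these settings the peak rate of 3e-5 is reached at step 100, that is, after 10% of training.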
## Performance
5-fold cross-validation on the data set led to the following results:
**Single EM-Scores:** [0.222, 0.123, 0.234, 0.159, 0.158]
**Single F1-Scores:** [0.476, 0.493, 0.599, 0.461, 0.465]
**Single top\_3\_recall Scores:** [0.827, 0.776, 0.860, 0.771, 0.777]
**XVAL EM:** 0.17890995260663506
**XVAL f1:** 0.49925444207319924
**XVAL top\_3\_recall:** 0.8021327014218009
This model is the one obtained from the **third** fold of the cross-validation.
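As a quick sanity check, the sketch below compares the unweighted mean of the rounded per-fold scores with the reported XVAL figures. The two agree to within about 0.001; the small remaining gap could come from rounding of the per-fold values or from aggregating over pooled samples rather than over folds, which is an assumption and is not stated in this card.

```python
from statistics import mean

# Per-fold scores as listed above (rounded to three decimals)
fold_scores = {
    "EM": [0.222, 0.123, 0.234, 0.159, 0.158],
    "F1": [0.476, 0.493, 0.599, 0.461, 0.465],
    "top_3_recall": [0.827, 0.776, 0.860, 0.771, 0.777],
}

# Aggregate cross-validation scores as reported above
reported_xval = {
    "EM": 0.17890995260663506,
    "F1": 0.49925444207319924,
    "top_3_recall": 0.8021327014218009,
}

for name, scores in fold_scores.items():
    fold_mean = mean(scores)
    diff = fold_mean - reported_xval[name]
    print(f"{name}: fold mean = {fold_mean:.4f}, "
          f"reported = {reported_xval[name]:.4f}, diff = {diff:+.4f}")
```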
## Usage
### In Haystack
To run QA at scale (i.e. over many documents instead of a single paragraph), you can also load the model in [Haystack](https://github.com/deepset-ai/haystack/):
```python
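# Import paths and argument names differ between Haystack versions; adjust them to your installation.
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2-covid")
# or
reader = TransformersReader(model="deepset/roberta-base-squad2-covid", tokenizer="deepset/roberta-base-squad2-covid")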
```
### In Transformers
```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
model_name = "deepset/roberta-base-squad2-covid"
# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
'question': 'Why is model conversion important?',
'context': 'The option to convert models between FARM and transformers gives freedom to the user and lets people easily switch between frameworks.'
}
res = nlp(QA_input)
# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
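The question-answering pipeline returns a dict with the keys `score`, `start`, `end`, and `answer`. The sketch below shows one way to post-process such a result; the helper name `summarize_prediction` and the `min_score` threshold are hypothetical, and the example dict is hand-written with illustrative values rather than actual model output.

```python
def summarize_prediction(res, min_score=0.1):
    """Return the answer text, or None if it is empty or below min_score."""
    answer = res.get("answer", "").strip()
    if not answer or res.get("score", 0.0) < min_score:
        return None
    return answer


# Hand-written dict in the shape returned by the pipeline (illustrative values)
example = {"score": 0.42, "start": 59, "end": 84, "answer": "gives freedom to the user"}
print(summarize_prediction(example))

# A low-confidence prediction is filtered out
print(summarize_prediction({"score": 0.01, "start": 0, "end": 3, "answer": "The"}))
```

A threshold like this is a design choice that should be tuned on held-out data; given the modest exact-match scores reported above, low-scoring spans are worth treating with caution.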
## Authors
**Branden Chan:** branden.chan@deepset.ai
**Timo Möller:** timo.moeller@deepset.ai
**Malte Pietsch:** malte.pietsch@deepset.ai
**Tanay Soni:** tanay.soni@deepset.ai
**Bogdan Kostić:** bogdan.kostic@deepset.ai
## About us
<div class="grid lg:grid-cols-2 gap-x-4 gap-y-3">
<div class="w-full h-40 object-cover mb-2 rounded-lg flex items-center justify-center">
<img alt="" src="https://raw.githubusercontent.com/deepset-ai/.github/main/deepset-logo-colored.png" class="w-40"/>
</div>
<div class="w-full h-40 object-cover mb-2 rounded-lg flex items-center justify-center">
<img alt="" src="https://raw.githubusercontent.com/deepset-ai/.github/main/haystack-logo-colored.png" class="w-40"/>
</div>
</div>
[deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/), which is designed to help you build production-ready NLP systems for question answering, summarization, ranking, etc.
Some of our other work:
- [Distilled roberta-base-squad2 (aka "tinyroberta-squad2")](https://huggingface.co/deepset/tinyroberta-squad2)
- [German BERT (aka "bert-base-german-cased")](https://deepset.ai/german-bert)
- [GermanQuAD and GermanDPR datasets and models (aka "gelectra-base-germanquad", "gbert-base-germandpr")](https://deepset.ai/germanquad)
## Get in touch and join the Haystack community
<p>For more info on Haystack, visit our <strong><a href="https://github.com/deepset-ai/haystack">GitHub</a></strong> repo and <strong><a href="https://haystack.deepset.ai">Documentation</a></strong>.
We also have a <strong><a class="h-7" href="https://haystack.deepset.ai/community/join">Discord community open to everyone!</a></strong></p>
[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)
By the way: [we're hiring!](http://www.deepset.ai/jobs)