---
language: en
datasets:
- squad_v2
license: cc-by-4.0
---

# roberta-base-squad2 for Extractive QA on COVID-19

## Overview
**Language model:** deepset/roberta-base-squad2  
**Language:** English  
**Downstream-task:** Extractive QA  
**Training data:** [SQuAD-style CORD-19 annotations from 23rd April](https://github.com/deepset-ai/COVID-QA/blob/master/data/question-answering/200423_covidQA.json)  
**Code:**  See [an example extractive QA pipeline built with Haystack](https://haystack.deepset.ai/tutorials/34_extractive_qa_pipeline)  
**Infrastructure:** Tesla V100

## Hyperparameters
```
batch_size = 24
n_epochs = 3
base_LM_model = "deepset/roberta-base-squad2"
max_seq_len = 384
learning_rate = 3e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.1
doc_stride = 128
xval_folds = 5
dev_split = 0
no_ans_boost = -100
```
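
To make `learning_rate`, `lr_schedule` and `warmup_proportion` concrete, here is a minimal sketch of a linear warmup schedule. It assumes the learning rate rises linearly from 0 to the peak over the first 10% of steps and then decays linearly back to 0, which is one common reading of `LinearWarmup`; the card does not spell out the decay shape. The function name `linear_warmup_lr` and the step count `total_steps=1000` are hypothetical and not taken from the training run.

```python
def linear_warmup_lr(step, total_steps, peak_lr=3e-5, warmup_proportion=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Assumption: decay to zero after warmup; the card only names "LinearWarmup".
    """
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length, for illustration only.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps))
```

With these assumed values the peak of `3e-5` is reached at step 100, the end of the warmup phase.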

## Performance
5-fold cross-validation on the data set led to the following results:  

**Single EM-Scores:**   [0.222, 0.123, 0.234, 0.159, 0.158]  
**Single F1-Scores:**   [0.476, 0.493, 0.599, 0.461, 0.465]  
**Single top\_3\_recall Scores:**   [0.827, 0.776, 0.860, 0.771, 0.777]  
**XVAL EM:**   0.17890995260663506  
**XVAL F1:**   0.49925444207319924  
**XVAL top\_3\_recall:**   0.8021327014218009

This model is the one obtained from the **third** fold of the cross-validation.
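
As a quick sanity check on the numbers above, the following sketch averages the per-fold scores and finds the best fold for each metric. The fold means come out close to the reported XVAL values (they agree to three decimals), and the third fold scores highest on all three metrics. The card does not say how the XVAL figures were aggregated or why the third fold was chosen, so treat this as an observation, not as the authors' procedure. The variable names are illustrative.

```python
from statistics import mean

# Per-fold scores copied from the list above.
fold_scores = {
    "EM": [0.222, 0.123, 0.234, 0.159, 0.158],
    "F1": [0.476, 0.493, 0.599, 0.461, 0.465],
    "top_3_recall": [0.827, 0.776, 0.860, 0.771, 0.777],
}

# Unweighted mean over the five folds.
fold_means = {name: mean(scores) for name, scores in fold_scores.items()}

# 1-based index of the best-scoring fold per metric.
best_folds = {
    name: scores.index(max(scores)) + 1 for name, scores in fold_scores.items()
}

for name in fold_scores:
    print(f"{name}: mean={fold_means[name]:.4f}, best fold={best_folds[name]}")
```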

## Usage

### In Haystack
Haystack is an AI orchestration framework to build customizable, production-ready LLM applications. You can use this model in Haystack to do extractive question answering on documents. 
To load and run the model with [Haystack](https://github.com/deepset-ai/haystack/):
```python
# After running pip install haystack-ai "transformers[torch,sentencepiece]"

from haystack import Document
from haystack.components.readers import ExtractiveReader

docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]

reader = ExtractiveReader(model="deepset/roberta-base-squad2")
reader.warm_up()

question = "What is a popular programming language?"
result = reader.run(query=question, documents=docs)
# {'answers': [ExtractedAnswer(query='What is a popular programming language?', score=0.5740374326705933, data='python', document=Document(id=..., content: '...'), context=None, document_offset=ExtractedAnswer.Span(start=0, end=6),...)]}
```
For a complete example with an extractive question answering pipeline that scales over many documents, check out the [corresponding Haystack tutorial](https://haystack.deepset.ai/tutorials/34_extractive_qa_pipeline).

### In Transformers
```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/roberta-base-squad2"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and lets people easily switch between frameworks.'
}
res = nlp(QA_input)

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
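
Since top-3 recall is much higher than exact match in the cross-validation results, it can be useful to look at several candidate answers instead of only the first one. The sketch below assumes the pipeline is called with `top_k=3`, in which case it returns a list of dicts with the keys `score`, `start`, `end` and `answer` (a single dict when only one answer is returned). The helper `filter_candidates`, the threshold `min_score` and the sample values are hypothetical and only illustrate the shape of the output.

```python
def filter_candidates(res, min_score=0.1):
    """Keep candidates at or above `min_score`, sorted best first.

    `res` is the question-answering pipeline output: either one dict
    or a list of dicts with 'score', 'start', 'end' and 'answer'.
    """
    candidates = [res] if isinstance(res, dict) else list(res)
    kept = [c for c in candidates if c["score"] >= min_score]
    return sorted(kept, key=lambda c: c["score"], reverse=True)


# With a loaded pipeline this would be: res = nlp(QA_input, top_k=3)
# The values below are made up for illustration, not real model output.
example_res = [
    {"score": 0.42, "start": 59, "end": 84, "answer": "gives freedom to the user"},
    {"score": 0.05, "start": 59, "end": 132, "answer": "gives freedom to the user and lets people easily switch between frameworks"},
    {"score": 0.21, "start": 65, "end": 84, "answer": "freedom to the user"},
]

for cand in filter_candidates(example_res):
    print(f"{cand['score']:.2f}  {cand['answer']}")
```

The threshold is a judgment call; a value that works for one document collection may be too strict or too loose for another.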


## Authors
**Branden Chan:** branden.chan@deepset.ai      
**Timo Möller:** timo.moeller@deepset.ai  
**Malte Pietsch:** malte.pietsch@deepset.ai      
**Tanay Soni:** tanay.soni@deepset.ai    
**Bogdan Kostić:** bogdan.kostic@deepset.ai  

## About us

<div class="grid lg:grid-cols-2 gap-x-4 gap-y-3">
    <div class="w-full h-40 object-cover mb-2 rounded-lg flex items-center justify-center">
         <img alt="" src="https://raw.githubusercontent.com/deepset-ai/.github/main/deepset-logo-colored.png" class="w-40"/>
     </div>
     <div class="w-full h-40 object-cover mb-2 rounded-lg flex items-center justify-center">
         <img alt="" src="https://raw.githubusercontent.com/deepset-ai/.github/main/haystack-logo-colored.png" class="w-40"/>
     </div>
</div>

[deepset](http://deepset.ai/) is the company behind the production-ready open-source AI framework [Haystack](https://haystack.deepset.ai/).

Some of our other work: 
- [Distilled roberta-base-squad2 (aka "tinyroberta-squad2")](https://huggingface.co/deepset/tinyroberta-squad2)
- [German BERT](https://deepset.ai/german-bert), [GermanQuAD and GermanDPR](https://deepset.ai/germanquad), [German embedding model](https://huggingface.co/mixedbread-ai/deepset-mxbai-embed-de-large-v1)
- [deepset Cloud](https://www.deepset.ai/deepset-cloud-product), [deepset Studio](https://www.deepset.ai/deepset-studio)

## Get in touch and join the Haystack community

<p>For more info on Haystack, visit our <strong><a href="https://github.com/deepset-ai/haystack">GitHub</a></strong> repo and <strong><a href="https://docs.haystack.deepset.ai">Documentation</a></strong>. 

We also have a <strong><a class="h-7" href="https://haystack.deepset.ai/community">Discord community open to everyone!</a></strong></p>

[Twitter](https://twitter.com/Haystack_AI) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://haystack.deepset.ai/) | [YouTube](https://www.youtube.com/@deepset_ai)

By the way: [we're hiring!](http://www.deepset.ai/jobs)