---
language: en
license: mit
model_id: Covid19_Text_Model
tags:
- text-classification
developers: Matt Stammers
model_type: BERT
model_summary: This model compares texts for their relevance to Covid-19
shared_by: Matt Stammers
finetuned_from: https://thigm85.github.io/data/cord19/cord19-query-title-label.csv
repo: https://huggingface.co/MattStammers/Covid19_Text_Model?text=Comprehensive+overview+of+COVID-19.+Comprehensive+overview+of+Flu
paper: N/A
widget: 
- text: "Comprehensive overview of COVID-19. Comprehensive overview of Flu"
  example_title: "Covid 19 Article Status. Label_0 = Covid-19 probability"
  output: 
    - label: "Covid-19-article"
      score: 0.6
    - label: "Non-Covid-19-article"
      score: 0.4
demo: "https://huggingface.co/MattStammers/Covid19_Text_Model?text=Comprehensive+overview+of+COVID-19.+Comprehensive+overview+of+Flu"
direct_use: Test it out here
downstream_use: This is a standalone app
out_of_scope_use: >-
  The model will not work with very complex sentences or when comparing more
  than 3 statements
bias_risks_limitations: >-
  Biases inherent in the Google BERT base model also apply here. It should not
  be used for clinical tasks. This is a toy demonstration app only.
bias_recommendations: Do not be surprised if unusual results are obtained
get_started_code: |2-

      ```python
      # Use a pipeline as a high-level helper
      from transformers import pipeline

      pipe = pipeline("text-classification", model="MattStammers/Covid19_Text_Model")

      # Load model directly
      from transformers import AutoTokenizer, AutoModelForSequenceClassification

      tokenizer = AutoTokenizer.from_pretrained("MattStammers/Covid19_Text_Model")
      model = AutoModelForSequenceClassification.from_pretrained("MattStammers/Covid19_Text_Model")
      ```
                          
training_data: https://thigm85.github.io/data/cord19/cord19-query-title-label.csv
preprocessing: Sentence Pairs to analyse similarity
training_regime: User Defined
speeds_sizes_times: Not Relevant
metrics: Not Given
pipeline_tag: text-classification
---
This is a basic BERT inference model that has been fine-tuned to discriminate between texts that are relevant to Covid-19 and texts that are not.
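
As a rough illustration of how the classifier's output could be read, the sketch below maps raw labels to readable names. It assumes the pipeline reports `LABEL_0` and `LABEL_1`, that `LABEL_0` is the Covid-19 class (as the widget title in the card metadata suggests), and that `LABEL_1` is its complement. `readable_scores` and `classify` are hypothetical helper names, not part of the model repo.

```python
from typing import Dict, List

# Assumed mapping: the widget title states LABEL_0 is the Covid-19 probability;
# treating LABEL_1 as the complementary class is an assumption.
LABEL_NAMES = {
    "LABEL_0": "Covid-19-article",
    "LABEL_1": "Non-Covid-19-article",
}


def readable_scores(raw_scores: List[Dict]) -> Dict[str, float]:
    """Map raw pipeline labels to readable names, keeping unknown labels as-is."""
    return {
        LABEL_NAMES.get(item["label"], item["label"]): round(float(item["score"]), 4)
        for item in raw_scores
    }


def classify(text: str) -> Dict[str, float]:
    """Run the model on one text and return a readable score per label.

    Needs `transformers` installed and network access to download the model.
    """
    from transformers import pipeline  # imported lazily so readable_scores works offline

    pipe = pipeline(
        "text-classification",
        model="MattStammers/Covid19_Text_Model",
        top_k=None,  # return a score for every label, not just the best one
    )
    result = pipe(text)
    # Some transformers versions wrap a single input's scores in an outer list.
    if result and isinstance(result[0], list):
        result = result[0]
    return readable_scores(result)


# Offline illustration using the widget example values from the card metadata
# (hand-written scores, not fresh model output).
example = [
    {"label": "LABEL_0", "score": 0.6},
    {"label": "LABEL_1", "score": 0.4},
]
print(readable_scores(example))
```

Calling `classify("Comprehensive overview of COVID-19. Comprehensive overview of Flu")` would run the real model. The scores it returns may differ from the illustrative values above.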

Unlike my past models, I created this one raw and uploaded it as a standalone git repo to experiment with upload options. This is not as streamlined as using the Hugging Face card generation system, but it is definitely simpler to do.

This is also my first experiment with ONNX.

- The dataset came from Thiago Martins: https://github.com/thigm85

Training data can be obtained as follows:
```python
import pandas as pd

training_data = pd.read_csv("https://thigm85.github.io/data/cord19/cord19-query-title-label.csv")
training_data.head()
```

Please do not use this for any clinical/applied purpose. It is a toy app only.