| | --- |
| | language: en |
| | license: mit |
| | model_id: Covid19_Text_Model |
| | tags: |
| | - text-generation |
| | developers: Matt Stammers |
| | model_type: BERT |
| | model_summary: This model looks to compare texts for relevance to Covid-19 |
| | shared_by: Matt Stammers |
| | finetuned_from: https://thigm85.github.io/data/cord19/cord19-query-title-label.csv |
| | repo: https://huggingface.co/MattStammers/Covid19_Text_Model?text=Comprehensive+overview+of+COVID-19.+Comprehensive+overview+of+Flu |
| | paper: N/A |
| | widget: |
| | - text: "Comprehensive overview of COVID-19. Comprehensive overview of Flu" |
| | example_title: "Covid 19 Article Status. Label_0 = Covid-19 probability" |
| | output: |
| | - label: "Covid-19-article" |
| | score: 0.6 |
| | - label: "Non-Covid-19-article" |
| | score: 0.4 |
| | demo: "https://huggingface.co/MattStammers/Covid19_Text_Model?text=Comprehensive+overview+of+COVID-19.+Comprehensive+overview+of+Flu" |
| | direct_use: Test it out here |
| | downstream_use: This is a standalone app |
| | out_of_scope_use: >- |
| | The model will not work with very complex sentences or when comparing more |
| | than 3 statements |
| | bias_risks_limitations: >- |
| | Biases inherent in the Google BERT base model also apply here. It should not be used |
| | for clinical tasks. This is a toy demonstration app only. |
| | bias_recommendations: Do not be surprised if unusual results are obtained |
| | get_started_code: |2- |
| |
|
| | ``` python |
| | |
| | from transformers import pipeline |
| |
|
| | pipe = pipeline("text-classification", model="MattStammers/Covid19_Text_Model") |
| | |
| | # Alternatively, load the tokenizer and model directly |
| | from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| |
|
| | tokenizer = AutoTokenizer.from_pretrained("MattStammers/Covid19_Text_Model") |
| | model = AutoModelForSequenceClassification.from_pretrained("MattStammers/Covid19_Text_Model") |
| | ``` |
| | |
| | training_data: https://thigm85.github.io/data/cord19/cord19-query-title-label.csv |
| | preprocessing: Sentence Pairs to analyse similarity |
| | training_regime: User Defined |
| | speeds_sizes_times: Not Relevant |
| | metrics: Not Given |
| | pipeline_tag: text-classification |
| | --- |
| | This is a basic BERT inference model that has been fine-tuned to discriminate between texts that are relevant to Covid-19 and texts that are not. |
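
A minimal sketch of how the model might be queried and its raw output made readable. It assumes the checkpoint returns the default label names `LABEL_0` and `LABEL_1`, with `LABEL_0` being the Covid-19 class (as the widget title in this card suggests), and that the installed `transformers` version accepts `top_k=None` for text classification. `readable_scores` and `classify` are hypothetical helper names, not part of the model repo.

```python
from typing import Dict, List

# Assumption: the checkpoint exposes the default names LABEL_0 / LABEL_1, and
# LABEL_0 is the Covid-19 class, as the widget title in this card suggests.
LABEL_NAMES = {
    "LABEL_0": "Covid-19-article",
    "LABEL_1": "Non-Covid-19-article",
}


def readable_scores(raw: List[Dict]) -> Dict[str, float]:
    """Map raw pipeline labels to the names used in this card (hypothetical helper)."""
    return {
        LABEL_NAMES.get(item["label"], item["label"]): float(item["score"])
        for item in raw
    }


def classify(text: str) -> Dict[str, float]:
    """Score one text with the model; the weights are downloaded on first call."""
    # Imported lazily so that readable_scores can be used without transformers.
    from transformers import pipeline

    pipe = pipeline("text-classification", model="MattStammers/Covid19_Text_Model")
    # top_k=None asks the pipeline for the score of every label, not only the best one.
    raw = pipe(text, top_k=None)
    # Some transformers versions wrap a single input's scores in an extra list.
    if raw and isinstance(raw[0], list):
        raw = raw[0]
    return readable_scores(raw)
```

Calling `classify("Comprehensive overview of COVID-19. Comprehensive overview of Flu")` should then return one score per label. If the checkpoint defines its own label names, they are passed through unchanged.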
| |
|
| | Unlike my past models, I created this one raw and uploaded it as a standalone git repo to experiment with upload options. This is not as streamlined as using the Hugging Face card generation system, but it is definitely simpler to do. |
| |
|
| | This is also my first experiment with ONNX. |
| |
|
| | - The dataset came from Thiago Martins: https://github.com/thigm85 |
| |
|
| | Training data can be obtained as follows: |
| | ```python |
| | import pandas as pd |
| | |
| | training_data = pd.read_csv("https://thigm85.github.io/data/cord19/cord19-query-title-label.csv") |
| | training_data.head() |
| | ``` |
| |
|
| | Please do not use this for any clinical/applied purpose. It is a toy app only. |