| --- |
| language: en |
| license: mit |
| model_id: Covid19_Text_Model |
| tags: |
| - text-classification |
| developers: Matt Stammers |
| model_type: BERT |
| model_summary: This model compares texts for relevance to Covid-19 |
| shared_by: Matt Stammers |
| finetuned_from: https://thigm85.github.io/data/cord19/cord19-query-title-label.csv |
| repo: https://huggingface.co/MattStammers/Covid19_Text_Model?text=Comprehensive+overview+of+COVID-19.+Comprehensive+overview+of+Flu |
| paper: N/A |
| widget: |
| - text: "Comprehensive overview of COVID-19. Comprehensive overview of Flu" |
| example_title: "Covid-19 Article Status. Label_0 = Covid-19 probability" |
| output: |
| - label: "Covid-19-article" |
| score: 0.6 |
| - label: "Non-Covid-19-article" |
| score: 0.4 |
| demo: "https://huggingface.co/MattStammers/Covid19_Text_Model?text=Comprehensive+overview+of+COVID-19.+Comprehensive+overview+of+Flu" |
| direct_use: Test it out here |
| downstream_use: This is a standalone app |
| out_of_scope_use: >- |
| The model will not work well with very complex sentences or when comparing |
| more than 3 statements |
| bias_risks_limitations: >- |
| Biases inherent in the Google BERT base model also apply here. It should not |
| be used for clinical tasks. This is a toy demonstration app only. |
| bias_recommendations: Do not be surprised if unusual results are obtained |
| get_started_code: |2- |
|
|
| ``` python |
| |
| from transformers import pipeline |
|
| # Option 1: use the high-level pipeline API |
| pipe = pipeline("text-classification", model="MattStammers/Covid19_Text_Model") |
| |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
|
| # Option 2: load the tokenizer and model directly |
| tokenizer = AutoTokenizer.from_pretrained("MattStammers/Covid19_Text_Model") |
| model = AutoModelForSequenceClassification.from_pretrained("MattStammers/Covid19_Text_Model") |
| ``` |
| |
| training_data: https://thigm85.github.io/data/cord19/cord19-query-title-label.csv |
| preprocessing: Sentence Pairs to analyse similarity |
| training_regime: User Defined |
| speeds_sizes_times: Not Relevant |
| metrics: Not Given |
| pipeline_tag: text-classification |
| --- |
| This is a basic inference BERT model which has been fine-tuned to discriminate between Covid-19-relevant and non-Covid-19 texts. |
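
The classifier returns a score per label, and the widget note in the card metadata says Label_0 is the Covid-19 probability. The sketch below shows one way to make that output easier to read. It assumes the raw labels are `LABEL_0` and `LABEL_1` (the default names `transformers` uses when a config defines no custom labels), which may not match this model; `LABEL_NAMES` and `describe_predictions` are hypothetical helpers, and the example scores are the illustrative ones from the widget, not measured results.

```python
# Assumed mapping: the widget note says Label_0 is the Covid-19 probability.
# "LABEL_0"/"LABEL_1" are the default transformers label names; whether this
# model uses them is an assumption.
LABEL_NAMES = {
    "LABEL_0": "Covid-19-article",
    "LABEL_1": "Non-Covid-19-article",
}


def describe_predictions(predictions):
    """Rename raw labels and sort the results by descending score."""
    renamed = [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]
    return sorted(renamed, key=lambda p: p["score"], reverse=True)


# Stand-in for real pipeline output (a list of label/score dicts),
# using the illustrative scores from the widget example.
example_output = [
    {"label": "LABEL_1", "score": 0.4},
    {"label": "LABEL_0", "score": 0.6},
]

readable = describe_predictions(example_output)
print(readable[0]["label"])
```

With the real model, the same helper could be applied to the list of label/score dicts returned by the `text-classification` pipeline when all class scores are requested.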
|
|
| Unlike my past models, I created this one raw and uploaded it as a standalone git repo to experiment with upload options. This is not as streamlined as using the Hugging Face card generation system, but it is definitely simpler to do. |
|
|
| This is also my first experiment with ONNX. |
|
|
| - The dataset came from Thiago Martins: https://github.com/thigm85 |
|
|
| Training data can be obtained as follows: |
| ```python |
| import pandas as pd |
| |
| training_data = pd.read_csv("https://thigm85.github.io/data/cord19/cord19-query-title-label.csv") |
| training_data.head() |
| ``` |
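
The card's preprocessing note describes the inputs as sentence pairs. A minimal sketch of that step might look like the following. It assumes the CSV has columns named `query`, `title` and `label`, which is only a guess from the file name; the two inline rows are made-up stand-ins for the real data, and `rows_to_pairs` is a hypothetical helper.

```python
import csv
import io

# Tiny inline stand-in for the real CSV. The column names "query", "title"
# and "label" are guessed from the file name and may not match the dataset.
SAMPLE_CSV = """query,title,label
coronavirus origin,Origin of a novel coronavirus,1
coronavirus origin,Seasonal influenza vaccination uptake,0
"""


def rows_to_pairs(rows, first="query", second="title", target="label"):
    """Turn each row into ((text_a, text_b), int_label)."""
    return [((row[first], row[second]), int(row[target])) for row in rows]


pairs = rows_to_pairs(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for (text_a, text_b), label in pairs:
    print(label, "|", text_a, "|", text_b)
```

If the guessed column names hold, the same function should also accept `training_data.to_dict("records")` from the pandas snippet above, since each record is a plain dict.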
|
|
| Please do not use this for any clinical/applied purpose. It is a toy app only. |