MattStammers
/

Covid19_Text_Model

Text Classification

text-generation

text-embeddings-inference

	---
	language: en
	license: mit
	model_id: Covid19_Text_Model
	tags:
	- text-generation
	developers: Matt Stammers
	model_type: BERT
	model_summary: This model compares texts for relevance to Covid-19
	shared_by: Matt Stammers
	finetuned_from: https://thigm85.github.io/data/cord19/cord19-query-title-label.csv
	repo: https://huggingface.co/MattStammers/Covid19_Text_Model?text=Comprehensive+overview+of+COVID-19.+Comprehensive+overview+of+Flu
	paper: N/A
	widget:
	  - text: "Comprehensive overview of COVID-19. Comprehensive overview of Flu"
	    example_title: "Covid 19 Article Status. Label_0 = Covid-19 probability"
	    output:
	      - label: "Covid-19-article"
	        score: 0.6
	      - label: "Non-Covid-19-article"
	        score: 0.4
	demo: "https://huggingface.co/MattStammers/Covid19_Text_Model?text=Comprehensive+overview+of+COVID-19.+Comprehensive+overview+of+Flu"
	direct_use: Test it out here
	downstream_use: This is a standalone app
	out_of_scope_use: >-
	  The model will not work with very complex sentences or when comparing more
	  than 3 statements
	bias_risks_limitations: >-
	  Biases inherent in the Google BERT base model also apply here. Should not be
	  used for clinical tasks. This is a toy demonstration app only.
	bias_recommendations: Do not be surprised if unusual results are obtained
	get_started_code: |2-

	``` python
	# Use a pipeline as a high-level helper
	from transformers import pipeline

	pipe = pipeline("text-classification", model="MattStammers/Covid19_Text_Model")
	# Load model directly
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	tokenizer = AutoTokenizer.from_pretrained("MattStammers/Covid19_Text_Model")
	model = AutoModelForSequenceClassification.from_pretrained("MattStammers/Covid19_Text_Model")
	```

	training_data: https://thigm85.github.io/data/cord19/cord19-query-title-label.csv
	preprocessing: Sentence Pairs to analyse similarity
	training_regime: User Defined
	speeds_sizes_times: Not Relevant
	metrics: Not Given
	pipeline_tag: text-classification
	---
	This is a basic BERT inference model that has been fine-tuned to discriminate between Covid-19-relevant and non-Covid-19-relevant texts.
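
The widget metadata notes that Label_0 is the Covid-19 probability. Below is a minimal sketch of turning raw pipeline output into readable labels. It assumes the model emits the default `LABEL_0`/`LABEL_1` names and that `LABEL_1` is the non-Covid class, neither of which is confirmed by this card. `LABEL_NAMES`, `to_readable` and `classify` are hypothetical helper names.

```python
# Assumed mapping: the card states Label_0 = Covid-19 probability;
# treating LABEL_1 as the non-Covid class is an assumption.
LABEL_NAMES = {
    "LABEL_0": "Covid-19-article",
    "LABEL_1": "Non-Covid-19-article",
}


def to_readable(scores):
    """Rename raw labels and sort the scores from highest to lowest."""
    renamed = [
        {"label": LABEL_NAMES.get(s["label"], s["label"]), "score": s["score"]}
        for s in scores
    ]
    return sorted(renamed, key=lambda s: s["score"], reverse=True)


def classify(text):
    """Run the model on one text and return readable scores for every label."""
    # Imported lazily so to_readable works without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-classification", model="MattStammers/Covid19_Text_Model")
    result = pipe(text, top_k=None)  # top_k=None requests every label's score
    # Some transformers versions wrap a single input's scores in an outer list.
    if result and isinstance(result[0], list):
        result = result[0]
    return to_readable(result)
```

Calling `classify("Comprehensive overview of COVID-19. Comprehensive overview of Flu")` downloads the model on first use, so it needs network access.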

	Unlike my past models, I created this one raw and uploaded it as a standalone git repo to experiment with upload options. This is not as streamlined as using the Hugging Face card generation system, but it is definitely simpler to do.

	This is also my first experiment with ONNX.

	- The dataset came from Thiago Martins: https://github.com/thigm85

	Training data can be obtained as follows:
	```python
	import pandas as pd

	training_data = pd.read_csv("https://thigm85.github.io/data/cord19/cord19-query-title-label.csv")
	print(training_data.head())  # preview the first five rows
	```
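
The card lists the preprocessing as sentence pairs. Here is a minimal sketch of what that pairing might look like, assuming the CSV has `query`, `title` and `label` columns (inferred from the file name, not confirmed here). `SAMPLE_CSV` and `build_pairs` are hypothetical, and the sample rows and their labels are made-up placeholders rather than real training data.

```python
import csv
import io

# Made-up rows for illustration only; the labels are arbitrary placeholders.
# The column names are an assumption based on the CSV file name.
SAMPLE_CSV = """query,title,label
Comprehensive overview of COVID-19,Comprehensive overview of Flu,0
Comprehensive overview of COVID-19,Comprehensive overview of COVID-19,1
"""


def build_pairs(csv_text):
    """Return (first_sentence, second_sentence, label) tuples from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["query"].strip(), row["title"].strip(), int(row["label"]))
        for row in reader
    ]


pairs = build_pairs(SAMPLE_CSV)
for query, title, label in pairs:
    print(f"{label}: {query!r} <-> {title!r}")
```

With a Hugging Face tokenizer, each pair would typically be encoded together by passing both sentences, for example `tokenizer(query, title, truncation=True)`. Whether this model was trained exactly that way is not stated in this card.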

	Please do not use this for any clinical/applied purpose. It is a toy app only.