---
pipeline_tag: text-classification
language:
- nl
base_model:
- intfloat/multilingual-e5-small
license: mit
---

# Model Card

This model is a fine-tuned version of [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). It was fine-tuned on [FactRank](https://github.com/lejafar/FactRank/tree/master/factrank) data, supplemented with machine-annotated data from Dutch and Belgian parliamentary proceedings.

The primary goal of this model is to determine whether a given statement warrants fact-checking. It does **not** determine whether the statement is factually correct.

The model assigns exactly one of three labels: FR, FNR, or NF.

- **FR**: Factual, Relevant (the statement is fact-checkable and warrants verification)
- **FNR**: Factual, Not Relevant (the statement can be fact-checked, but its wider relevance is low)
- **NF**: Not Factual (the statement contains no verifiable factual claim)
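
If you need the mapping between the classifier's output indices and these label names, it is stored in the hosted model config; a minimal check using the standard `transformers` API:

```python
from transformers import AutoConfig

# The id2label mapping is defined in this model's config on the Hub.
config = AutoConfig.from_pretrained("textgain/FactRank_e5_small")
print(config.id2label)  # exact index order depends on the config
```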

**Examples**:

- **FR**: *Toch blijkt uit cijfers van Flanders Investment & Trade dat ons handel met het Verenigd Koninkrijk opnieuw op het niveau ligt van voor de brexit.* ("Yet figures from Flanders Investment & Trade show that our trade with the United Kingdom is back at its pre-Brexit level.")
- **FNR**: *Ayleen werd opgelicht via dating fraude door de Tinder Swindler: "Het zijn net vampiers."* ("Ayleen was scammed through dating fraud by the Tinder Swindler: 'They are just like vampires.'")
- **NF**: *Het heeft weinig zin om zomaar een aantal maatregelen te tonen.* ("There is little point in simply showing a number of measures.")

**Supported language**: Dutch

## Usage

The model can be loaded with the standard `transformers` classes and wrapped in a `text-classification` pipeline:

```python
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

config = AutoConfig.from_pretrained("textgain/FactRank_e5_small")
tokenizer = AutoTokenizer.from_pretrained("textgain/FactRank_e5_small")
model = AutoModelForSequenceClassification.from_pretrained("textgain/FactRank_e5_small", config=config)
pipe = pipeline(model=model, tokenizer=tokenizer, task="text-classification")

# Dutch sample statements, ranging from check-worthy claims to opinions and questions.
sample_texts = [
    "In een wereld die steeds digitaler wordt, moeten we het ook makkelijker maken om de controle over je financiën te hebben.",
    "Ik wil helemaal geen haren tussen u en de heer De Cock leggen.",
    "Je kunt van mening verschillen over welk gevolg je daaraan moet verbinden.",
    "We hebben 4.500 nieuwe kankergevallen in Nederland per jaar als gevolg van alcoholgebruik.",
    "Alcoholgebruik kost de samenleving 2 tot 4 miljard euro.",
    "Dus kan de minister daar vandaag wat meer over zeggen?",
]

results = pipe(sample_texts)
predicted_labels = [res["label"] for res in results]
```
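
Each pipeline result is a dict with a `label` and a confidence `score`. As a sketch of downstream use (not part of the official card), you could keep only statements the model confidently flags as check-worthy, assuming the config's `id2label` uses the FR/FNR/NF names shown above; the 0.8 cut-off is an arbitrary illustration, not a calibrated threshold:

```python
# Keep only statements labeled FR with high confidence.
check_worthy = [
    text
    for text, res in zip(sample_texts, results)
    if res["label"] == "FR" and res["score"] >= 0.8  # hypothetical threshold
]
print(check_worthy)
```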

## Interpretation of Results

**Factors Influencing the Label:**

- **Subjective Evaluation**: The presence of evaluative words such as "interesting", "surprising", or "incredible" may push the model towards predicting NF.
- **Research**: Mentions of research or studies push the model towards treating the statement as a verifiable fact.
- **Context**: Statements made in certain contexts are more likely to receive an FR label, e.g. statements about health and medicine.
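
A quick way to probe these tendencies is to classify minimally contrasting sentences; the sketch below reuses the `pipe` object from the Usage section, and the two sentences are illustrative inputs I made up, not items from the training data:

```python
# First sentence cites research on a health topic ("Research shows that alcohol use
# increases the risk of cancer."); the second is a subjective evaluation ("It is
# astonishing how good this beer tastes."). Compare the predicted labels.
probe_texts = [
    "Uit onderzoek blijkt dat alcoholgebruik het risico op kanker verhoogt.",
    "Het is verbazingwekkend hoe lekker dit biertje smaakt.",
]
for text, res in zip(probe_texts, pipe(probe_texts)):
    print(f"{res['label']:>4} ({res['score']:.2f})  {text}")
```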

## Training Details

The model was trained on a total of 13,786 data samples.

Parameters:

```python
num_epochs = 5
batch_size = 32
learning_rate = 1e-5
dropout = 0.5
gradient_accumulation_steps = 4
```
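
The card lists only the hyperparameters, not the training script. For reference, here is a minimal sketch of how such a run could look with the Hugging Face `Trainer` API, assuming the listed dropout applies to the classification head (`classifier_dropout`) and using a hypothetical toy dataset in place of the FactRank data:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical toy dataset standing in for the 13,786 FactRank samples;
# the label indices are illustrative, the real mapping lives in the model config.
train_data = Dataset.from_dict({
    "text": [
        "Alcoholgebruik kost de samenleving 2 tot 4 miljard euro.",
        "Dus kan de minister daar vandaag wat meer over zeggen?",
    ],
    "label": [0, 2],
})

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")
model = AutoModelForSequenceClassification.from_pretrained(
    "intfloat/multilingual-e5-small",
    num_labels=3,
    classifier_dropout=0.5,  # assumption: dropout applied to the classification head
)

def tokenize(batch):
    # Pad to a fixed length so the default data collator can batch the examples.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="factrank_e5_small",
    num_train_epochs=5,
    per_device_train_batch_size=32,
    learning_rate=1e-5,
    gradient_accumulation_steps=4,
)

Trainer(model=model, args=args, train_dataset=train_data).train()
```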

## Acknowledgment

<img src="https://benedmo.eu/wp-content/themes/benedmo/img/logo.svg" alt="BENEDMO Logo" width="200">

This model was developed in the context of the [BENEDMO](https://www.benedmo.eu) project. BENEDMO brings together a network of expertise on disinformation and fact-checking. Through a Flemish-Dutch collaboration, BENEDMO aims to address the impact and challenges of disinformation.

[BENEDMO](https://www.benedmo.eu) has received funding from the European Union under Grant Agreement number 101158277-BENEDMO-2023-DEPLOY-04.