---
pipeline_tag: text-classification
language:
- nl
base_model:
- intfloat/multilingual-e5-small
license: mit
---

# Model Card

This model is a fine-tuned version of [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). It was fine-tuned on [FactRank](https://github.com/lejafar/FactRank/tree/master/factrank) data, supplemented with machine-annotated data from Dutch and Belgian parliamentary proceedings.

The primary goal of this model is to determine whether a given statement warrants fact-checking. It does **not** determine whether the statement is factually correct.

The model assigns exactly one of three labels: FR, FNR, or NF.

- **FR**: Factual, Relevant (the statement is fact-checkable and warrants verification)
- **FNR**: Factual, Not Relevant (the statement can be fact-checked, but its wider relevance is low)
- **NF**: Not Factual (the statement contains no verifiable factual claim)
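
If you need the mapping between the classifier's output indices and these label names, it is stored in the hosted model config; a minimal check using the standard `transformers` API:

```python
from transformers import AutoConfig

# The id2label mapping is defined in this model's config on the Hub.
config = AutoConfig.from_pretrained("textgain/FactRank_e5_small")
print(config.id2label)  # exact index order depends on the config
```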

**Examples**:

- **FR**: *Toch blijkt uit cijfers van Flanders Investment & Trade dat ons handel met het Verenigd Koninkrijk opnieuw op het niveau ligt van voor de brexit.* ("Yet figures from Flanders Investment & Trade show that our trade with the United Kingdom is back at its pre-Brexit level.")
- **FNR**: *Ayleen werd opgelicht via dating fraude door de Tinder Swindler: "Het zijn net vampiers."* ("Ayleen was scammed through dating fraud by the Tinder Swindler: 'They are just like vampires.'")
- **NF**: *Het heeft weinig zin om zomaar een aantal maatregelen te tonen.* ("There is little point in simply showing a number of measures.")

**Supported language**: Dutch

## Usage

The model can be loaded with the standard `transformers` classes and wrapped in a `text-classification` pipeline:

```python
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

config = AutoConfig.from_pretrained("textgain/FactRank_e5_small")
tokenizer = AutoTokenizer.from_pretrained("textgain/FactRank_e5_small")
model = AutoModelForSequenceClassification.from_pretrained("textgain/FactRank_e5_small", config=config)
pipe = pipeline(model=model, tokenizer=tokenizer, task="text-classification")

# Dutch sample statements, ranging from check-worthy claims to opinions and questions.
sample_texts = [
    "In een wereld die steeds digitaler wordt, moeten we het ook makkelijker maken om de controle over je financiën te hebben.",
    "Ik wil helemaal geen haren tussen u en de heer De Cock leggen.",
    "Je kunt van mening verschillen over welk gevolg je daaraan moet verbinden.",
    "We hebben 4.500 nieuwe kankergevallen in Nederland per jaar als gevolg van alcoholgebruik.",
    "Alcoholgebruik kost de samenleving 2 tot 4 miljard euro.",
    "Dus kan de minister daar vandaag wat meer over zeggen?",
]

results = pipe(sample_texts)
predicted_labels = [res["label"] for res in results]
```
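
Each pipeline result is a dict with a `label` and a confidence `score`. As a sketch of downstream use (not part of the official card), you could keep only statements the model confidently flags as check-worthy, assuming the config's `id2label` uses the FR/FNR/NF names shown above; the 0.8 cut-off is an arbitrary illustration, not a calibrated threshold:

```python
# Keep only statements labeled FR with high confidence.
check_worthy = [
    text
    for text, res in zip(sample_texts, results)
    if res["label"] == "FR" and res["score"] >= 0.8  # hypothetical threshold
]
print(check_worthy)
```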

## Interpretation of Results

**Factors Influencing the Label:**

- **Subjective Evaluation**: The presence of evaluative words such as "interesting", "surprising", or "incredible" may push the model towards predicting NF.
- **Research**: Mentions of research or studies push the model towards treating the statement as a verifiable fact.
- **Context**: Statements made in certain contexts are more likely to receive an FR label, e.g. statements about health and medicine.
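
A quick way to probe these tendencies is to classify minimally contrasting sentences; the sketch below reuses the `pipe` object from the Usage section, and the two sentences are illustrative inputs I made up, not items from the training data:

```python
# First sentence cites research on a health topic ("Research shows that alcohol use
# increases the risk of cancer."); the second is a subjective evaluation ("It is
# astonishing how good this beer tastes."). Compare the predicted labels.
probe_texts = [
    "Uit onderzoek blijkt dat alcoholgebruik het risico op kanker verhoogt.",
    "Het is verbazingwekkend hoe lekker dit biertje smaakt.",
]
for text, res in zip(probe_texts, pipe(probe_texts)):
    print(f"{res['label']:>4} ({res['score']:.2f})  {text}")
```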

## Training Details

The model was trained on a total of 13,786 data samples.

Parameters:

```python
num_epochs = 5
batch_size = 32
learning_rate = 1e-5
dropout = 0.5
gradient_accumulation_steps = 4
```
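
The card lists only the hyperparameters, not the training script. For reference, here is a minimal sketch of how such a run could look with the Hugging Face `Trainer` API, assuming the listed dropout applies to the classification head (`classifier_dropout`) and using a hypothetical toy dataset in place of the FactRank data:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical toy dataset standing in for the 13,786 FactRank samples;
# the label indices are illustrative, the real mapping lives in the model config.
train_data = Dataset.from_dict({
    "text": [
        "Alcoholgebruik kost de samenleving 2 tot 4 miljard euro.",
        "Dus kan de minister daar vandaag wat meer over zeggen?",
    ],
    "label": [0, 2],
})

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")
model = AutoModelForSequenceClassification.from_pretrained(
    "intfloat/multilingual-e5-small",
    num_labels=3,
    classifier_dropout=0.5,  # assumption: dropout applied to the classification head
)

def tokenize(batch):
    # Pad to a fixed length so the default data collator can batch the examples.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="factrank_e5_small",
    num_train_epochs=5,
    per_device_train_batch_size=32,
    learning_rate=1e-5,
    gradient_accumulation_steps=4,
)

Trainer(model=model, args=args, train_dataset=train_data).train()
```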

## Acknowledgment

<img src="https://benedmo.eu/wp-content/themes/benedmo/img/logo.svg" alt="BENEDMO Logo" width="200">

This model was developed in the context of the [BENEDMO](https://www.benedmo.eu) project. BENEDMO brings together a network of expertise on disinformation and fact-checking. Through a Flemish-Dutch collaboration, BENEDMO aims to address the impact and challenges of disinformation.

[BENEDMO](https://www.benedmo.eu) has received funding from the European Union under Grant Agreement number 101158277-BENEDMO-2023-DEPLOY-04.