---
pipeline_tag: text-classification
language:
- nl
base_model:
- intfloat/multilingual-e5-small
license: mit
---

# Model Card

This model is a fine-tuned version of [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). It was fine-tuned on [FactRank](https://github.com/lejafar/FactRank/tree/master/factrank) data, extended with samples from the Dutch and Belgian parliaments labelled by GPT and Gemini. The primary goal of this model is to determine whether a given statement warrants fact-checking. It does **not** determine whether the statement is factually correct.

The model assigns one of three labels: FR, FNR, or NF.

- **FR**: Factual, Relevant (the statement is fact-checkable and requires verification)
- **FNR**: Factual, Not Relevant (the statement can be fact-checked, but its wider relevance is low)
- **NF**: Not Factual (the statement does not contain information that can be fact-checked)

**Examples**:
- **FR**: *Toch blijkt uit cijfers van Flanders Investment & Trade dat ons handel met het Verenigd Koninkrijk opnieuw op het niveau ligt van voor de brexit.* ("Yet figures from Flanders Investment & Trade show that our trade with the United Kingdom is back at its pre-Brexit level.")
- **FNR**: *Ayleen werd opgelicht via dating fraude door de Tinder Swindler: "Het zijn net vampiers."* ("Ayleen was scammed through dating fraud by the Tinder Swindler: 'They are just like vampires.'")
- **NF**: *Het heeft weinig zin om zomaar een aantal maatregelen te tonen.* ("There is little point in simply showing a number of measures.")

**Supported language**: Dutch

## Usage

```python
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
from huggingface_hub import login

hf_token = "insert_your_token_here"  # only needed if the model is gated or private
login(token=hf_token)

config = AutoConfig.from_pretrained("textgain/FactRank_e5_small")
tokenizer = AutoTokenizer.from_pretrained("textgain/FactRank_e5_small")
model = AutoModelForSequenceClassification.from_pretrained("textgain/FactRank_e5_small", config=config)
model.eval()
pipe = pipeline(model=model, tokenizer=tokenizer, task="text-classification")

sample_texts = [
    "In een wereld die steeds digitaler wordt, moeten we het ook makkelijker maken om de controle over je financiën te hebben.",
    "Ik wil helemaal geen haren tussen u en de heer De Cock leggen.",
    "Je kunt van mening verschillen over welk gevolg je daaraan moet verbinden.",
    "We hebben 4.500 nieuwe kankergevallen in Nederland per jaar als gevolg van alcoholgebruik.",
    "Alcoholgebruik kost de samenleving 2 tot 4 miljard euro.",
    "Dus kan de minister daar vandaag wat meer over zeggen?",
]

results = pipe(sample_texts)
predicted_labels = [res["label"] for res in results]
```
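Each item in `results` is a dict with a `label` and a `score`, so filtering out the check-worthy statements is a one-liner. A minimal post-processing sketch (the `results` and `sample_texts` values below are illustrative placeholders, not actual model outputs):

```python
# Illustrative pipeline outputs: each item has a "label" and a "score".
results = [
    {"label": "NF", "score": 0.88},
    {"label": "FR", "score": 0.93},
    {"label": "FNR", "score": 0.71},
]
sample_texts = [
    "statement 1",
    "statement 2",
    "statement 3",
]

# Keep only statements that warrant fact-checking (label FR),
# optionally requiring a minimum confidence.
check_worthy = [
    (text, res["score"])
    for text, res in zip(sample_texts, results)
    if res["label"] == "FR" and res["score"] >= 0.5
]
print(check_worthy)
```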

## Interpretation of Results

**Factors influencing the label:**

- **Subjective evaluation**: Evaluative words such as "interesting", "surprising", or "incredible" may push the model towards predicting NF.
- **Research**: Mentions of research or studies push the model to treat the statement as a verifiable fact.
- **Context**: Statements made in certain contexts, e.g. about health and medicine, are more likely to receive an FR label.

## Training Details

The model was trained on a total of 13,786 data samples.

Parameters:

```python
num_epochs = 5
batch_size = 32
learning_rate = 1e-5
dropout = 0.5
gradient_accumulation_steps = 4
```
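With `gradient_accumulation_steps = 4`, gradients from four batches of 32 are accumulated before each optimizer update, giving an effective batch size of 128. A quick sanity check of the implied schedule (this sketch assumes all 13,786 samples were used for training, which the card does not state explicitly):

```python
import math

num_samples = 13_786
batch_size = 32
gradient_accumulation_steps = 4
num_epochs = 5

# Gradients from 4 batches are accumulated before each optimizer step.
effective_batch_size = batch_size * gradient_accumulation_steps

# Forward/backward passes per epoch, then optimizer updates per epoch.
batches_per_epoch = math.ceil(num_samples / batch_size)
optimizer_steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_optimizer_steps = optimizer_steps_per_epoch * num_epochs

print(effective_batch_size, batches_per_epoch, total_optimizer_steps)
```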