MKlioc committed · Commit 32186da · verified · 1 Parent(s): 71089a7

Update README.md

Files changed (1): README.md (+70 −3)
---
pipeline_tag: text-classification
language:
- nl
base_model:
- intfloat/multilingual-e5-small
license: mit
---

# Model Card

This model is a fine-tuned version of [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). It was fine-tuned on [FactRank](https://github.com/lejafar/FactRank/tree/master/factrank) data with additional samples from the Dutch and Belgian parliaments, tagged by GPT and Gemini. The primary goal of this model is to determine whether a given statement warrants fact-checking. It does **not** determine whether the statement is factually correct.
The model outputs one of three labels: FR, FNR, or NF.

- **FR**: Factual, Relevant (the statement is fact-checkable and requires verification)
- **FNR**: Factual, Not Relevant (the statement can be fact-checked, but its wider relevance is low)
- **NF**: Not Factual (the statement does not contain verifiable factual information)

**Examples**:
- **FR**: *Toch blijkt uit cijfers van Flanders Investment & Trade dat ons handel met het Verenigd Koninkrijk opnieuw op het niveau ligt van voor de brexit.* ("Yet figures from Flanders Investment & Trade show that our trade with the United Kingdom is back at its pre-Brexit level.")
- **FNR**: *Ayleen werd opgelicht via dating fraude door de Tinder Swindler: "Het zijn net vampiers."* ("Ayleen was scammed through dating fraud by the Tinder Swindler: 'They are just like vampires.'")
- **NF**: *Het heeft weinig zin om zomaar een aantal maatregelen te tonen.* ("There is little point in simply showing a number of measures.")

**Supported language**: Dutch

## Usage

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
from huggingface_hub import login

# Only needed if you are not already authenticated (e.g. via `huggingface-cli login`)
login(token="insert_your_token_here")

tokenizer = AutoTokenizer.from_pretrained("textgain/FactRank_e5_small")
model = AutoModelForSequenceClassification.from_pretrained("textgain/FactRank_e5_small")
model.eval()
pipe = pipeline(task="text-classification", model=model, tokenizer=tokenizer)

sample_texts = [
    "In een wereld die steeds digitaler wordt, moeten we het ook makkelijker maken om de controle over je financiën te hebben.",
    "Ik wil helemaal geen haren tussen u en de heer De Cock leggen.",
    "Je kunt van mening verschillen over welk gevolg je daaraan moet verbinden.",
    "We hebben 4.500 nieuwe kankergevallen in Nederland per jaar als gevolg van alcoholgebruik.",
    "Alcoholgebruik kost de samenleving 2 tot 4 miljard euro.",
    "Dus kan de minister daar vandaag wat meer over zeggen?"
]

# Each result is a dict of the form {"label": ..., "score": ...}
results = pipe(sample_texts)
predicted_labels = [res["label"] for res in results]
```
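Since the model's purpose is to flag statements that warrant fact-checking, a common next step is to keep only high-confidence FR predictions. A minimal sketch of that filtering step; the statements, scores, and threshold below are illustrative placeholders, not real model output:

```python
# Illustrative pipeline-style output: each item is {"label": ..., "score": ...}.
# These scores are made up for the example, not real predictions.
statements = [
    "Statement A",
    "Statement B",
    "Statement C",
]
predictions = [
    {"label": "FR", "score": 0.92},
    {"label": "NF", "score": 0.85},
    {"label": "FR", "score": 0.55},
]

THRESHOLD = 0.7  # arbitrary cut-off; tune on your own data

# Keep only statements labelled FR with sufficient confidence
check_worthy = [
    text for text, pred in zip(statements, predictions)
    if pred["label"] == "FR" and pred["score"] >= THRESHOLD
]
print(check_worthy)  # only "Statement A" passes
```

In practice you would feed `pipe(...)` output into the same comprehension; the threshold trades recall for precision.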

## Interpretation of Results

**Factors influencing the label:**
- **Subjective evaluation**: The presence of evaluative words such as "interesting", "surprising", or "incredible" can push the model towards predicting NF.
- **Research**: Mentions of research or studies push the model towards treating the statement as a verifiable fact.
- **Context**: Statements made in certain contexts, e.g. about health and medicine, are more likely to receive an FR label.

## Training Details

The model was trained on a total of 13,786 data samples.

Parameters:
```python
num_epochs = 5
batch_size = 32
learning_rate = 1e-5
dropout = 0.5
gradient_accumulation_steps = 4
```
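With gradient accumulation, the optimizer effectively sees batches of `batch_size × gradient_accumulation_steps` samples. A quick sanity check of the resulting step counts; the dataset size and hyperparameters come from this card, but the derived step counts are standard arithmetic, not figures from the actual training run:

```python
import math

num_samples = 13786
num_epochs = 5
batch_size = 32
gradient_accumulation_steps = 4

# Gradients from 4 consecutive batches of 32 are accumulated before each update
effective_batch_size = batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_samples / effective_batch_size)
total_optimizer_steps = steps_per_epoch * num_epochs

print(effective_batch_size)    # 128
print(steps_per_epoch)         # 108
print(total_optimizer_steps)   # 540
```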