MorcuendeA committed
Commit 191b62e · verified · 1 Parent(s): 90c0077

Upload README.md

Files changed (1)
  1. README.md +62 -9
README.md CHANGED
@@ -9,14 +9,54 @@ metrics:
  model-index:
  - name: MulderFinders
    results: []
+ datasets:
+ - MorcuendeA/ConspiraText-ES
+ language:
+ - es
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
+ ![MulderFinders Logo](./i_want_to_belive.png)
+
+
+ # MulderFinders

  # MulderFinders

- This model is a fine-tuned version of [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) on an unknown dataset.
+ The truth is out there... and this model is here to help you find it.
+
+ **MulderFinders** is a fine-tuned version of [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), trained on [MorcuendeA/ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES), a dataset full of Spanish-language conspiratorial and non-conspiratorial text. Whether it's aliens, 5G towers, or secret societies, this model is ready to classify them all.
+
+ Trust no one... except maybe the F1 score.
+
+
+ ## Usage
+
+ You can use the model directly with the 🤗 Transformers library:
+
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ model_name = "MorcuendeA/MulderFinders"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)
+
+ text = "las redes 5G nos ayudan a tener mejor internet"
+
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model(**inputs)
+ logits = outputs.logits
+ probs = torch.softmax(logits, dim=1) [0]
+ labels = model.config.id2label
+ pred = torch.argmax(probs).item()
+ print(f"Prediction: {labels[pred]} ({probs[pred].item():.4f})")
+
+ # Output:
+ # Prediction: rational (0.9989)
+ ```
+
  It achieves the following results on the evaluation set:
  - Loss: 0.0059
  - Accuracy: 0.9981
@@ -24,15 +64,28 @@ It achieves the following results on the evaluation set:

  ## Model description

- More information needed
+ Model description
+
+ **MulderFinders** is a Spanish-language text classification model fine-tuned to detect conspiracy-related content. It is based on [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), a transformer model pre-trained on multiple European languages. MulderFinders performs binary classification, identifying whether a given piece of text expresses conspiratorial ideas or not.

  ## Intended uses & limitations

- More information needed
+ **Intended uses:**
+
+ - Content moderation on social media or online forums.
+ - Research and analysis of conspiratorial discourse in Spanish-language texts.
+ - Assisting fact-checking workflows by flagging potentially conspiratorial statements.
+
+ **Limitations:**

+ - May not handle sarcasm, irony, or ambiguous language reliably.
+ - Performance outside the original domain (i.e., texts similar to the training dataset) may degrade.
+ - May reflect biases present in the training data.
+
  ## Training and evaluation data

- More information needed
+ The model was fine-tuned using the [ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES) dataset, which contains Spanish-language examples labeled as conspiratorial or not. The dataset includes only synthetic text samples, covering various conspiracy-related themes.
+ During fine-tuning, regularization was applied with **attention_dropout** and **hidden_dropout** both set to 0.2.

  ## Training procedure

@@ -62,7 +115,7 @@ The following hyperparameters were used during training:

  ### Framework versions

- - Transformers 4.54.0
+ - Transformers 4.53.2
  - Pytorch 2.6.0+cu124
- - Datasets 4.0.0
- - Tokenizers 0.21.2
+ - Datasets 2.14.4
+ - Tokenizers 0.21.2
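The usage snippet added in this commit ends by post-processing the logits: softmax, argmax, then a lookup in `model.config.id2label`. That step can be sketched in plain Python without PyTorch or a model download. The sketch is illustrative only. The logit values and the two-entry `id2label` mapping are hypothetical placeholders, because the diff shows only the `rational` label and not the model's actual configuration.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, id2label):
    """Return (label, probability) for the highest-scoring class.

    Mirrors the softmax + argmax + id2label steps of the usage snippet.
    """
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax index
    return id2label[pred], probs[pred]


# Hypothetical values for illustration only: not real model outputs or label names.
id2label = {0: "LABEL_0", 1: "LABEL_1"}
logits = [3.2, -1.1]

label, prob = predict(logits, id2label)
print(f"Prediction: {label} ({prob:.4f})")
```

Subtracting the maximum logit leaves the resulting probabilities unchanged, but it keeps `math.exp` from overflowing when logits are large.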